Zhengdeng Lei, PhD

2009 - Present Research Fellow at Duke-NUS, Singapore
2007 - 2009 High Throughput Computational Analyst, Memorial Sloan-Kettering Cancer Center, New York
2003 - 2007 PhD, Bioinformatics, University of Illinois at Chicago

Tuesday, December 13, 2011

http://www.youtube.com/watch?v=VwZWa76vKFE

Monday, November 28, 2011

Ranking

http://ontario.compareschoolrankings.org/elementary/SchoolsByAreaMap.aspx
http://ontario.compareschoolrankings.org/secondary/SchoolsByAreaMap.aspx

Saturday, November 26, 2011

http://www.imuc.com/pdf/Griffin-Industry-Report-09-14-2009.pdf

Brain Cancer Stem Cells Display Preferential Sensitivity to Akt Inhibition
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2739007/


Breast cancer stem cells: CD44+/CD24-/Lin-
Prospective identification of tumorigenic breast cancer cells

Friday, November 25, 2011

BEZ235 vs CD44+


Combination Therapy Targeting Both Tumor-Initiating and Differentiated Cell Populations in Prostate Carcinoma


Results: Here, we show that inhibition of PI3K activity by the dual PI3K/mTOR inhibitor NVP-BEZ235 leads to a decrease in the population of CD133+/CD44+ prostate cancer progenitor cells in vivo. Moreover, the combination of the PI3K/mTOR modulator NVP-BEZ235, which eliminates prostate cancer progenitor populations, and the chemotherapeutic drug Taxotere, which targets the bulk tumor, is significantly more effective in eradicating tumors in a prostate cancer xenograft model than monotherapy.

Identification of Selective Inhibitors of Cancer Stem Cells by High-Throughput Screening

http://www.cell.com/retrieve/pii/S0092867409007818

Sunday, November 20, 2011

cancer stem cells (CSCs) or tumor-initiating cells (TICs)

http://www.miltenyibiotec.com/en/NN_722_Tumor_stem_cells.aspx

Within a tumor, the majority of tumor cells have limited ability to proliferate and instead differentiate into cells that constitute the bulk of the tumor mass. Recent theories suggest that a small population of cells within some tumors possesses the ability to self-renew and proliferate and is thus able to maintain the tumor. These cells, which are called cancer stem cells (CSCs) or tumor-initiating cells (TICs), have been observed to share certain characteristics with normal stem cells, including a stem cell–like phenotype and function.

Certain surface markers that are associated with stem cells are also found on cancer stem cells. Human and mouse stem cell markers such as CD34, CD133, CD117 and Sca-1, and other markers such as CD44, CD24, CD20, CD105 and CD326 (EpCAM), have been found on cancer stem cells. This particular type of cell seems to be able to initiate and drive tumor growth in different hematological and solid tumors. It is critical to be able to identify and isolate these cells from tumor tissues in order to provide a clearer picture of the mechanisms governing the establishment of CSCs, their maintenance, and their molecular alterations in comparison to normal cells. An enrichment of CSCs has been observed in cell populations selected for CD133 expression from brain tumors [1–4], prostate cancer [5], renal tumors [6], and more recently from colon cancer [7, 8] and hepatocellular carcinoma [9].

To view the respective citations and a list of associated products, please download the attached PDF file.

Monday, November 14, 2011

PCA

#Here the object data is the RMA expression matrix (probesets x arrays), p x n = 54675 x 248,
#with n = 248 arrays from two batches (192 + 56).
#Rows 1:54613 are the regular probesets; the last 62 rows (54614:54675) are the control probesets.
genes <- data[1:54613, ]
genes <- t(genes)        # prcomp expects samples (arrays) in rows
pcs <- prcomp(genes)     # centered (not scaled) PCA
summary(pcs)             # proportion of variance explained by each PC

library(scatterplot3d)
PC1 <- pcs$x[, 1]
PC2 <- pcs$x[, 2]
PC3 <- pcs$x[, 3]

batch.colors <- c("#FF77FF", "blue")
group.colors <- rep("#000000", length(PC1))
group.colors[1:192] <- batch.colors[1]      #SG Batch A
group.colors[193:248] <- batch.colors[2]    #SG Batch B

scatterplot3d(PC3, PC1, PC2, main="PCA scatterplot before ComBat normalization", color=group.colors, pch=16)
legend.txt <- c("SG Batch A", "SG Batch B")
legend.col <- batch.colors                  # same colors as the points
legend(-5.75, 5.2, legend.txt, bty="n", col=legend.col, cex=1.2, pch=15)
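prcomp() centers each gene and finds the directions of largest variance across arrays. As a minimal Python sketch of that idea (Python is used for all added examples because these notes mix R, Perl and MATLAB), the hypothetical function below centers a small samples x genes matrix, builds the covariance matrix and extracts PC1 by power iteration. It is meant for toy data only: prcomp works through a singular value decomposition and never forms a 54613 x 54613 gene covariance matrix, and the sign of a component is arbitrary, so prcomp may report the same axis flipped.

```python
import math
import random


def first_principal_component(samples, iters=500, seed=0):
    """PCA of a small samples x genes matrix (rows = arrays, like t(genes) in R).

    Returns the PC1 loading vector, its variance and the PC1 score of each sample.
    """
    n, p = len(samples), len(samples[0])
    # center every gene (column), as prcomp does by default
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # sample covariance with an n - 1 denominator (prcomp's sdev^2 uses the same)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # power iteration converges to the eigenvector with the largest eigenvalue
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, variance, scores


# Usage: three arrays whose two "genes" lie on the line y = 2x
loadings, var1, pc1 = first_principal_component([[1, 2], [2, 4], [3, 6]])
```

Here all the variance lies on one axis, so PC1 has loadings proportional to (1, 2) and variance 5.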

Monday, November 7, 2011

Cancer Guide

http://www.cancerguide.org/pathology.html

Wednesday, November 2, 2011

Boxplot



GP130 <- c(0.016, 0.023, 0.028, 0.030, 0.037, 0.040, 0.045, 0.051, 0.055, 0.070, 0.071, 0.075, 0.075, 0.076, 0.088, 0.092, 0.094, 0.095, 0.096, 0.097, 0.105, 0.114, 0.123, 0.135, 0.146, 0.163, 0.165, 0.176, 0.187, 0.199, 0.200, 0.214, 0.215, 0.249, 0.256, 0.263, 0.273, 0.302, 0.317, 0.330, 0.361, 0.364, 0.380, 0.386, 0.390, 0.392, 0.393, 0.449, 0.480, 0.494, 0.500, 0.514, 0.536, 0.541, 0.545, 0.562, 0.587, 0.647, 0.652, 0.662, 0.685, 0.696, 0.709, 0.718, 0.719, 0.741, 0.747, 0.799, 0.805, 0.805, 0.821, 0.870, 0.901, 0.988, 0.994, 0.008, 0.017, 0.017, 0.028, 0.030, 0.039, 0.045, 0.067, 0.085, 0.204, 0.223, 0.264, 0.276, 0.289, 0.290, 0.303, 0.314, 0.320, 0.346, 0.363, 0.365, 0.372, 0.380, 0.386, 0.389, 0.391, 0.408, 0.441, 0.446, 0.449, 0.460, 0.471, 0.472, 0.485, 0.492, 0.497, 0.498, 0.500, 0.501, 0.502, 0.508, 0.513, 0.527, 0.528, 0.529, 0.543, 0.585, 0.604, 0.612, 0.621, 0.628, 0.651, 0.652, 0.672, 0.688, 0.688, 0.702, 0.704, 0.704, 0.709, 0.716, 0.723, 0.727, 0.730, 0.736, 0.743, 0.749, 0.751, 0.758, 0.762, 0.764, 0.766, 0.772, 0.779, 0.780, 0.783, 0.784, 0.787, 0.800, 0.803, 0.814, 0.819, 0.824, 0.826, 0.831, 0.836, 0.837, 0.840, 0.843, 0.860, 0.903, 0.931, 0.948, 0.960, 0.968, 0.974, 0.983, 0.991, 0.999)



Lauren <- c("Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Diffuse", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", 
"Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal", "Intestinal")



# Welch two-sample t-test of the GP130 activation score between the two Lauren subtypes
p <- t.test(GP130 ~ Lauren)$p.value
p <- signif(p, 3)   # keep 3 significant digits for the legend

# Draw the boxplot with the p-value; called once for the screen and once for the PDF
draw.gp130.boxplot <- function() {
  boxplot(GP130 ~ Lauren, col=c("#FFC000", "#0070C0"), medlwd=2, lwd=2, outlwd=1, cex.axis=1.2, ylab="GP130 activation score")
  legend(2.05, 0.2, paste0("p = ", p), box.col="white")
}
draw.gp130.boxplot()

setwd('E:/Projects/GP130')
pdf(file='SG192.boxplot.pdf')
draw.gp130.boxplot()
dev.off()

Monday, October 31, 2011

GSM2Sample.pl

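use strict;
use warnings;
use lib 'E:/perl_lib';
use FileSystem;   # the author's own helper module (GetFileByPattern, ReadFile, WriteFile, Close)

my $work_dir = 'E:\CEL\GSE15460\GSE15460_RAW\GSE15460.SG.CEL';
my $output_file = 'E:\CEL\GSE15460\GSE15460_RAW\GSE15460.info.txt';

# Collect all .CEL files in the working directory
my @CellFiles = ();
@CellFiles = FileSystem::GetFileByPattern($work_dir, '\.CEL', @CellFiles);
foreach my $cel (@CellFiles)
{
    print "$cel\n";
    my @CELContent = FileSystem::ReadFile($cel);
    # The 14th line (index 13) of each CEL file holds the sample information
    my $sample_info_line = $CELContent[13];
    my @sample_info = split(/[\s:]/, $sample_info_line);
    print "$sample_info[2]\n";

    # One "CEL file <tab> sample field" line per file
    FileSystem::WriteFile($output_file, "$cel\t$sample_info[2]\n");
}

FileSystem::Close();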


STANDARD TRID | Tumors on mRNA (Affy U133P2) | Reason for exclusion | GEO
NGCII011/LGE | GC-011LGE-T.CEL | Fail QC | GSM387788.CEL
NGCII035/PCC | GC-035PCC-T.CEL | Fail QC | GSM387797.CEL
NGCII038/LYC | GC-038LYC-T.CEL | Fail QC | GSM387798.CEL
TGCII021/LAH | GC-021LAH-T.CEL | Fail QC | GSM387790.CEL
980327 | GC-980327T.CEL | called as "adenosquamous cancer", relapse also adenosquamous | GSM387937.CEL
2000619 | GC-2000619T.CEL | squamous CA | GSM387844.CEL
TGCII026/GJK | GC-026-GJK-T.CEL | GIST/squamous | GSM387793.CEL
TGCII039/TSC | GC-039-TSC-T.CEL | GIST/squamous | GSM387799.CEL

Helicobacter pylori (Hp) infection, intestinal metaplasia, atrophic gastritis

Wednesday, October 26, 2011

Sunday, October 16, 2011

BFRM+NTP

top 1/3 vs bottom 1/3
Use NTP (Nearest Template Prediction) and the gene signature to predict the direction.
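Nearest Template Prediction assigns each sample to the class whose marker-gene template it most resembles. The following is a minimal Python sketch, assuming 0/1 templates (1 for the signature genes that mark a class) and cosine distance, with the two classes being the "top 1/3" and "bottom 1/3" groups; the function names are hypothetical, and the permutation-based significance test of the published NTP method is omitted.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_templates(marker_class, classes):
    """One 0/1 template per class: 1 for the signature genes that mark that class."""
    return {c: [1.0 if m == c else 0.0 for m in marker_class] for c in classes}


def ntp_predict(sample, templates):
    """Assign a sample to the class whose template is nearest (cosine distance).

    `sample` holds the signature genes' expression, already centered per gene
    across samples so that up- and down-regulation have a sign.
    """
    distances = {c: cosine_distance(sample, t) for c, t in templates.items()}
    return min(distances, key=distances.get), distances


# Usage: genes 1-2 mark the "top" group, genes 3-4 mark the "bottom" group
templates = build_templates(["top", "top", "bottom", "bottom"], ["top", "bottom"])
label, dist = ntp_predict([1.5, 1.0, -1.0, -0.5], templates)
```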

Thursday, October 13, 2011

human gastric precancerous lesions

http://www.weibing.com.cn/html/mxqbxwy/
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1766652/pdf/v045p000I5.pdf
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1146204/


chronic superficial gastritis (CSG, 16), chronic atrophic gastritis (CAG, 16), intestinal metaplasia (IM, 35), gastric epithelial dysplasia (GED, 23) and gastric cancer (CA, 25), and conditions of H. pylori infection

Tuesday, October 11, 2011

http://en.wikipedia.org/wiki/Carcinogenesis

A new way of looking at carcinogenesis comes from integrating the ideas of developmental biology into oncology. The cancer stem cell hypothesis proposes that the different kinds of cells in a heterogeneous tumor arise from a single cell, termed the cancer stem cell. Cancer stem cells may arise from the transformation of adult stem cells or of differentiated cells within the body. These cells persist as a subcomponent of the tumor and retain key stem cell properties: they give rise to a variety of cells and are capable of self-renewal and homeostatic control.[25] Furthermore, cancer relapse and the emergence of metastasis are also attributed to these cells. The cancer stem cell hypothesis does not contradict earlier concepts of carcinogenesis.

Wednesday, September 21, 2011

Export chart from Excel to Adobe illustrator

1. In Excel: copy the chart, go to another sheet (e.g. Sheet2), and use Paste Special > "Picture (Enhanced Metafile)".
2. Copy the pasted chart in Sheet2, paste it into a new Word document, and save this Word document as PDF.
3. Open the PDF in Illustrator.

Friday, September 9, 2011

Find SRE motif in promoter region

#!/usr/bin/perl
use strict;
use warnings;
use lib 'E:/perl_lib';
use Bioinformatics;   # the author's own module; provides cStrand() (reverse complement)
use LWP::Simple;

# Gene name, hg19 location and strand
my $genes = qq/MMP1 chr11:102,660,641-102,668,966 -
MMP2 chr16:55,513,081-55,540,584 +
MMP9 chr20:44,637,547-44,645,199 +
IL6 chr7:22,766,766-22,771,619 +
IL8 chr4:74,606,275-74,609,431 +
/;

my @genes = split(/\n/, $genes);
foreach my $gene_line (@genes)
{
    $gene_line =~ s/,//g;    # drop the thousands separators from the coordinates
    next unless $gene_line =~ /^(\w+)\s+(chr\w+):(\d+)-(\d+)\s+([+-])/;
    my ($gene_name, $gene_chr, $gene_start, $gene_end, $strand) = ($1, $2, $3, $4, $5);

    # Promoter = the ~2 kb immediately upstream of the gene (strand-aware)
    my ($promoter_start, $promoter_end);
    if ($strand eq "+")
    {
        $promoter_start = $gene_start - 2001;
        $promoter_end = $gene_start - 1;
    }
    else
    {
        $promoter_start = $gene_end + 1;
        $promoter_end = $gene_end + 2001;
    }

    # Fetch the promoter sequence from the UCSC DAS server (hg19)
    my $seg = "$gene_chr:$promoter_start,$promoter_end";
    my $URL_gene = "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=$seg";
    my $genefile = get($URL_gene);
    defined $genefile or die "Could not fetch $URL_gene\n";

    # Keep only the pure sequence lines of the DAS XML response
    my $DNA = lc(join('', grep { /^[acgt]+$/i } split(/\n/, $genefile)));

    # For minus-strand genes, reverse-complement so the promoter reads 5'->3'
    if ($strand eq "-")
    {
        $DNA = lc(Bioinformatics::cStrand($DNA));
    }

    # SRE core (CArG box) is CC(A/T)6GG; mark every hit in upper case
    my $SRE_count = ($DNA =~ s/(cc[at]{6}gg)/----\U$1\E----/g);
    if ($SRE_count)
    {
        print "$gene_line: $DNA\n\n\n";
    }
}
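The same scan can be sketched in Python to show the two steps that matter: choosing the promoter window on the correct side of the gene, and marking non-overlapping CArG-box hits. This is a minimal sketch under stated assumptions: the hypothetical `promoter_coords` uses a 2,000 bp window (the Perl script takes 2,001 bp), and the sequence is passed in directly rather than fetched from UCSC. Note that the pattern CC(A/T)6GG is its own reverse complement, so the strand only changes the reported positions, not which sites are found.

```python
import re

CARG = re.compile(r"cc[at]{6}gg")          # SRE core: CC(A/T)6GG
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def promoter_coords(gene_start, gene_end, strand, length=2000):
    """Window of `length` bp immediately upstream of the gene (1-based, inclusive)."""
    if strand == "+":
        return gene_start - length, gene_start - 1
    return gene_end + 1, gene_end + length


def mark_carg(seq):
    """Return (position, site) for each non-overlapping hit and the marked sequence."""
    seq = seq.lower()
    hits = [(m.start(), m.group()) for m in CARG.finditer(seq)]
    marked = CARG.sub(lambda m: "----" + m.group().upper() + "----", seq)
    return hits, marked


hits, marked = mark_carg("AACCATATATGGGT")
```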

Tuesday, September 6, 2011

Sunday, August 21, 2011

How to dissect the tumors

1. check batch effect in SGIIA
(1) Heatmap, PCA on control genes
(2) Define the batches by global PCA
2. ComBat
3. CC_IFS on SGIIA
4. Select samples with avg_consensus_idx > 0.9
5. Combine the samples from step 4 with the SGIIB samples and repeat steps 2-4.
6. For new cohorts, repeat step 5.

Use limma to derive pairwise signatures, then NTP or SVM to predict.
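Step 4 keeps only samples that cluster stably. A minimal Python sketch, assuming that avg_consensus_idx is a sample's mean consensus value with the other members of its own cluster (the notes do not define it, so this is an assumption, and the function names are hypothetical):

```python
def avg_consensus_index(consensus, labels):
    """Mean consensus between each sample and the other members of its cluster."""
    n = len(labels)
    scores = []
    for i in range(n):
        mates = [j for j in range(n) if j != i and labels[j] == labels[i]]
        scores.append(sum(consensus[i][j] for j in mates) / len(mates) if mates else 0.0)
    return scores


def select_stable(consensus, labels, cutoff=0.9):
    """Indices of samples whose average consensus index exceeds the cutoff."""
    scores = avg_consensus_index(consensus, labels)
    return [i for i, s in enumerate(scores) if s > cutoff]


consensus = [
    [1.0, 0.95, 0.1, 0.0],
    [0.95, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.6],
    [0.0, 0.1, 0.6, 1.0],
]
kept = select_stable(consensus, [0, 0, 1, 1])
```

Here the first cluster is tight (0.95) and survives the 0.9 cutoff, while the second (0.6) is dropped.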

Thursday, August 18, 2011

How to include the maximum number of tumors

Now we have 201 tumor samples with avg_consensus_idx > 0.9 in CC_IFS of ComBat248, and the new ComBat201 has cophenetic = 1.
Thus we can lower the avg_consensus_idx cutoff to include more samples until cophenetic < 1.


24% of M patients benefit from chemo.
30% of D patients may benefit from PI3K?
46% treatment unknown?

Monday, August 8, 2011

tophat


nohup tophat -r 124 -o tophat124 --num-threads=4 hg19 AGS_1_sequence.fastq AGS_2_sequence.fastq >screen.txt &

The estimated average library fragment size is 280 bp and the read length is 60 bp, so the inner distance between paired reads (--mate-inner-dist) is 160.

-r (mate inner distance) = fragment size (insert size) - 2*read length
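The inner-distance arithmetic as a tiny Python sketch (the helper name is hypothetical): the fragment consists of read 1, the inner gap and read 2, so the gap is the fragment size minus both read lengths, and it becomes negative when the mates overlap.

```python
def mate_inner_dist(fragment_size, read_length):
    """Expected gap between the two reads of a pair (TopHat's -r / --mate-inner-dist).

    fragment (insert) size = read 1 + inner distance + read 2
    """
    return fragment_size - 2 * read_length


print(mate_inner_dist(280, 60))  # 160
```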

Tuesday, July 12, 2011

pvalues for affy probesets

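library(affy)
setwd("/home/leiz/CEL/SGII192")
data <- ReadAffy()                                   # read all CEL files in the working directory
eset_pma <- mas5calls(data)                          # MAS5 detection calls (P/M/A)
pvalues <- assayDataElement(eset_pma, "se.exprs")    # the detection p-values are stored in se.exprs
write.table(pvalues, "GC192.pvalues.txt", sep="\t")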

Friday, July 8, 2011

glmnet cox

http://icb.med.cornell.edu/wiki/index.php/Elementolab/R_tutorial

Wednesday, June 29, 2011

http://www.cic.gc.ca/english/immigrate/skilled/complete-applications.asp

Wednesday, June 8, 2011

SCP without password

http://www.linuxjournal.com/article/8600
Copy from your local machine to the remote machine:
1. On your local machine (e.g. Steve server):
ssh-keygen -t rsa
# press Enter to accept the default key file, and Enter again for an empty passphrase
cd /home/leiz/.ssh

scp ~/.ssh/id_rsa.pub NUSSTF\\gmslz@172.25.138.12:/home/gmslz/.ssh/id_rsa.pub.FromSteve

2. On the remote machine (e.g. Cluster):
cd /home/gmslz/.ssh/
ls -la
cat id_rsa.pub.FromSteve >> authorized_keys
chmod 600 authorized_keys   # sshd ignores an authorized_keys file that is writable by others


DONE
You can now scp from local (Steve) to remote (Cluster) without a password.


PS: to get your local IP when ifconfig is not available on your local machine:
netstat -an | grep "tcp"

e.g. 172.25.136.25

Thursday, May 26, 2011

% With iterative feature selection, converged after three runs (consensus clustering)
file = 'E:\Projects\8.ComBAT\ComBat399T\CC_IFS\Run2\K3_consensus_matrix2.txt'


% No iterative feature selection
%file = 'E:\Projects\8.ComBAT\ComBat399T\CC_IFS\K3_consensus_matrix0.txt'


n = 399;
A = zeros(n, n);

fid = fopen(file, 'r');

% Skip the header line
fgetl(fid);

% Each line: sample name, then n tab-separated consensus values
for row = 1:n
    tline = fgetl(fid);
    LineWith1stCol = regexp(tline, '\t', 'split');
    A(row, :) = str2double(LineWith1stCol(1, 2:n+1));
end
fclose(fid);

cd('E:\MATLAB_lib')
v = getcoph(A)   % cophenetic correlation coefficient

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [coph] = getcoph(a)
% a is the consensus matrix (n x n, similarities in [0, 1])
m = size(a, 1);
uvec = a(1, 2:end);

for i = 2:m-1
    uvec = [uvec a(i, i+1:end)]; % upper-diagonal elements of consensus, in pdist order
end

y = 1 - uvec;                 % consensus values are similarities; convert to distances
z = linkage(y, 'average');    % use average linkage
coph = cophenet(z, y);        % correlation between cophenetic and original distances

end
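getcoph() measures how faithfully an average-linkage tree reproduces the consensus distances. A minimal pure-Python sketch of the same computation (hypothetical function names; a naive O(n^4) implementation meant for small matrices only, not a replacement for MATLAB's linkage/cophenet): convert consensus to distances 1 - consensus, cluster with average linkage, record the height at which each pair merges, and correlate those cophenetic distances with the original ones. A perfectly crisp consensus matrix gives exactly 1, which is why "cophenetic = 1" is used above as the stability criterion.

```python
import itertools
import math


def average_linkage_cophenetic(dist):
    """Average-linkage (UPGMA) clustering of a square distance matrix.

    Returns the cophenetic matrix: coph[i][j] is the merge height at which
    samples i and j first fall into the same cluster.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}
    coph = [[0.0] * n for _ in range(n)]

    def avg_dist(a, b):
        # average linkage: mean distance over all cross-cluster pairs
        total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    next_id = n
    while len(clusters) > 1:
        a, b = min(itertools.combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        height = avg_dist(a, b)
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cophenetic_correlation(consensus):
    """Distances = 1 - consensus, average linkage, then correlate the original
    and cophenetic distances over all pairs i < j."""
    n = len(consensus)
    dist = [[1.0 - consensus[i][j] for j in range(n)] for i in range(n)]
    coph = average_linkage_cophenetic(dist)
    pairs = list(itertools.combinations(range(n), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


block = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
noisy = [[1, 0.9, 0.2, 0.4], [0.9, 1, 0.3, 0.1], [0.2, 0.3, 1, 0.7], [0.4, 0.1, 0.7, 1]]
```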

Wednesday, May 25, 2011

Download youtube


Go to the YouTube URL of a video, then copy and paste one of the following into your Chrome address bar (each snippet is labeled with YouTube's format code and its resolution):

22 1280x720
javascript:isIE=/*@cc_on!@*/false;isIE ? swfHTML=document.getElementById('movie_player').getElementsByTagName('param')[1].value:swfHTML=document.getElementById("movie_player").getAttribute("flashvars"); 
w=swfHTML.split("&"); for(i=0;i<=w.length-1;i++) if(w[i].split("=")[0] == "fmt_url_map"){links=unescape(w[i].split("=")[1]);break;}abc = links.split(",");for(i=0;i<=abc.length-1;i++){fmt=abc[i].split("|")[0];if(fmt==22){url = abc[i].split("|")[1];window.location.href = url + '&title=' + (((document.title.replace('#',' ')).replace('@',' ')).replace('*',' ')).replace('|',' ');}}


35 854x480

javascript:isIE=/*@cc_on!@*/false;isIE ? swfHTML=document.getElementById('movie_player').getElementsByTagName('param')[1].value:swfHTML=document.getElementById("movie_player").getAttribute("flashvars");
w=swfHTML.split("&"); for(i=0;i<=w.length-1;i++) if(w[i].split("=")[0] == "fmt_url_map"){links=unescape(w[i].split("=")[1]);break;}abc = links.split(",");for(i=0;i<=abc.length-1;i++){fmt=abc[i].split("|")[0];if(fmt==35){url = abc[i].split("|")[1];window.location.href = url + '&title=' + (((document.title.replace('#',' ')).replace('@',' ')).replace('*',' ')).replace('|',' ');}}

34 640x360
18 640x360
javascript:isIE=/*@cc_on!@*/false;isIE ? swfHTML=document.getElementById('movie_player').getElementsByTagName('param')[1].value:swfHTML=document.getElementById("movie_player").getAttribute("flashvars"); 
w=swfHTML.split("&"); for(i=0;i<=w.length-1;i++) if(w[i].split("=")[0] == "fmt_url_map"){links=unescape(w[i].split("=")[1]);break;}abc = links.split(",");for(i=0;i<=abc.length-1;i++){fmt=abc[i].split("|")[0];if(fmt==18){url = abc[i].split("|")[1];window.location.href = url + '&title=' + (((document.title.replace('#',' ')).replace('@',' ')).replace('*',' ')).replace('|',' ');}}


5 320x240
javascript:isIE=/*@cc_on!@*/false;isIE ? swfHTML=document.getElementById('movie_player').getElementsByTagName('param')[1].value:swfHTML=document.getElementById("movie_player").getAttribute("flashvars");w=swfHTML.split("&");for(i=0;i<=w.length-1;i++)if(w[i].split("=")[0] == "fmt_url_map"){links=unescape(w[i].split("=")[1]);break;}abc = links.split(",");for(i=0;i<=abc.length-1;i++){fmt=abc[i].split("|")[0];if(fmt==5){url = abc[i].split("|")[1] + '&title=' + (((document.title.replace('#',' ')).replace('@',' ')).replace('*',' ')).replace('|',' ');window.location.href = url;}}

Friday, May 20, 2011

Standardization

x <- matrix(1:21, ncol=7)

By row (gene)
std.x.by.row <- t(scale(t(x), scale=T))


By column (array)
std.x.by.col <- scale(x, scale=T)
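scale() subtracts the mean and divides by the standard deviation (n - 1 denominator), column by column; transposing first makes it work row by row. A minimal Python sketch of both directions on the same 3 x 7 matrix (the function names are hypothetical):

```python
import statistics


def zscore(values):
    mu = statistics.mean(values)
    sd = statistics.stdev(values)   # n - 1 denominator, like R's sd() and scale()
    return [(v - mu) / sd for v in values]


def standardize_rows(matrix):       # like t(scale(t(x)))
    return [zscore(row) for row in matrix]


def standardize_cols(matrix):       # like scale(x)
    cols = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


# matrix(1:21, ncol=7) in R fills the matrix column by column
x = [[r + 3 * c + 1 for c in range(7)] for r in range(3)]
by_col = standardize_cols(x)
by_row = standardize_rows(x)
```

Each column of x is three consecutive integers, so every column standardizes to (-1, 0, 1).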



Check the batch effect by date

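# Assign a new color whenever the date changes (cycling through 7 colors),
# so that arrays from the same date share a color
date2col <- function(date.list)
{
  clr.template <- c("red", "orange", "yellow", "green", "cyan", "blue", "purple")
  num.dates <- length(date.list)
  clr.list <- character(num.dates)
  if (num.dates == 0) return(clr.list)
  clr.list[1] <- "red"
  c.index <- 0
  for (i in seq_len(num.dates)[-1]) {   # also safe when there is only one date
    if (date.list[i] == date.list[i-1]) {
      clr.list[i] <- clr.list[i-1]
    } else {
      c.index <- c.index + 1
      clr.list[i] <- clr.template[c.index %% 7 + 1]
    }
  }
  return(clr.list)
}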



setwd("E:\\CEL\\GastricCancer\\AU\\PM_data_new\\Gastric_Affy_files\\Tumors")
data <- read.table(file="AU_GC70.rma.txt", header=T, row.names=1)
data.ctrl <- data[54614:54675, ]   # the 62 control probesets at the end of the table
library("gplots")


#data <- t(scale(t(data.ctrl), scale=T)) #standardized by row(gene)
#data[data < -3] <- -3
#data[data > 3] <- 3

data <- sweep(data.ctrl, 1, apply(data.ctrl, 1, median)) #just median centered, by row (probeset)




my.color <- c("8/4/2004","8/4/2004","11/18/2004","11/18/2004","11/25/2004","11/25/2004","11/25/2004","11/25/2004","11/25/2004","11/25/2004","11/25/2004","11/26/2004","11/26/2004","11/26/2004","11/26/2004","11/26/2004","12/2/2004","12/2/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","12/3/2004","1/14/2005","1/14/2005","1/14/2005","1/14/2005","1/14/2005","1/14/2005","2/17/2005","2/17/2005","2/17/2005","2/17/2005","2/17/2005","2/17/2005","2/17/2005","2/25/2005","2/25/2005","3/4/2005","3/4/2005","3/4/2005","3/18/2005","3/18/2005","3/23/2005","4/8/2005","4/8/2005","4/8/2005","4/8/2005","4/8/2005","4/8/2005","4/28/2005","4/28/2005","4/28/2005","4/28/2005","4/29/2005","4/29/2005","4/29/2005","4/29/2005","5/19/2005","5/24/2005","5/24/2005","6/22/2005","6/22/2005","6/22/2005","6/22/2005","6/22/2005")
#my.color <- rep("black",dim(data)[2])
my.color <- date2col(my.color)
hm<-heatmap.2(as.matrix(data), col=greenred(75), scale="none", dendrogram="none", Rowv= T, Colv=F, ColSideColors=my.color, key=TRUE, symkey=FALSE, density.info="none",trace="none", cexRow=0.75,cexCol=0.75)
pdf(file = "Batch_in_CtrlGenes.pdf", width=10, height=10)
#pdf(file = "Batch_in_CtrlGenes.pdf")
hm<-heatmap.2(as.matrix(data), col=greenred(75), scale="none", dendrogram="none", Rowv= T, Colv=F, ColSideColors=my.color, key=TRUE, symkey=FALSE, density.info="none",trace="none", cexRow=0.75,cexCol=0.75)
dev.off()
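The coloring rule in date2col can be restated as a minimal Python sketch (hypothetical port, same palette): the color index advances only when the date differs from the previous array's date, and wraps around after seven changes, so arrays processed on the same day form one colored block in the ColSideColors bar.

```python
PALETTE = ["red", "orange", "yellow", "green", "cyan", "blue", "purple"]


def date2col(dates):
    """One color per array; a new color starts whenever the date changes."""
    colors = []
    idx = 0
    for i, d in enumerate(dates):
        if i > 0 and d != dates[i - 1]:
            idx += 1
        colors.append(PALETTE[idx % len(PALETTE)])
    return colors


cols = date2col(["8/4/2004", "8/4/2004", "11/18/2004", "11/25/2004", "11/25/2004"])
```

Like the R version, it assumes the arrays are ordered by date; the same date reappearing later gets a new color.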


Wednesday, May 11, 2011

GSAA

High, mid, low activity, e.g. p53
mid vs low -->(GSEA)  ES
high vs low --> High ES??




Suppose we have 200 cell lines, with expression profiles measured before drug treatment and GI50 values for M drugs.
(NCI60, too small?)

              Drug1       Drug2       Drug3    ...   DrugM
GeneSet1      Corr(1,1)   Corr(1,2)   ...
GeneSet2
GeneSet3
GeneSet4
...
GeneSetN

Corr(1,1) = correlation between the GSAA score of GeneSet1 and the GI50 of Drug1 across all cell lines
Corr(1,2) = correlation between the GSAA score of GeneSet1 and the GI50 of Drug2 across all cell lines
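The N x M matrix above can be filled with one correlation per (gene set, drug) pair. A minimal Python sketch, assuming per-cell-line GSAA scores are already computed and using Pearson correlation (the notes do not fix the correlation type; Spearman, or log-transformed GI50, would be equally reasonable choices). The function names are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gsaa_gi50_matrix(gsaa, gi50):
    """gsaa[i][c]: score of gene set i in cell line c; gi50[j][c]: GI50 of drug j in cell line c.

    Returns corr[i][j] = correlation across cell lines (gene sets x drugs).
    """
    return [[pearson(g, d) for d in gi50] for g in gsaa]


gsaa = [[1, 2, 3, 4], [4, 3, 2, 1]]    # 2 gene sets x 4 cell lines
gi50 = [[2, 4, 6, 8], [1, 1, 2, 2]]    # 2 drugs x 4 cell lines
corr = gsaa_gi50_matrix(gsaa, gi50)
```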







Tuesday, April 26, 2011

Median Centered

data.log10.MedianCentered <- sweep(data.log10, 2, apply(data.log10, 2, median))   # subtract each column's (array's) median
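The MARGIN argument of sweep() decides the direction: 2 subtracts each column's median (per array, as here), while 1 subtracts each row's median (per gene, as in the control-gene heatmap code). A minimal Python sketch of both (hypothetical function names):

```python
import statistics


def median_center_cols(matrix):      # like sweep(x, 2, apply(x, 2, median))
    meds = [statistics.median(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, meds)] for row in matrix]


def median_center_rows(matrix):      # like sweep(x, 1, apply(x, 1, median))
    return [[v - m for v in row] for row, m in
            zip(matrix, (statistics.median(row) for row in matrix))]


m = [[1, 2], [3, 10], [5, 4]]
```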

Tuesday, April 19, 2011

Enrichment test in cancer subtypes





                   Invasive   Proliferative   Metabolic
Hypermethylated    4110       1536            988
Hypomethylated     625        1263            284

Q1: Is the Invasive subtype enriched with Hypermethylated CpGs?
q <- 4110 # number of successes (white balls = hypermethylated CpGs) drawn
m <- 4110+1536+988 # number of successes (white balls) in the urn
n <- 625+1263+284 # number of failures (black balls) in the urn
k <- 4110+625 # number of balls drawn (all CpGs of the Invasive subtype)
p.value <- phyper(q-1, m, n, k, lower.tail=F)
# why q-1: we want Pr(X >= q) instead of Pr(X > q)
p.value
= 2.975069e-162
YES
Conclusion: The Invasive subtype is significantly enriched with Hypermethylated CpGs.

Q2: Is the Invasive subtype enriched with Hypomethylated CpGs?
q <- 625 # number of successes (white balls = hypomethylated CpGs) drawn
m <- 625+1263+284 # number of successes (white balls) in the urn
n <- 4110+1536+988 # number of failures (black balls) in the urn
k <- 4110+625 # number of balls drawn
p.value <- phyper(q-1, m, n, k, lower.tail=F)
p.value
= 1
No

Q3: Is the Proliferative subtype enriched with Hypermethylated CpGs?
q <- 1536
m <- 4110+1536+988
n <- 625+1263+284
k <- 1263+1536
p.value <- phyper(q-1, m, n, k, lower.tail=F)
p.value
= 1
No

Q4: Is the Proliferative subtype enriched with Hypomethylated CpGs?
q <- 1263
m <- 625+1263+284
n <- 4110+1536+988
k <- 1263+1536
p.value <- phyper(q-1, m, n, k, lower.tail=F)
p.value
= 3.765696e-193
YES
Conclusion: The Proliferative subtype is significantly enriched with Hypomethylated CpGs.

Q5: Is the Metabolic subtype enriched with Hypermethylated CpGs?
q <- 988 # number of successes (white balls = hypermethylated CpGs) drawn
m <- 4110+1536+988 # number of successes (white balls) in the urn
n <- 625+1263+284 # number of failures (black balls) in the urn
k <- 988+284 # number of balls drawn (all CpGs of the Metabolic subtype)
p.value <- phyper(q-1, m, n, k, lower.tail=F)
p.value
= 0.01919936
YES (p < 0.05)
Conclusion: The Metabolic subtype is significantly enriched with Hypermethylated CpGs at the 0.05 level.

Q6: Is the Metabolic subtype enriched with Hypomethylated CpGs?
q <- 284
m <- 625+1263+284
n <- 4110+1536+988
k <- 988+284
p.value <- phyper(q-1, m, n, k, lower.tail=F)
p.value
= 0.9839128
No
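Each test above is an upper-tail hypergeometric probability, P(X >= q) = sum over x from q to min(k, m) of C(m, x) C(n, k - x) / C(m + n, k), which is what phyper(q-1, m, n, k, lower.tail=F) returns. A minimal Python sketch that evaluates this sum exactly with integer binomial coefficients (a hypothetical helper, not the R implementation; exact big-integer sums stay accurate even for p-values as small as those in Q1):

```python
from math import comb


def hyper_upper_tail(q, m, n, k):
    """P(X >= q), X = white balls among k drawn from m white + n black balls."""
    numerator = sum(comb(m, x) * comb(n, k - x) for x in range(max(q, 0), min(k, m) + 1))
    return numerator / comb(m + n, k)


hyper, hypo = (4110, 1536, 988), (625, 1263, 284)
p_q1 = hyper_upper_tail(4110, sum(hyper), sum(hypo), 4110 + 625)   # Invasive vs Hypermethylated
p_q2 = hyper_upper_tail(625, sum(hypo), sum(hyper), 4110 + 625)    # Invasive vs Hypomethylated
p_q5 = hyper_upper_tail(988, sum(hyper), sum(hypo), 988 + 284)     # Metabolic vs Hypermethylated
```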






See also:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2724271/
An integrative genomics approach identifies Hypoxia Inducible Factor-1 (HIF-1)-target genes that form the core response to hypoxia