Zhengdeng Lei, PhD

Zhengdeng Lei, PhD

2009 - Present Research Fellow at Duke-NUS, Singapore
2007 - 2009 High Throughput Computational Analyst, Memorial Sloan-Kettering Cancer Center, New York
2003 - 2007 PhD, Bioinformatics, University of Illinois at Chicago

Sunday, December 23, 2012

[GC] - book chapter ISBN

http://link.springer.com/chapter/10.1007%2F11567752_3?LI=true

Wednesday, November 21, 2012

Drug naming

*****nib --- small molecule kinase inhibitor
*****mab --- antibody
*****stat --- HDAC inhibitor
*****mycin --- antibiotics

Wednesday, October 17, 2012

TCGA Breast Cancer

http://www.nature.com/nature/journal/v490/n7418/full/nature11412.html#/supplementary-information


Datasets:
https://tcga-data.nci.nih.gov/docs/publications/brca_2012/

Thursday, October 11, 2012

GEOmetadb


login Steve's web server
run ~/R.2.12.2.2/bin/R

library(GEOmetadb)
#getSQLiteFile()  #run this line if you want to update the GEOmetadb database file
file.info("GEOmetadb.sqlite")


con <- dbConnect(SQLite(), "GEOmetadb.sqlite")

cancer <- "astric"
treatment <- "luorouracil"
sql <- "select gse, title from gse where (title like '%CANCER%' or summary like '%CANCER%' or overall_design like '%CANCER%') and  (title like '%TREATMENT%' or summary like '%TREATMENT%' or overall_design like '%TREATMENT%') "
sql <- gsub("CANCER", cancer, sql)
sql <- gsub("TREATMENT", treatment, sql)

rs <- dbGetQuery(con, sql)
rs
dbDisconnect(con)

Tuesday, October 9, 2012

Induced pluripotent stem cell

check the gene expression of Oct-3/4, SOX2, c-Myc, and Klf4 in three subtypes

Monday, September 17, 2012

How to find the best K (# of clusters) in consensus clustering

It is the start point of the line, which drops the most sharply (i.e. max negative slope).
e.g. K=3 (in green circle)

Endnote Journal Style

http://endnote.com/downloads/styles

search by journal name

copy to C:\Program Files (x86)\EndNote X3\Styles

Friday, September 14, 2012

NSG Illumina pipeline screen output


Illumina 1.3+ fastq format: ASCII(min, max) = (66, 102)
2012/08/25 11:41:15 START maq ill2sanger Run1_testicular-28T_lane2_read1_sequence.txt Testis_T28_read1_sanger.fq
2012/08/25 11:42:57 SUCCESS after running 0 hours 1 minutes 42 seconds

2012/08/25 11:42:57 START maq ill2sanger Run1_testicular-28T_lane2_read2_sequence.txt Testis_T28_read2_sanger.fq
2012/08/25 11:44:37 SUCCESS after running 0 hours 1 minutes 40 seconds

2012/08/25 11:44:37 START fastx_quality_stats -Q 33 -i Testis_T28_read1_sanger.fq -o ./qcstats/Testis_T28_read1_sanger.fq_stats.txt
2012/08/25 11:49:13 SUCCESS after running 0 hours 4 minutes 36 seconds

2012/08/25 11:49:13 START fastq_quality_boxplot_graph.sh -i ./qcstats/Testis_T28_read1_sanger.fq_stats.txt -o ./qcstats/Testis_T28_read1_sanger.fq_stats.png -t "Testis_T28_read1_sanger.fq"
2012/08/25 11:49:13 SUCCESS after running 0 hours 0 minutes 0 seconds

2012/08/25 11:49:13 START fastx_quality_stats -Q 33 -i Testis_T28_read2_sanger.fq -o ./qcstats/Testis_T28_read2_sanger.fq_stats.txt
2012/08/25 11:53:51 SUCCESS after running 0 hours 4 minutes 38 seconds

2012/08/25 11:53:51 START fastq_quality_boxplot_graph.sh -i ./qcstats/Testis_T28_read2_sanger.fq_stats.txt -o ./qcstats/Testis_T28_read2_sanger.fq_stats.png -t "Testis_T28_read2_sanger.fq"
2012/08/25 11:53:51 SUCCESS after running 0 hours 0 minutes 0 seconds

2012/08/25 11:53:51 START bwa aln -t 8 /data/public/reference_genomes/genome_hg18.fa Testis_T28_read1_sanger.fq > ./bam/Testis_T28.read1.sai
2012/08/25 12:42:44 SUCCESS after running 0 hours 48 minutes 53 seconds

2012/08/25 12:42:44 START bwa aln -t 8 /data/public/reference_genomes/genome_hg18.fa Testis_T28_read2_sanger.fq > ./bam/Testis_T28.read2.sai
2012/08/25 13:33:10 SUCCESS after running 0 hours 50 minutes 26 seconds

2012/08/25 13:33:10 START bwa sampe /data/public/reference_genomes/genome_hg18.fa ./bam/Testis_T28.read1.sai ./bam/Testis_T28.read2.sai Testis_T28_read1_sanger.fq Testis_T28_read2_sanger.fq > ./bam/Testis_T28.sam
2012/08/25 13:50:34 SUCCESS after running 0 hours 17 minutes 24 seconds

2012/08/25 13:50:34 START samtools import /data/public/reference_genomes/genome_hg18.fa.fai ./bam/Testis_T28.sam ./bam/Testis_T28.bam
2012/08/25 14:03:10 SUCCESS after running 0 hours 12 minutes 36 seconds

2012/08/25 14:03:10 START samtools sort ./bam/Testis_T28.bam ./bam/Testis_T28.sorted
2012/08/25 14:25:58 SUCCESS after running 0 hours 22 minutes 48 seconds

2012/08/25 14:25:58 START samtools index ./bam/Testis_T28.sorted.bam
2012/08/25 14:26:42 SUCCESS after running 0 hours 0 minutes 44 seconds

2012/08/25 14:26:42 START java -Xmx4g -jar /home/gmslz/tools/picard-tools-1.44/MarkDuplicates.jar VALIDATION_STRINGENCY=SILENT M=PCR_duplicates.M.txt REMOVE_DUPLICATES=true AS=true I=./bam/Testis_T28.sorted.bam O=./bam/Testis_T28.sorted.rmdup.bam
2012/08/25 14:40:45 SUCCESS after running 0 hours 14 minutes 3 seconds

2012/08/25 14:40:45 START java -Xmx4g -jar /home/gmslz/tools/picard-tools-1.44/AddOrReplaceReadGroups.jar RGID=lane1 RGLB=Testis_T28.fastq RGPL=Illumina RGPU=run RGSM=Testis_T28 CREATE_INDEX=TRUE VALIDATION_STRINGENCY=SILENT I=./bam/Testis_T28.sorted.rmdup.bam O=./bam/Testis_T28.sorted.rmdup.rg.bam
2012/08/25 14:52:13 SUCCESS after running 0 hours 11 minutes 28 seconds

2012/08/25 14:52:13 START samtools index ./bam/Testis_T28.sorted.rmdup.rg.bam
2012/08/25 14:52:59 SUCCESS after running 0 hours 0 minutes 46 seconds

2012/08/25 14:53:00 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T RealignerTargetCreator -l INFO -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.sorted.rmdup.rg.bam -o ./snp_analysis_gatk/Testis_T28.intervals > ./log/Testis_T28.intervals.log
2012/08/25 16:29:57 SUCCESS after running 1 hours 36 minutes 57 seconds

2012/08/25 16:29:57 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T IndelRealigner -l INFO -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.sorted.rmdup.rg.bam -targetIntervals ./snp_analysis_gatk/Testis_T28.intervals -o ./bam/Testis_T28.realigned.bam -stats ./bam/Testis_T28.realign.stats.txt > ./log/Testis_T28.realign.log
2012/08/26 00:40:15 SUCCESS after running 8 hours 10 minutes 18 seconds

2012/08/26 00:40:15 START samtools index ./bam/Testis_T28.realigned.bam
2012/08/26 00:41:08 SUCCESS after running 0 hours 0 minutes 53 seconds

2012/08/26 00:41:08 START java -Xmx4g -jar /home/gmslz/tools/picard-tools-1.44/SortSam.jar SO=coordinate VALIDATION_STRINGENCY=SILENT I=./bam/Testis_T28.realigned.bam O=./bam/Testis_T28.realigned.sorted.bam
2012/08/26 00:52:36 SUCCESS after running 0 hours 11 minutes 28 seconds

2012/08/26 00:52:36 START samtools index ./bam/Testis_T28.realigned.sorted.bam
2012/08/26 00:53:21 SUCCESS after running 0 hours 0 minutes 45 seconds

2012/08/26 00:53:21 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /home/gmslz/tools/picard-tools-1.44/FixMateInformation.jar SO=coordinate VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 I=./bam/Testis_T28.realigned.sorted.bam O=./bam/Testis_T28.realigned.sorted.fixed.bam
2012/08/26 01:14:28 SUCCESS after running 0 hours 21 minutes 7 seconds

2012/08/26 01:14:28 START samtools index ./bam/Testis_T28.realigned.sorted.fixed.bam
2012/08/26 01:15:13 SUCCESS after running 0 hours 0 minutes 45 seconds

2012/08/26 01:15:13 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T CountCovariates -l INFO -nt 8 --downsample_to_coverage 10000 --default_platform Illumina -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.realigned.sorted.fixed.bam -recalFile ./snp_analysis_gatk/Testis_T28.recal.csv > ./log/Testis_T28.count.covariates.log
2012/08/26 01:25:00 SUCCESS after running 0 hours 9 minutes 47 seconds

2012/08/26 01:25:00 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T TableRecalibration -l INFO --default_platform Illumina -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.realigned.sorted.fixed.bam -recalFile ./snp_analysis_gatk/Testis_T28.recal.csv --out ./bam/Testis_T28.final.recalibrated.bam > ./log/Testis_T28.recalibration.log
2012/08/26 02:00:22 SUCCESS after running 0 hours 35 minutes 22 seconds

2012/08/26 02:00:22 START samtools index ./bam/Testis_T28.final.recalibrated.bam
2012/08/26 02:01:27 SUCCESS after running 0 hours 1 minutes 5 seconds

2012/08/26 02:01:27 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T IndelGenotyperV2 -l INFO --window_size 450 -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.final.recalibrated.bam -o ./snp_analysis_gatk/Testis_T28.indels.vcf -bed ./snp_analysis_gatk/Testis_T28.indels.bed > ./log/Testis_T28.indels.log
2012/08/26 02:40:46 SUCCESS after running 0 hours 39 minutes 19 seconds

2012/08/26 02:40:46 START python /data/public/tools/gatk-v4333/Sting/python/makeIndelMask.py ./snp_analysis_gatk/Testis_T28.indels.bed 10 ./snp_analysis_gatk/Testis_T28.indels.mask.bed
2012/08/26 02:40:46 SUCCESS after running 0 hours 0 minutes 0 seconds

2012/08/26 02:40:46 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T UnifiedGenotyper -l INFO -stand_call_conf 30 -stand_emit_conf 10 -pl SOLEXA -mmq 30 -mm40 3 -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -I ./bam/Testis_T28.final.recalibrated.bam -o ./snp_analysis_gatk/Testis_T28.raw.snps.vcf > ./log/Testis_T28.snp.log
2012/08/26 04:48:58 SUCCESS after running 2 hours 8 minutes 12 seconds

2012/08/26 04:48:58 START java -Xmx4g -Djava.io.tmpdir=/home/tmp/ -jar /data/public/tools/gatk-v4333/Sting/dist/GenomeAnalysisTK.jar -T VariantFiltration -l INFO --clusterSize 3 --clusterWindowSize 10 --filterExpression "DP <= 8" --filterName "DP8" --filterExpression "SB > -0.10" --filterName "StrandBias" --filterExpression "HRun > 8" --filterName "HRun8" --filterExpression "QD < 5.0" --filterName "QD5" --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" --filterName "hard2validate" -D /data/nextgen1/pipeline/dbsnp_130_hg18.rod -R /data/public/reference_genomes/genome_hg18.fa -B:variant,VCF ./snp_analysis_gatk/Testis_T28.raw.snps.vcf -B:mask,Bed ./snp_analysis_gatk/Testis_T28.indels.mask.bed --maskName InDel -o ./snp_analysis_gatk/Testis_T28.snp.filtered.vcf > ./log/Testis_T28.snp.filter.log
2012/08/26 04:53:29 SUCCESS after running 0 hours 4 minutes 31 seconds

NGS Illumina pipeline main caller


#!/usr/bin/perl
use lib './';
use NGS_Illumina;


# Assume the paths for fastx_tk, maq, bwa, samtools are in ENV{PATH}; check $PATH setting $HOME/.bash_profile
# Assume perl and python are installed
my $NGS_obj = new NGS_Illumina ('sample_name' => 'Testis_T28',
'input_fq1' => 'Run1_testicular-28T_lane2_read1_sequence.txt',
'input_fq2' => 'Run1_testicular-28T_lane2_read2_sequence.txt',
'pe' => 1,
'reference_fa' => '/data/public/reference_genomes/genome_hg18.fa',
'dbsnp_rod' => '/data/nextgen1/pipeline/dbsnp_130_hg18.rod',
'target_bed' => '/data/nextgen1/pipeline/targets/SureSelect_All_Exon_G3362_with_names.v2.bed', #Agilent SureSelect Target Enrichment
'picard_dir' => '/home/gmslz/tools/picard-tools-1.44',
'gatk_dir' => '/data/public/tools/gatk-v4333/Sting',
'tmp_dir' => '/home/tmp',
'java_mem' => '-Xmx4g');

$NGS_obj -> check_fastq_format();
#$NGS_obj -> illumina2sanger();
#$NGS_obj -> qcstats();
#$NGS_obj -> fastq_trimmer(1, 74);
#$NGS_obj -> aln_bwa(); #alignment step, which includes bwa, sort, rm_pcrdup, add_rg
$NGS_obj -> snp_analysis_gatk(); #gatk step, which includes realignment, recalibration, genotyping (indel, snp)

NGS Illumina pipeline perl package


#!/usr/bin/perl
use lib "./";
################################################################
# NGS_Illumina Package                                         #
# Author:   Zhengdeng Lei                           #
# E-mail:   leizhengdeng@hotmail.com                           #
################################################################

# Example in caller NGS_Illumina.pl
# Assume the paths for fastx_tk, maq, bwa, samtools are in ENV{PATH}; check $PATH setting $HOME/.bash_profile
# Assume perl and python are installed
#my $NGS_obj = new NGS_Illumina ('sample_name' => 'Testis_T28',
# 'input_fq1' => 'Run1_testicular-28T_lane2_read1_sequence.txt',
# 'input_fq2' => 'Run1_testicular-28T_lane2_read2_sequence.txt',
# 'pe' => 1,
# 'reference_fa' => '/data/public/reference_genomes/genome_hg18.fa',
# 'dbsnp_rod' => '/data/nextgen1/pipeline/dbsnp_130_hg18.rod',
# 'picard_dir' => '/home/gmslz/tools/picard-tools-1.44',
# 'gatk_dir' => '/data/public/tools/gatk-v4333/Sting',
# 'tmp_dir' => '/home/tmp',
# 'java_mem' => '-Xmx4g');
#
#$NGS_obj -> check_fastq_format();
#$NGS_obj -> illumina2sanger();
#$NGS_obj -> qcstats();
#$NGS_obj -> fastq_trimmer(1, 74);
#$NGS_obj -> aln_bwa(); #alignment step, which includes bwa, sort, rm_pcrdup, add_rg
#$NGS_obj -> snp_analysis_gatk(); #gatk step, which includes realignment, recalibration, genotyping (indel, snp)

package NGS_Illumina;
#use strict;
use warnings;
use Time::Local;
use Time::localtime;

# private members:
# sample_name <sample name>
# input_fq1 <input fastq1>
# input_fq2 <input fastq2>
# sanger_fq1 <input fastq1 in sanger format>
# sanger_fq2 <input fastq1 in sanger format>
# reference_fa <reference genome>
# fastq_format <fastq format>
# "Sanger" -- quality score from 0 to 93 using ASCII 33 to 126
# "Solexa" Illumina 1.0 -- quality score from -5 to 62 using ASCII 59 to 126
# "Illumina" Illumina 1.3+ -- quality score from 0 to 62 using ASCII 64 to 126
# pe <paired end>
# 1 paired end
# 0 single end
#my @attributes = qw/reference_fa sample_name input_fq1 input_fq2 fastq_format sanger_fq1 sanger_fq2 pe/;

######################################################################################################
# The subroutines "new" and "initialize" for the constructor
sub new  
{
my $class_name = shift; # Gets class name from parmlist
my $this = {}; # Creates reference to object
bless($this, $class_name); # Associates reference with class
$this->initialize(@_); # Call local initialize subroutine
# passing rest of parmlist
return $this; # Explicit return of value of $this
}

sub initialize
{
my $this = shift;    # Receive an object reference as 1st param
my %parms = @_;         # Get passed parameters from the call to "new"

# Change hash reference for key with supplied value, or assign default
$this->{reference_fa} = $parms{reference_fa} || 'hg18.fa';
$this->{sample_name} = $parms{sample_name} || 'undef_sample_name';
$this->{input_fq1} = $parms{input_fq1} || 0;
$this->{input_fq2} = $parms{input_fq2} || 0;
$this->{pe} = $parms{pe} || 1;
$this->{fastq_format} = $parms{fastq_format} || 'Illumina';
$this->{sanger_fq1} = $parms{sample_name}.'_read1_sanger.fq';
$this->{sanger_fq2} = $parms{sample_name}.'_read2_sanger.fq';
$this->{dbsnp_rod} = $parms{dbsnp_rod} || '/data/nextgen1/pipeline/dbsnp_130_hg18.rod';
$this->{target_bed} = $parms{target_bed} || '/data/nextgen1/pipeline/targets/SureSelect_All_Exon_G3362_with_names.v2.bed';

$this->{java_mem} = $parms{java_mem} || '-Xmx4g';
$this->{num_thread} = $parms{num_thread} || 8;
$this->{gatk_dir} = $parms{gatk_dir} || '/data/public/tools/gatk-v4333/Sting';
$this->{picard_dir} = $parms{picard_dir} || '/home/gmslz/tools/picard-tools-1.44';
$this->{tools_dir} = $parms{tools_dir} || '/data/public/tools/gatk_pipeline/src-pipeline';
$this->{tmp_dir} = $parms{tmp_dir} || '/home/tmp';

$this->{picard_MarkDuplicates} = "java $this->{java_mem} -jar $this->{picard_dir}/MarkDuplicates.jar";
$this->{picard_AddOrReplaceReadGroups} = "java $this->{java_mem} -jar $this->{picard_dir}/AddOrReplaceReadGroups.jar";
$this->{picard_SortSam} = "java $this->{java_mem} -jar $this->{picard_dir}/SortSam.jar";
$this->{picard_FixMateInformation} = "java $this->{java_mem} -Djava.io.tmpdir=$this->{tmp_dir}/ -jar $this->{picard_dir}/FixMateInformation.jar";
$this->{gatk_GenomeAnalysisTK} = "java $this->{java_mem} -Djava.io.tmpdir=$this->{tmp_dir}/ -jar $this->{gatk_dir}/dist/GenomeAnalysisTK.jar";
$this->{gatk_makeIndelMask_py} = "$this->{gatk_dir}/python/makeIndelMask.py";

$this->{aligned_bam_file} = "./bam/$this->{sample_name}.sorted.rmdup.rg.bam"; #This is the final bam file after bwa alignment, i.e. before gatk
$this->{final_recalibrated_bam_file} = "./bam/$this->{sample_name}.final.recalibrated.bam"; #This is the final recalibrated bam file after after gatk

if ($this->{fastq_format} eq "Sanger")
{
$this->{sanger_fq1} = $this->{input_fq1};
$this->{sanger_fq2} = $this->{input_fq2};
}


#print $ENV{'PATH'}, "\n\n\n";

my @subfolders = qw/qcstats bam snp_analysis_gatk/;
foreach my $dir (@subfolders)
{
if (!(-e $dir))
{
mkdir($dir);
}
}
}

sub modify  #  modify the object
{
my $this = shift;  # Figure out who I am
my %parms = @_;       # Make hash out of rest of parm list
for(keys %parms)
{
$this->{$_} = $parms{$_}; # Replace with new value
}
}

sub snp_analysis_gatk
{
# GATK Documents:
# http://www.broadinstitute.org/gatk/gatkdocs/
# e.g.:
# http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_filters_DuplicateReadFilter.html
# http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html

my $this = shift;
#chdir("../snp_analysis_gtk");
my $max_depth = 10000;

# goto EVAL1;
run_cmd("$this->{gatk_GenomeAnalysisTK} -T DepthOfCoverage -l INFO ". #not support -nt $this->{num_thread}
"-D $this->{dbsnp_rod} -R $this->{reference_fa} -L $this->{target_bed} ".
"-I $this->{aligned_bam_file} ".
"-o ./snp_analysis_gatk/$this->{sample_name}.depthofcoverage > ./log/$this->{sample_name}.depthofcoverage.log");







#step 1 --- Local realignment around indels
#step 1.1 --- gatk -T RealignerTargetCreator; Determining (small) suspicious intervals around indels
run_cmd("$this->{gatk_GenomeAnalysisTK} -T RealignerTargetCreator -l INFO ". #not support -nt $this->{num_thread}
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
"-I $this->{aligned_bam_file} ".
"-o ./snp_analysis_gatk/$this->{sample_name}.intervals > ./log/$this->{sample_name}.intervals.log");
#step 1.2 --- gatk -T IndelRealigner; Running the realigner over those intervals
run_cmd("$this->{gatk_GenomeAnalysisTK} -T IndelRealigner -l INFO ". #not support -nt $this->{num_thread}
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
"-I $this->{aligned_bam_file} -targetIntervals ./snp_analysis_gatk/$this->{sample_name}.intervals ".
"-o ./bam/$this->{sample_name}.realigned.bam -stats ./bam/$this->{sample_name}.realign.stats.txt > ./log/$this->{sample_name}.realign.log");
run_cmd("samtools index ./bam/$this->{sample_name}.realigned.bam");
#step 1.3 --- picard SortSam
run_cmd("$this->{picard_SortSam} SO=coordinate VALIDATION_STRINGENCY=SILENT ".
"I=./bam/$this->{sample_name}.realigned.bam ".
"O=./bam/$this->{sample_name}.realigned.sorted.bam");
run_cmd("samtools index ./bam/$this->{sample_name}.realigned.sorted.bam");
#step 1.4 --- picard FixMateInformation; Fixing the mate pairs of realigned reads
run_cmd("$this->{picard_FixMateInformation} SO=coordinate VALIDATION_STRINGENCY=SILENT MAX_RECORDS_IN_RAM=2000000 ".
"I=./bam/$this->{sample_name}.realigned.sorted.bam ".
"O=./bam/$this->{sample_name}.realigned.sorted.fixed.bam");
run_cmd("samtools index ./bam/$this->{sample_name}.realigned.sorted.fixed.bam");
#step 1.5 --- skip picard MarkDuplicates


#step 2 --- Recalibrate quality scores --> Final Recalibrated Bam
#step 2.1 --- gatk -T CountCovariates;
run_cmd("$this->{gatk_GenomeAnalysisTK} -T CountCovariates -l INFO -nt $this->{num_thread} ".
"--downsample_to_coverage $max_depth --default_platform Illumina ".
"-cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate ".
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
  "-I ./bam/$this->{sample_name}.realigned.sorted.fixed.bam ".
"-recalFile ./snp_analysis_gatk/$this->{sample_name}.recal.csv > ./log/$this->{sample_name}.count.covariates.log");
#step 2.2 --- gatk -T TableRecalibration  --> Final Recalibrated Bam
run_cmd("$this->{gatk_GenomeAnalysisTK} -T TableRecalibration -l INFO --default_platform Illumina ". #not support -nt $this->{num_thread}
"-R $this->{reference_fa} ".
  "-I ./bam/$this->{sample_name}.realigned.sorted.fixed.bam -recalFile ./snp_analysis_gatk/$this->{sample_name}.recal.csv ".
"--out $this->{final_recalibrated_bam_file} > ./log/$this->{sample_name}.recalibration.log");
run_cmd("samtools index $this->{final_recalibrated_bam_file}");
# skip 2.3 -T CountCovariates
# skip 2.4 AnalyzeCovariate

#############################################################################
# You may merge multiple final_recalibrated_bam files before Genotyping #
#############################################################################


#step 3 --- Genotyping
#step 3.1 --- gatk -T IndelGenotyperV2;
#print STDERR "\nRunning the indel genotyper:\n";
    run_cmd("$this->{gatk_GenomeAnalysisTK} -T IndelGenotyperV2 -l INFO --window_size 450 ". #not suport -nt $this->{num_thread}
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
"-I $this->{final_recalibrated_bam_file} ".
"-o ./snp_analysis_gatk/$this->{sample_name}.indels.vcf -bed ./snp_analysis_gatk/$this->{sample_name}.indels.bed > ./log/$this->{sample_name}.indels.log");

# skip filter indels: perl /tools/filterSingleSampleCalls.pl
#step 3.2 python/makeIndelMask.py
# create intervals to mask snvs close to indels
# The output of the IndelGenotyper is used to mask out SNPs near indels.
# The number 10 in this command stands for the number of bases that will be included on either side of the indel
run_cmd("python $this->{gatk_dir}/python/makeIndelMask.py ./snp_analysis_gatk/$this->{sample_name}.indels.bed 10 ./snp_analysis_gatk/$this->{sample_name}.indels.mask.bed");
#step 3.3 --- gatk -T Unifiedgenotyper;
run_cmd("$this->{gatk_GenomeAnalysisTK} -T UnifiedGenotyper -l INFO ". #not suport -nt $this->{num_thread}
"-stand_call_conf 30 -stand_emit_conf 10 -pl SOLEXA -mmq 30 -mm40 3 ".
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
"-I $this->{final_recalibrated_bam_file} ".
"-o ./snp_analysis_gatk/$this->{sample_name}.raw.snps.vcf > ./log/$this->{sample_name}.snp.log");

#step 3.4 --- gatk -T VariantFiltration; VariantFiltration is used to annotate suspicious calls from VCF files based on their failing given filters.
run_cmd("$this->{gatk_GenomeAnalysisTK} -T VariantFiltration -l INFO ". #not suport -nt $this->{num_thread}
'--clusterSize 3 --clusterWindowSize 10 '.
'--filterExpression "DP <= 8" --filterName "DP8" '.
'--filterExpression "SB > -0.10" --filterName "StrandBias" '.
'--filterExpression "HRun > 8" --filterName "HRun8" '.
'--filterExpression "QD < 5.0" --filterName "QD5" '.
'--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" --filterName "hard2validate" '.
"-D $this->{dbsnp_rod} -R $this->{reference_fa} ".
"-B:variant,VCF ./snp_analysis_gatk/$this->{sample_name}.raw.snps.vcf ".
"-B:mask,Bed ./snp_analysis_gatk/$this->{sample_name}.indels.mask.bed --maskName InDel ".
"-o ./snp_analysis_gatk/$this->{sample_name}.snp.filtered.vcf > ./log/$this->{sample_name}.snp.filter.log");

## grep -i 'indel'  Testis_T28.snp.filtered.vcf|more
## grep -i 'pass'  Testis_T28.snp.filtered.vcf|more

#EVAL1:
run_cmd("$this->{gatk_GenomeAnalysisTK} -T VariantEval -l INFO ". #not suport -nt $this->{num_thread}
"-D $this->{dbsnp_rod} -R $this->{reference_fa} -L $this->{target_bed} ".
#-G Indicates that your VCF file has genotypes; -A Print extended evaluation information; -V Print a list of interesting sites (FPs, FNs, etc).
"-B:eval,VCF ./snp_analysis_gatk/$this->{sample_name}.snp.filtered.vcf ".
"-o ./snp_analysis_gatk/$this->{sample_name}.snp.filtered.eval > ./log/$this->{sample_name}.snp.filter.eval.log");



#annotate variants Annovar
#$cmd = "annotate_variation.pl -filter -batchsize 50m -dbtype snp$verdbsnp -buildver $buildver -outfile $outfile $queryfile $dbloc";


# Consensus Quality: Phred-scaled likelihood that the genotype is wrong
# SNP Quality: Phred-scaled likelihood that the genotype is identical to the reference,
# which is also called `SNP quality'. Suppose the reference base is A and in alignment
# we see 17 G and 3 A. We will get a low consensus quality because it is difficult to
# distinguish an A/G heterozygote from a G/G homozygote. We will get a high SNP
# quality, though, because the evidence of a SNP is very strong.
# RMS: root mean square (RMS) mapping quality, a measure of the variance of quality scores
# Coverage: reads covering the position
}


########################################################################################################
# 0. check fastq format
sub check_fastq_format
{
my $this = shift;  # Figure out who I am
open(FASTQ, $this->{input_fq1}) or die "can't open $this->{input_fq1}: $@\n";

my $fastq_format;
    my $counter = 2000; # count 2000 lines of quality scores
my $min_qualscore = 1e6;
my $max_qualscore = -1e6;
    my $tmp;
    while ($counter > 0)
{
$tmp = <FASTQ>; # @read name
$tmp = <FASTQ>; # base calls
$tmp = <FASTQ>; # +read name
my $scores = <FASTQ>; # quality scores
if (!$scores) { last }
#print $scores;
chomp($scores);
for my $chr (map(ord, split(//, $scores)))
{
if ($chr < $min_qualscore)
{
$min_qualscore = $chr;
}
if ($chr > $max_qualscore)
{
$max_qualscore = $chr;
}

}
$counter--;
}
# Phred+33 means quality score + 33 = ASCII
if ($min_qualscore >= 33 && $max_qualscore <= 126)
{
$fastq_format = "Sanger";
}

# Solexa+64 means quality score + 64 = ASCII
if ($min_qualscore >= 59 && $max_qualscore <= 126)
{
$fastq_format = "Solexa";
}

# Phred+64 means quality score + 64 = ASCII
if ($min_qualscore >= 64 && $max_qualscore <= 126)
{
$fastq_format = "Illumina 1.3+";
}

# Phred+64 in both both Illumina 1.3+ and Illumina 1.5+
#if ($min_qualscore >= 66 && $max_qualscore <= 126)
#{
# $fastq_format = "Illumina 1.5+";
#}

# Illumina 1.8+ returned to the use of the Sanger format (Phred+33)
print "$fastq_format fastq format: ASCII(min, max) = ($min_qualscore, $max_qualscore)\n";
$this->{fastq_format} = $fastq_format;
if ($this->{fastq_format} eq "Sanger")
{
$this->{sanger_fq1} = $this->{input_fq1};
$this->{sanger_fq2} = $this->{input_fq2};
}
return $fastq_format;
}

########################################################################################################
# 1. QCGroomer step: illumina2sanger
# Alternative converter "fastq_quality_converter -i $this->{input_fq1} -o $this->{sanger_fq1}"
sub illumina2sanger
{
my $this = shift;  # Figure out who I am
my $fastq_converter;
if ($this->{fastq_format} eq "Sanger")
{
print "The input fastq is already in Sanger format, no need for conversion!!\n";
return;
}
elsif ($this->{fastq_format} eq "Solexa")
{
$fastq_converter = "maq sol2sanger";
}
elsif ($this->{fastq_format} eq "Illumina 1.3+")
{
$fastq_converter = "maq ill2sanger";
}
else
{
print "No Sanger or Illumina fastq format found, exit now!!\n";
exit(0);
}
run_cmd("$fastq_converter $this->{input_fq1} $this->{sanger_fq1}");

if($this->{pe})
{
run_cmd("$fastq_converter $this->{input_fq2} $this->{sanger_fq2}");
}

}


########################################################################################################
# 2. QCStats step: qcstats
# use fastx tool kit to process
# http://hannonlab.cshl.edu/fastx_toolkit/commandline.html
# check linux version by "uname -a"
# To set PATH=$PATH:$HOME/bin:$HOME/tools/fastx_tk
# vi .bash_profile
# source .bash_profile
sub qcstats
{
my $this = shift;
run_cmd("fastx_quality_stats -Q 33 -i $this->{sanger_fq1} -o ./qcstats/$this->{sanger_fq1}"."_stats.txt"); #option -Q 33 is for Sanger format
run_cmd("fastq_quality_boxplot_graph.sh -i ./qcstats/$this->{sanger_fq1}_stats.txt -o ./qcstats/$this->{sanger_fq1}_stats.png -t \"$this->{sanger_fq1}\"");
# my $qcnecleotide_cmd1 = "fastx_nucleotide_distribution_graph.sh -i $this->{sanger_fq1}"."_stats.txt -o ./qcstats/$this->{sanger_fq1}_nuc.png -t \"$this->{sanger_fq1} nucleotide distribution\"";
if($this->{pe})
{
run_cmd("fastx_quality_stats -Q 33 -i $this->{sanger_fq2} -o ./qcstats/$this->{sanger_fq2}"."_stats.txt"); #option -Q 33 is for Sanger format
run_cmd("fastq_quality_boxplot_graph.sh -i ./qcstats/$this->{sanger_fq2}_stats.txt -o ./qcstats/$this->{sanger_fq2}_stats.png -t \"$this->{sanger_fq2}\"");
# my $qcnecleotide_cmd2 = "fastx_nucleotide_distribution_graph.sh -i $this->{sanger_fq2}"."_stats.txt -o ./qcstats/$this->{sanger_fq2}_nuc.png -t \"$this->{sanger_fq2} nucleotide distribution\"";
}
}

sub fastq_trimmer
{
my $this = shift;
my ($first_base_pos, $last_base_pos) = @_;
print "reads at positions($first_base_pos, $last_base_pos) selected\n";
my $trimmed_fq1 = $this->{sanger_fq1};
$trimmed_fq1 =~ s/fq$/trimmed.fq/;
run_cmd("fastx_trimmer -Q 33 -f $first_base_pos -l $last_base_pos -i $this->{sanger_fq1} -o $trimmed_fq1");
$this->{sanger_fq1} = $trimmed_fq1;

if($this->{pe})
{
my $trimmed_fq2 = $this->{sanger_fq2};
$trimmed_fq2 =~ s/fq$/trimmed.fq/;
run_cmd("fastx_trimmer -Q 33 -f $first_base_pos -l $last_base_pos -i $this->{sanger_fq2} -o $trimmed_fq2");
$this->{sanger_fq2} = $trimmed_fq2;
}
}

sub aln_bwa
{
#    my (%sample_setting) = @_;
my $this = shift;
run_cmd("bwa aln -t $this->{num_thread} $this->{reference_fa} $this->{sanger_fq1} > ./bam/$this->{sample_name}.read1.sai");

if($this->{pe})
{
run_cmd("bwa aln -t $this->{num_thread} $this->{reference_fa} $this->{sanger_fq2} > ./bam/$this->{sample_name}.read2.sai");
#to add read group information -r '@RG\tID:foo\tSM:bar'
run_cmd("bwa sampe $this->{reference_fa} ./bam/$this->{sample_name}.read1.sai ./bam/$this->{sample_name}.read2.sai ".
"$this->{sanger_fq1} $this->{sanger_fq2} > ./bam/$this->{sample_name}.sam");
}
else
{
run_cmd("bwa samse $this->{reference_fa} ./bam/$this->{sample_name}.read1.sai $this->{sanger_fq1} > ./bam/$this->{sample_name}.sam");
}
# #//sed 's/[ \t]*$//g' $sample_name.sam >$sample_name.sed.sam
run_cmd("samtools import $this->{reference_fa}.fai ./bam/$this->{sample_name}.sam ./bam/$this->{sample_name}.bam");
run_cmd("samtools sort ./bam/$this->{sample_name}.bam ./bam/$this->{sample_name}.sorted");
run_cmd("samtools index ./bam/$this->{sample_name}.sorted.bam");

#scp -r ./picard-tools-1.44/ NUSSTF\\gmslz@172.25.138.143:~/tools/
#add $HOME/tools/picard-tools-1.44 to PATH in file .bash_profile
#source .bash_profile
##### remove PCR duplicate using PICARD
##### remove PRC duplicates by picard, use AS=true so that picard knows the bam is sorted.
run_cmd("$this->{picard_MarkDuplicates} VALIDATION_STRINGENCY=SILENT M=PCR_duplicates.M.txt REMOVE_DUPLICATES=true AS=true ".
"I=./bam/$this->{sample_name}.sorted.bam ".
"O=./bam/$this->{sample_name}.sorted.rmdup.bam");

#add read group inforation
run_cmd("$this->{picard_AddOrReplaceReadGroups} RGID=lane1 RGLB=$this->{sample_name}.fastq RGPL=Illumina RGPU=run RGSM=$this->{sample_name} CREATE_INDEX=TRUE VALIDATION_STRINGENCY=SILENT ".
"I=./bam/$this->{sample_name}.sorted.rmdup.bam ".
"O=./bam/$this->{sample_name}.sorted.rmdup.rg.bam");
run_cmd("samtools index ./bam/$this->{sample_name}.sorted.rmdup.rg.bam");

unlink("./bam/$this->{sample_name}.read1.sai", "./bam/$this->{sample_name}.read2.sai", "./bam/$this->{sample_name}.sam");
}



sub run_cmd
{
my $cmd = shift;
my $start_time = localtime();
my ($year, $mon, $day, $hour, $min, $sec) = ($start_time->year + 1900, $start_time->mon+1, $start_time->mday, $start_time->hour, $start_time->min, $start_time->sec);
my $start_time_sec = timelocal($sec, $min, $hour, $day, $mon, $year);
printf("%04d/%02d/%02d %02d:%02d:%02d", $year, $mon, $day, $hour, $min, $sec);
print "\tSTART $cmd\n";

my $cmd_res = system($cmd);
if ($cmd_res)
{
my $error_time = localtime;
($year, $mon, $day, $hour, $min, $sec) = ($error_time->year + 1900, $error_time->mon+1, $error_time->mday, $error_time->hour, $error_time->min, $error_time->sec);
printf("%04d/%02d/%02d %02d:%02d:%02d", $year, $mon, $day, $hour, $min, $sec);
print "\tERROR: $?\nEXIT!!\n\n";
exit 1;

}
else
{
my $end_time = localtime;
($year, $mon, $day, $hour, $min, $sec) = ($end_time->year + 1900, $end_time->mon+1, $end_time->mday, $end_time->hour, $end_time->min, $end_time->sec);
my $end_time_sec = timelocal($sec, $min, $hour, $day, $mon, $year);
my $difference = $end_time_sec - $start_time_sec;
my $seconds    =  $difference % 60;
$difference = ($difference - $seconds) / 60;
my $minutes    =  $difference % 60;
$difference = ($difference - $minutes) / 60;
my $hours      =  $difference % 24;
printf("%04d/%02d/%02d %02d:%02d:%02d", $year, $mon, $day, $hour, $min, $sec);
print "\tSUCCESS after running $hours hours $minutes minutes $seconds seconds\n\n";
}
}

1;


#
# Reference:
# http://www.bbmriwiki.nl/wiki/PipelineCommands
# http://www.bbmriwiki.nl/wiki/SnpCallingPipeline/VariantCalling
# http://www.broadinstitute.org/gatk/gatkdocs/

Monday, September 10, 2012

Wednesday, September 5, 2012

Build NTP predictor using order statistics

To output of limma
1. (for universal platform) convert to ENTREZ GENE ID, remove those with opposite direction, i.e. opposite t-score for same gene
2. calculate ranking.p by using p.val and abs(FC) based on order statistics
3. using -log10(ranking.p) as feature weight.

Monday, August 27, 2012

Target capture + Ion Proton Sequencer + barcoding

The batch effect is a big headache in high-throughput data (array or sequencing), especially when the data are generated in a long span of time. This includes detectable batch effect (e.g. by PCA, heatmap) and undetectable batch effect (noise).

But we can reduce the effect, for example, limit the running time within one week / one month, and maximize the running samples per day.

With the coming of  Target capture + Ion Proton Sequencer + barcoding, we can measure multiple samples in each run for targeting genes (e.g. subtype predictor genes), and this surely minimize the batch effect and make the prediction more accurate.

Saturday, July 28, 2012

TGFbeta


Cancer

In normal cells, TGF-β, acting through its signaling pathway, stops the cell cycle at the G1 stage to stop proliferation, induce differentiation, or promote apoptosis. When a cell is transformed into a cancer cell, parts of the TGF-β signaling pathway are mutated, and TGF-β no longer controls the cell. These cancer cells proliferate. The surrounding stromal cells (fibroblasts) also proliferate. Both cells increase their production of TGF-β. This TGF-β acts on the surrounding stromal cells, immune cells, endothelial and smooth-muscle cells. It causes immunosuppression and angiogenesis, which makes the cancer more invasive.[7] TGF-β also converts effector T-cells, which normally attack cancer with an inflammatory (immune) reaction, into regulatory (suppressor) T-cells, which turn off the inflammatory reaction.

Friday, July 27, 2012

various GF/GFR such as PDGF/PDGFR vs EGF/EGFR

Pls check various GF/GFR such as PDGF/PDGFR vs EGF/EGFR(HER1-4, ERBB1-4) in SG201

Thursday, July 26, 2012

PI3K in breast cancer

It is revealed that patients with ER+/HER2+ compared with ER-/HER2+ breast cancers may actually benefit more from drugs that inhibit the PI3K/AKT molecular pathway


http://en.wikipedia.org/wiki/ErbB
Estrogen Receptor Status of HER2+ Breast Cancer Correlates With Response to Anti-HER Therapies. ScienceDaily (May 6, 2010)[1]


http://www.sciencedaily.com/releases/2010/05/100506112557.htm

http://www.pnas.org/content/107/22/10208.short

Saturday, July 21, 2012

transcription factors-Pleiotropy

the existence of three subtypes of cancers may be due to malfunctioning of transcription factors, since a single type of TF can simultaneously affect the expression of a large cohort of downstream responder genes.
But metabolic-subtype may be an exception.

Wednesday, July 18, 2012

Run NTP


source("E:\\R_lib\\Microarray.R")
setwd("E:\\Projects\\David_Wnt\\BC170\\NTP_anyplatform\\GPL1223\\GSE1379")
gct.file <- "GSE1379_series_matrix.gct"
BC.Class.anyplatform(gct.file, src.dir=getwd(), output.dir=getwd())

Sunday, July 8, 2012

Cell Culture

http://www.microbiologybytes.com/video/culture.html

Oxygen in Stem Cell Biology: A Critical Component of the Stem Cell Niche


Summary

The defining hallmark of stem cells is their ability to self-renew and maintain multipotency. This capacity depends on the balance of complex signals in their microenvironment. Low oxygen tensions (hypoxia) maintain undifferentiated states of embryonic, hematopoietic, mesenchymal, and neural stem cell phenotypes and also influence proliferation and cell-fate commitment. Recent evidence has identified a broader spectrum of stem cells influenced by hypoxia that includes cancer stem cells and induced pluripotent stem cells. These findings have important implications on our understanding of development, disease, and tissue-engineering practices and furthermore elucidate an added dimension of stem cell control within the niche.

Wednesday, July 4, 2012

run.ComBat



run.ComBat <- function(exp.file, sample_info_file="batches.info.txt", src.dir=getwd())
{
setwd(src.dir)
if(!file.exists(sample_info_file)) {
print(paste(sample_info_file, " not found in", src.dir))
return()
}
gene.exp <- read.table(exp.file, header=T, sep="\t", row.names=1)

forcombat.exp.file <- sub("txt", "forcombat.txt", exp.file, ignore.case = T)
after.combat.exp.file <- sub("txt", "AFTER.Combat.txt", exp.file, ignore.case = T)

write.table(gene.exp, file=forcombat.exp.file, sep="\t", quote=F, row.names=F)
gene.id <- rownames(gene.exp)
ComBat(expression_xls=forcombat.exp.file, sample_info_file=sample_info_file, type='txt')
file.remove(forcombat.exp.file)

combat.out.file <- paste("Adjusted_", forcombat.exp.file, "_.xls", sep="")

combat.gene.exp <- read.table(combat.out.file, header=T, sep="\t")
rownames(combat.gene.exp) <- gene.id
file.remove(combat.out.file)
write.table(combat.gene.exp, file=after.combat.exp.file, sep="\t", quote=F, col.names=NA)
}

Order statistics for combining various genomic data sources



#Order statistics. The rankings from the separate data sources are combined using
#order statistics. A Q statistic is calculated from all rank ratios using the joint
#cumulative distribution of an N-dimensional order statistic as previously done
#by Stuart et al.31
Q <- function(r)
{
r <- sort(r)
N <- length(r)
if (N==1)
{
return (r)
}
s <- 0
for (i in 1:N)
{
del.element.idx <- N-i+1
if (N==i)
{
s = s+(r[N-i+1])*Q(r[-del.element.idx])

}else{
s = s+(r[N-i+1]-r[N-i])*Q(r[-del.element.idx])
}
}
s <- factorial(N)*s
return (s)
}

#r <- c(0.2, 0.05, 0.2)
#Q(r)


d<-matrix(c(sample(1:10),sample(1:10),sample(1:10),sample(1:10)), nrow=10, byrow=F)
rownames(d) <- c(paste("gene", seq(1:10), sep=""))
colnames(d) <- c("gene.exp", "methyl", "CNV", "Mutation")
d
r = d/10
ranking <- apply(r, 1, Q)
sort(ranking)





> d #the smaller the ranking, the more important of the feature, e.g. gene 1 and 3 have highest t-score/foldchange
       gene.exp methyl CNV Mutation
gene1         1      2   2        7
gene2         9      8   8        3
gene3         2      4   3        8
gene4        10      5   1        1
gene5         3     10   4        5
gene6         8      9   7       10
gene7         6      6   9        4
gene8         7      7   5        6
gene9         5      1   6        9
gene10        4      3  10        2
> r = d/10
> ranking <- apply(r, 1, Q)
> sort(ranking)
   gene1    gene4    gene3   gene10    gene9    gene8    gene5    gene7    gene2    gene6
  4.4640  10.7424  29.7216  41.2416  49.6224  68.1120  79.4016 108.7488 144.5472 268.3296
>

Monday, June 25, 2012

Get your google cache back

Method 1:

http://webcache.googleusercontent.com/search?q=cache:THE_WEB_URL


Method 2:
1. Save a bookmark with code below

javascript:window.open("http://webcache.googleusercontent.com/search?q=cache:"+encodeURIComponent(location.href))

2. go to the website and then click the saved bookmark, and you get the cache.

Tuesday, June 19, 2012

Batch effect


library(affy)

wk.dir <- "E:\\CEL\\BrainCancer\\U133B\\76samples"
rma.file <- "Brain.U133B.76.txt"

####################
# RMA
setwd(wk.dir)
data.rma<-justRMA()
write.exprs(data.rma, file=rma.file)


####################
# Check batch effect
source("E:\\R_lib\\Microarray.R")
get.files.info()
#!!!Edit the color manually in file "files.info.user.batch.txt"
check.batch.effect(rma.file)


####################
# RMA for each batch
rma.by.batch()

Thursday, June 14, 2012

BC170(TRANSBIG) vs BC_LCM

Overlap of signatures:


Inv:
1.211577_s_atinsulin-like growth factor 1 (so...
2.213605_s_athypothetical protein MGC22265
3.217430_x_atcollagen, type I, alpha 1
4.222380_s_atprogrammed cell death 6

T cell receptor????



Prol:
823_at
49306_at
55081_at
60474_at
200039_s_at
200052_s_at
200600_at
200628_s_at
200687_s_at
200755_s_at
200756_x_at
200757_s_at
200783_s_at
200790_at
200824_at
200827_at
200872_at
200934_at
201012_at
201030_x_at
201037_at
201088_at
201160_s_at
201161_s_at
201195_s_at
201196_s_at
201197_at
201201_at
201202_at
201215_at
201231_s_at
201250_s_at
201291_s_at
201292_at
201300_s_at
201387_s_at
201397_at
201433_s_at
201478_s_at
201479_at
201487_at
201555_at
201559_s_at
201564_s_at
201579_at
201587_s_at
201663_s_at
201664_at
201725_at
201774_s_at
201795_at
201820_at
201833_at
201853_s_at
201896_s_at
201897_s_at
201923_at
201930_at
201938_at
201968_s_at
201970_s_at
201976_s_at
201980_s_at
201983_s_at
201984_s_at
202035_s_at
202036_s_at
202037_s_at
202052_s_at
202074_s_at
202094_at
202095_s_at
202097_at
202107_s_at
202132_at
202133_at
202134_s_at
202146_at
202147_s_at
202188_at
202200_s_at
202233_s_at
202234_s_at
202236_s_at
202267_at
202269_x_at
202270_at
202307_s_at
202341_s_at
202342_s_at
202345_s_at
202412_s_at
202413_s_at
202431_s_at
202435_s_at
202446_s_at
202483_s_at
202504_at
202524_s_at
202580_x_at
202589_at
202613_at
202619_s_at
202625_at
202626_s_at
202637_s_at
202638_s_at
202643_s_at
202644_s_at
202647_s_at
202690_s_at
202705_at
202719_s_at
202760_s_at
202854_at
202870_s_at
202887_s_at
202902_s_at
202911_at
202932_at
202935_s_at
202951_at
202954_at
202965_s_at
203021_at
203022_at
203038_at
203074_at
203120_at
203126_at
203139_at
203145_at
203211_s_at
203213_at
203214_x_at
203256_at
203263_s_at
203276_at
203358_s_at
203362_s_at
203405_at
203418_at
203428_s_at
203510_at
203554_x_at
203560_at
203574_at
203612_at
203625_x_at
203636_at
203637_s_at
203645_s_at
203691_at
203692_s_at
203693_s_at
203702_s_at
203705_s_at
203706_s_at
203712_at
203744_at
203748_x_at
203755_at
203764_at
203780_at
203819_s_at
203820_s_at
203828_s_at
203856_at
203881_s_at
203909_at
203921_at
203964_at
204023_at
204030_s_at
204033_at
204060_s_at
204061_at
204087_s_at
204124_at
204126_s_at
204127_at
204146_at
204162_at
204170_s_at
204197_s_at
204198_s_at
204203_at
204222_s_at
204240_s_at
204244_s_at
204259_at
204268_at
204304_s_at
204318_s_at
204401_at
204437_s_at
204444_at
204510_at
204533_at
204580_at
204603_at
204641_at
204655_at
204695_at
204702_s_at
204709_s_at
204724_s_at
204725_s_at
204733_at
204750_s_at
204751_x_at
204767_s_at
204768_s_at
204806_x_at
204822_at
204825_at
204855_at
204913_s_at
204914_s_at
204915_s_at
204962_s_at
204992_s_at
205024_s_at
205044_at
205046_at
205047_s_at
205126_at
205157_s_at
205159_at
205199_at
205229_s_at
205240_at
205339_at
205347_s_at
205350_at
205363_at
205393_s_at
205394_at
205475_at
205485_at
205487_s_at
205548_s_at
205569_at
205585_at
205671_s_at
205681_at
205733_at
205778_at
205819_at
205860_x_at
206023_at
206033_s_at
206074_s_at
206082_at
206102_at
206157_at
206245_s_at
206276_at
206332_s_at
206364_at
206373_at
206391_at
206392_s_at
206513_at
206550_s_at
206560_s_at
206571_s_at
206632_s_at
206734_at
206999_at
207002_s_at
207030_s_at
207039_at
207165_at
207266_x_at
207571_x_at
207668_x_at
207719_x_at
207828_s_at
208103_s_at
208358_s_at
208433_s_at
208627_s_at
208628_s_at
208638_at
208639_x_at
208729_x_at
208767_s_at
208795_s_at
208941_s_at
208965_s_at
208966_x_at
209026_x_at
209056_s_at
209122_at
209125_at
209126_x_at
209155_s_at
209170_s_at
209191_at
209205_s_at
209211_at
209212_s_at
209283_at
209289_at
209290_s_at
209318_x_at
209373_at
209395_at
209396_s_at
209397_at
209406_at
209408_at
209421_at
209464_at
209535_s_at
209545_s_at
209642_at
209644_x_at
209669_s_at
209686_at
209709_s_at
209773_s_at
209791_at
209800_at
209822_s_at
209825_s_at
209834_at
209842_at
209868_s_at
209891_at
209900_s_at
209990_s_at
210015_s_at
210024_s_at
210029_at
210052_s_at
210073_at
210074_at
210087_s_at
210092_at
210145_at
210147_at
210153_s_at
210163_at
210164_at
210347_s_at
210381_s_at
210466_s_at
210559_s_at
210563_x_at
210692_s_at
210754_s_at
210785_s_at
210821_x_at
210845_s_at
210873_x_at
210933_s_at
210983_s_at
211042_x_at
211056_s_at
211063_s_at
211075_s_at
211084_x_at
211122_s_at
211126_s_at
211450_s_at
211466_at
211467_s_at
211519_s_at
211714_x_at
211725_s_at
211762_s_at
211767_at
211798_x_at
211911_x_at
211966_at
211967_at
212020_s_at
212021_s_at
212022_s_at
212023_s_at
212141_at
212142_at
212171_x_at
212174_at
212236_x_at
212247_at
212262_at
212263_at
212265_at
212274_at
212276_at
212295_s_at
212311_at
212314_at
212333_at
212345_s_at
212350_at
212371_at
212378_at
212397_at
212398_at
212501_at
212589_at
212590_at
212680_x_at
212771_at
212816_s_at
212922_s_at
212949_at
213029_at
213032_at
213060_s_at
213113_s_at
213122_at
213134_x_at
213137_s_at
213170_at
213226_at
213260_at
213310_at
213338_at
213457_at
213484_at
213523_at
213564_x_at
213680_at
213711_at
214038_at
214051_at
214104_at
214240_at
214431_at
214581_x_at
214596_at
214710_s_at
214838_at
214844_s_at
214845_s_at
215034_s_at
215049_x_at
215127_s_at
215223_s_at
215363_x_at
215719_x_at
215729_s_at
215942_s_at
215945_s_at
216237_s_at
216252_x_at
216640_s_at
216841_s_at
216952_s_at
217077_s_at
217294_s_at
217755_at
217834_s_at
217867_x_at
218019_s_at
218039_at
218051_s_at
218236_s_at
218238_at
218239_s_at
218308_at
218319_at
218336_at
218350_s_at
218355_at
218497_s_at
218499_at
218542_at
218564_at
218585_s_at
218618_s_at
218662_s_at
218663_at
218684_at
218726_at
218738_s_at
218755_at
218781_at
218796_at
218826_at
218847_at
218856_at
218868_at
218877_s_at
218886_at
218963_s_at
218984_at
218995_s_at
219000_s_at
219006_at
219010_at
219065_s_at
219148_at
219212_at
219225_at
219275_at
219306_at
219385_at
219439_at
219454_at
219489_s_at
219493_at
219497_s_at
219498_s_at
219555_s_at
219582_at
219588_s_at
219615_s_at
219654_at
219740_at
219787_s_at
219795_at
219806_s_at
219863_at
219867_at
219918_s_at
219959_at
219960_s_at
219974_x_at
220085_at
220147_s_at
220238_s_at
220239_at
220295_x_at
220386_s_at
220425_x_at
220432_s_at
220559_at
220624_s_at
220625_s_at
220651_s_at
220658_s_at
220840_s_at
220865_s_at
220892_s_at
220941_s_at
221004_s_at
221016_s_at
221059_s_at
221185_s_at
221203_s_at
221261_x_at
221436_s_at
221477_s_at
221505_at
221510_s_at
221520_s_at
221524_s_at
221591_s_at
221676_s_at
221677_s_at
221698_s_at
221830_at
221854_at
221872_at
221875_x_at
221881_s_at
221920_s_at
221922_at
222036_s_at
222037_at
222039_at
222062_at
222077_s_at
222158_s_at
222242_s_at

Meta:
59437_at
200670_at
200711_s_at
200811_at
201568_at
201596_x_at
202090_s_at
202109_at
202201_at
202263_at
202454_s_at
202489_s_at
202862_at
202908_at
203453_at
203771_s_at
204045_at
204295_at
204485_s_at
204567_s_at
204623_at
204667_at
205012_s_at
205081_at
205225_at
205472_s_at
205597_at
205768_s_at
205769_at
205879_x_at
206469_x_at
207142_at
207843_x_at
208682_s_at
208872_s_at
209004_s_at
209114_at
209123_at
209149_s_at
209173_at
209224_s_at
209309_at
209366_x_at
209459_s_at
209460_at
209623_at
209625_at
209665_at
209681_at
209696_at
209739_s_at
209740_s_at
209759_s_at
209917_s_at
210720_s_at
211421_s_at
212508_at
212510_at
212637_s_at
212956_at
213627_at
213846_at
214053_at
214404_x_at
215726_s_at
215867_x_at
216381_x_at
217014_s_at
217979_at
218025_s_at
218035_s_at
218048_at
218164_at
218195_at
218211_s_at
218640_s_at
218692_at
218976_at
219127_at
219872_at
220192_x_at
221874_at
221934_s_at
222125_s_at