Performance of transcription factor identification tools from differential gene expression data

By Rick Stanton, Pathway Analysis Consultant

A three-step process is a clear way to establish confidence in the performance of transcription factor identification tools applied to differential gene expression data.

  1. The first step is to identify several types of differential gene expression data sets where the stimulus or trigger is clearly known.
  2. The second step is to identify the transcription factors most likely associated with each data set's expression changes.
  3. The third step is to perform an upstream analysis from the identified transcription factor.

If the transcription factor and upstream analysis tools can trace the signal cascade back to the stimulus, the tools are clearly producing relevant results, and belief in the performance of the analysis tools is established.

At this point, the tools can be directed with confidence to more challenging analyses such as developed resistance or pathway elucidation.
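
The acceptance criterion in the three-step process can be made concrete as a rank check: the known stimulus should come out at or near the top of the candidate upstream regulators, both by regulation z score and by P value of overlap. The following is a minimal sketch of that check, not IPA's implementation. The function names, regulator labels, and all numeric values are hypothetical and serve only to illustrate the idea.

```python
def stimulus_rank(scores, stimulus, higher_is_better=True):
    """Return the 1-based rank of `stimulus` among the candidate regulators."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ordered.index(stimulus) + 1


def stimulus_recovered(z_scores, p_values, stimulus, max_rank=1):
    """True if the known stimulus ranks within `max_rank` on both measures."""
    # A larger z score means stronger predicted activation.
    z_rank = stimulus_rank(z_scores, stimulus, higher_is_better=True)
    # A smaller P value means a more significant overlap.
    p_rank = stimulus_rank(p_values, stimulus, higher_is_better=False)
    return z_rank <= max_rank and p_rank <= max_rank


# Hypothetical upstream-analysis output for a TGFb-stimulated sample.
z_scores = {"TGFB1": 3.2, "TNF": 1.1, "BMP2": 0.4}
p_values = {"TGFB1": 1e-9, "TNF": 2e-3, "BMP2": 4e-2}

print(stimulus_recovered(z_scores, p_values, "TGFB1"))
```

Requiring the top rank on both measures is a deliberately strict choice; relaxing `max_rank` would suit noisier experiments.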

The performance of IPA’s new Transcription Factor and Upstream analysis tools was evaluated on the following datasets (processing details below):

  • TGFb stimulation, 1 hour, A549 lung adenocarcinoma cell line
  • BMP2 stimulation, 1 hour, Mouse Embryonic Stem Cell E14Tg2A.4
  • TNFa stimulation, 5 hours, human, HUVEC
  • TNFa stimulation, 1 hour, primary murine hepatocytes

For each of the above datasets, an upstream analysis from the identified transcription factors correctly identified the stimulus. IPA’s tools were very easy to use and the analysis time for the above experiments was less than one minute.

Performance, speed, and ease of use were all very good, and the tools may lead to breakthroughs when extended and used creatively.

Ingenuity’s new transcription factor analysis tool in IPA, coupled with Ingenuity’s established upstream grow tools, should be strongly considered by every lab analyzing differential expression data.

Note: The BMP2 performance is outstanding, even astonishing, as BMP2 does not trigger a large differential response at an early time point.

Experiment:
TGFb stimulation, 1 hour, A549 lung adenocarcinoma cell line
Array: Affymetrix Human Genome U133 Plus 2.0 Array
Fold change, P value cutoffs: 1.5, 0.05
Number of mapped genes: 323
TGFb predicted activation: Activated
TGFb regulation z score rank: 1st
TGFb P value of overlap rank: 1st
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17708

Experiment:
BMP2 stimulation, 1 hour, Mouse Embryonic Stem Cell E14Tg2A.4
Array: Affymetrix Mouse Genome 430 2.0
Fold change, P value cutoffs: 1.2, 0.05
Number of mapped genes: 96
BMP2 predicted activation: Activated
BMP2 regulation z score rank: 1st
BMP2 P value of overlap rank: 1st
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE17896

Experiment:
TNFa stimulation, 5 hours, human, HUVEC
Array: Affymetrix Human Genome U133A
Fold change, P value cutoffs: 2.0, 0.05
Number of mapped genes: 124
TNFa predicted activation: Activated
TNFa regulation z score rank: 1st
TNFa P value of overlap rank: 1st
http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE2639

Experiment:
TNFa stimulation, 1 hour, primary murine hepatocytes
Array: Affymetrix Mouse Genome 430 2.0
Fold change, P value cutoffs: 1.4, 0.05
Number of mapped genes: 208
TNFa predicted activation: Activated
TNFa regulation z score rank: 1st
TNFa P value of overlap rank: 1st
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19272

The above datasets are publicly available for download via the links provided. Differential expression data were obtained from CEL files using the MATLAB functions affyrma, genelowvalfilter, genevarfilter, mattest, and mavolcanoplot.
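
The fold change and P value cutoffs listed for each experiment can be illustrated with a short selection step. The sketch below is written in Python rather than MATLAB and is not the script used for this evaluation. It assumes log2-scale expression values (as RMA produces), a symmetric fold change cutoff applied to both up- and down-regulated genes, and P values that were already computed per gene. The function name, gene labels, and numbers are hypothetical.

```python
from statistics import mean


def select_differential_genes(control, treated, p_values,
                              min_fold=1.5, max_p=0.05):
    """Return {gene: signed fold change} for genes passing both cutoffs.

    `control` and `treated` map each gene to a list of log2 expression
    values; `p_values` maps each gene to a precomputed P value.
    """
    selected = {}
    for gene, p in p_values.items():
        # Difference of log2 means is the log2 fold change.
        log2_diff = mean(treated[gene]) - mean(control[gene])
        # Symmetric fold change, always >= 1 (assumed convention).
        fold = 2 ** abs(log2_diff)
        if fold >= min_fold and p <= max_p:
            # Negative sign marks down-regulation.
            selected[gene] = fold if log2_diff >= 0 else -fold
    return selected


# Hypothetical log2 expression values and P values for three genes.
control = {"GENE_A": [7.0, 7.2], "GENE_B": [9.0, 9.1], "GENE_C": [5.0, 5.1]}
treated = {"GENE_A": [8.1, 8.3], "GENE_B": [9.1, 9.2], "GENE_C": [3.9, 4.0]}
p_values = {"GENE_A": 0.01, "GENE_B": 0.01, "GENE_C": 0.20}

selected = select_differential_genes(control, treated, p_values)
print(sorted(selected))
```

In this toy input only GENE_A passes: GENE_B changes too little, and GENE_C changes enough but misses the P value cutoff.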
