I have two sets of data that I would like to investigate. The first is gene/genome-related data for different 'cell-states'. The second relates the genes to biological pathways. I believe my question is a relational db one.
How can I take the data from one dataframe and relate it to another? In other words, I want to graph the cell-state data and relate it to pathways and their specific genes. (I think in pictures, so here goes.)
dataframe1: data from an Affymetrix gene chip
gene, cell-state1, cell-state2...
gene1, x1, y1,...
gene2, x2, y2,...
gene.x, ... ...
"1" "gene" "log_b" "log_b_rich" "Fc_cdt_rich_tot" "fc_Etoh_CDT_tot_mono" "fc_Etoh_CDT_tot_poly" "fc_Etoh_CDT_mono_poly" "fc_Etoh_Rich_tot_mono" "fc_Etoh_Rich_tot_poly" "fc_Etoh_Rich_mono_poly"
"2" "PHF13" -2.712616698 -1.47923545 -0.791138043 -0.549610558 0.143808182 0.69341874 0.320812876 1.089260116 0.76844724
"3" "SPSB1" -1.808348454 -1.965601198 -1.349135752 -0.780105329 0.410647447 1.190752776 0.587287796 1.260350195 0.673062399
dataframe2: data from the KEGG db
pathway1, gene-x1, gene-x2, ...
pathway2, gene-y1, gene-y2, ...
pathway3, gene-z1, ...
"1" "KEGG_GLYCOLYSIS_GLUCONEOGENESIS" "PHF13" "LDHB" "LDHA" "PGAM1" "ADH1C" "PGAM2" "ADH1B" "ADH1A" "ACSS2" "PDHB" "ACSS1" "PGAM4" "PDHA2" "PDHA1" "LDHAL6B" "PFKL" "LDHAL6A" "FBP1" "PFKP" "ALDH3B2" "FBP2" "PFKM" "ALDH3B1" "PGM2" "G6PC" "ALDH7A1" "ALDH1B1" "PKM2" "PGM1" "DLD" "PKLR" "ALDH9A1" "ALDOA" "ALDOC" "ALDOB" "ADH5" "HK2" "HK1" "ADH6" "ADH7" "ALDH3A2" "G6PC2" "ALDH3A1" "GALM" "TPI1" "AKR1A1" "ADH4" "HK3" "ALDH1A3" "ENO2" "ENO3" "GAPDH" "ENO1" "BPGM" "DLAT" "PCK2" "PCK1" "GPI" "GCK" "ALDH2" "PGK1" "PGK2"
"2" "KEGG_CITRATE_CYCLE_TCA_CYCLE" "PHF13" "OGDHL" "OGDH" "PDHB" "IDH3G" "LOC283398" "IDH2" "IDH1" "PDHA2" "PDHA1" "SUCLA2" "FH" "DLST" "ACO2" "SUCLG2" "ACO1"
"PHF13" is highlighted to show relevance in each step.
What I want to do is see if 'cell-state1' (in-)activates different genes / pathways from 'cell-state2'. Furthermore, I would like to compare cell-states 1 and 2 statistically (a t-test, and maybe graphs) for specific pathways.
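The pathway-level comparison described here could be sketched roughly as follows. This is only an illustration in plain Python: `paired_t` is a hypothetical helper, the two values per gene are the `log_b` and `log_b_rich` columns copied from the sample rows, and treating those two columns as "cell-state1" and "cell-state2" (and both genes as members of one pathway) is an assumption. It returns only the t statistic and degrees of freedom; a p-value would need a t-distribution table or a statistics package.

```python
import math
import statistics

def paired_t(state_a, state_b):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    Each position in the two lists is the same gene measured in two states.
    """
    if len(state_a) != len(state_b) or len(state_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    diffs = [a - b for a, b in zip(state_a, state_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Assumption: both sample genes are treated as members of one pathway,
# and the two columns stand in for 'cell-state1' and 'cell-state2'.
log_b = [-2.712616698, -1.808348454]       # PHF13, SPSB1
log_b_rich = [-1.47923545, -1.965601198]   # PHF13, SPSB1

t_stat, dof = paired_t(log_b, log_b_rich)
print("t =", t_stat, "df =", dof)
```

With only two genes the result is not meaningful; the point is the shape of the calculation, which would be repeated once per pathway over that pathway's genes.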
My question is: which commands or method would let me do this most easily and efficiently, a merge or dummy variables?
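For what it is worth, the merge route can be sketched as below. This is a minimal illustration in plain Python rather than a definitive recipe: the pathway table is first reshaped from its wide form into one (pathway, gene) pair per row, and that long table is then joined to the expression table on the gene name, which is what a dataframe `merge` on the gene column would do. The helper names `to_long` and `join_on_gene` are hypothetical, the gene lists are truncated, and only two expression columns are kept.

```python
# Expression table (dataframe1): gene -> values, copied from the sample rows.
expression = {
    "PHF13": {"log_b": -2.712616698, "log_b_rich": -1.47923545},
    "SPSB1": {"log_b": -1.808348454, "log_b_rich": -1.965601198},
}

# Pathway table (dataframe2) in its wide form: pathway name, then its genes.
# Gene lists are truncated here for brevity.
pathways_wide = [
    ["KEGG_GLYCOLYSIS_GLUCONEOGENESIS", "PHF13", "LDHB", "LDHA"],
    ["KEGG_CITRATE_CYCLE_TCA_CYCLE", "PHF13", "OGDHL", "OGDH"],
]

def to_long(rows):
    """Turn wide pathway rows into (pathway, gene) pairs, one pair per row."""
    return [(row[0], gene) for row in rows for gene in row[1:]]

def join_on_gene(pairs, expr):
    """Inner join: keep only the pairs whose gene has expression data."""
    joined = []
    for pathway, gene in pairs:
        if gene in expr:
            joined.append({"pathway": pathway, "gene": gene, **expr[gene]})
    return joined

pathway_gene = to_long(pathways_wide)
merged = join_on_gene(pathway_gene, expression)
for record in merged:
    print(record)
```

The long (pathway, gene) table is the relational-db view of the problem: a gene that belongs to several pathways simply appears in several rows, so no dummy variables are needed to group or plot by pathway.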
HTH
mccurcio
Please rephrase your question in such a way that it actually becomes a programming problem, and the problem itself is clear (including the structure of your data). What is gene-x1, ... what is cell-state, ... ? Give an example dataset so we actually have a clue. See also http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys Jul 27 '11 at 12:37
1 Answer
What I want to do is, see if 'cell-state1' (in-)activates different genes / pathways from 'cell-state2.'
It sounds like what you need is a factor analysis. You could ask the good people of statistics.stackexchange.com about that.

Bernd Elkemann
I don't believe my question is necessarily a stats one, but a relational db one. Maybe my question could be: 'How can I take the data from one dataframe and relate it to another?' I want to graph the cell-state data and relate it to the genes and pathways. – mccurcio Jul 27 '11 at 14:49