
I have a table (matrix, 50 × 50,000: the first column is the probe ID, the second column is the gene ID, and the remaining columns are the samples, holding expression values). The first column is unique (probe IDs, n = 50,000), while the second column holds gene names, which repeat (n = 20,000), i.e. one gene can have multiple probe IDs. I want to calculate the average expression across all probes for each gene, so the output matrix should be 50 × 20,000.

probeid       genename  sample1   sample2   sample3   ...  sampleN
1565483_at    EGFR      4.231305  3.845882  4.182973
1565484_x_at  EGFR      4.412553  4.279467  4.035834
201983_s_at   EGFR      4.24823   4.304888  7.686607
201984_s_at   EGFR      5.273041  4.914405  6.332343
210984_x_at   EGFR      4.591761  4.807462  5.830411
211550_at     EGFR      3.822476  4.055447  3.668374
211551_at     EGFR      4.082551  3.825106  4.292482
211607_x_at   EGFR      4.774399  4.566751  5.694684
224999_at     EGFR      4.059136  4.307745  7.19947
201746_at     TP53      7.847832  4.011214  7.834738
211300_s_at   TP53      7.043846  4.881257  7.619734

Please give me an R script or a Unix command-line shortcut.
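A minimal Unix sketch of the per-gene averaging, assuming the table is a whitespace-separated text file with a single header line laid out as in the sample above (probe ID in column 1, gene name in column 2, one column per sample). The function name `avg_by_gene` and the file names in the usage line are hypothetical. It makes one pass with `awk`, keeping a running sum and a probe count per gene, then prints one row per gene (in first-seen order) with the mean of each sample column; the probe ID column is dropped.

```shell
# avg_by_gene (hypothetical name): average all probe rows that share a gene name.
# Assumes whitespace-separated input with one header line:
#   probeid genename sample1 sample2 ... sampleN
# Output is tab-separated: genename, then the mean of each sample column.
avg_by_gene() {
  awk '
    NR == 1 {                                   # header line: drop the probeid column
      ncol = NF
      hdr = $2
      for (i = 3; i <= NF; i++) hdr = hdr "\t" $i
      print hdr
      next
    }
    {
      g = $2
      if (!(g in count)) order[++ngenes] = g     # remember first-seen gene order
      count[g]++                                # number of probes seen for this gene
      for (i = 3; i <= ncol; i++) sum[g, i] += $i
    }
    END {
      for (k = 1; k <= ngenes; k++) {
        g = order[k]
        line = g
        for (i = 3; i <= ncol; i++)
          line = line "\t" sprintf("%.6f", sum[g, i] / count[g])
        print line
      }
    }
  ' "$@"
}
```

Usage would be something like `avg_by_gene expression.txt > gene_means.txt`, which gives one row per gene (about 20,000) with one column per sample. It averages the values exactly as they appear in the file, so if the data are on a log scale, the result is the mean on that log scale.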

Thanks

mona
  • Do you really have a matrix or a `data.frame`? (A matrix can hold only a single class, i.e. everything becomes character if any element is a character.) – akrun Jun 11 '16 at 15:30
  • can you explain "average expression of all probe"? – Chet Jun 11 '16 at 15:38
  • Sorry, I mean the average expression of a gene (averaged over the multiple probes for that gene). – mona Jun 11 '16 at 15:46
  • probeid is unique, so how can you then get 50 * 20000 records? Does it need to be omitted, i.e. 49 * 20000? Can you add a sample output? – Chet Jun 11 '16 at 16:30
