I have an expression table (a 50 x 50,000 matrix: 50 columns, 50,000 rows). The first column is the probe ID, the second column is the gene name, and the remaining columns are samples (expression values). The probe ID is unique (n = 50,000), but the gene name is repeated (n = 20,000 distinct genes), i.e. one gene can have multiple probe IDs. For each gene, I want to calculate the average expression across all of its probes in every sample, so the output matrix should be 50 x 20,000 (one row per gene).
probeid genename sample1 sample2 sample3 ....n
1565483_at EGFR 4.231305 3.845882 4.182973
1565484_x_at EGFR 4.412553 4.279467 4.035834
201983_s_at EGFR 4.24823 4.304888 7.686607
201984_s_at EGFR 5.273041 4.914405 6.332343
210984_x_at EGFR 4.591761 4.807462 5.830411
211550_at EGFR 3.822476 4.055447 3.668374
211551_at EGFR 4.082551 3.825106 4.292482
211607_x_at EGFR 4.774399 4.566751 5.694684
224999_at EGFR 4.059136 4.307745 7.19947
201746_at TP53 7.847832 4.011214 7.834738
211300_s_at TP53 7.043846 4.881257 7.619734
Please suggest an R script or a short Unix command-line solution.
Thanks
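One possible way to do this on the command line is a single awk pass that sums each sample column per gene and divides by the number of probes seen for that gene. The following is a minimal sketch, not a definitive solution. It assumes the table is whitespace-delimited with exactly one header line, and the file names `expr.txt` and `gene_means.txt` are hypothetical. The example input is built from the rows shown in the question, restricted to three sample columns.

```shell
# Hypothetical file names: expr.txt (input), gene_means.txt (output).
# Example input: the rows from the question, with three sample columns.
cat > expr.txt <<'EOF'
probeid genename sample1 sample2 sample3
1565483_at EGFR 4.231305 3.845882 4.182973
1565484_x_at EGFR 4.412553 4.279467 4.035834
201983_s_at EGFR 4.24823 4.304888 7.686607
201984_s_at EGFR 5.273041 4.914405 6.332343
210984_x_at EGFR 4.591761 4.807462 5.830411
211550_at EGFR 3.822476 4.055447 3.668374
211551_at EGFR 4.082551 3.825106 4.292482
211607_x_at EGFR 4.774399 4.566751 5.694684
224999_at EGFR 4.059136 4.307745 7.19947
201746_at TP53 7.847832 4.011214 7.834738
211300_s_at TP53 7.043846 4.881257 7.619734
EOF

awk '
NR == 1 {
    # Header line: drop the probe ID column, keep gene name and sample names.
    printf "%s", $2
    for (i = 3; i <= NF; i++) printf "\t%s", $i
    printf "\n"
    ncol = NF
    next
}
{
    # Remember genes in first-seen order so the output order is stable.
    if (!($2 in count)) order[++ngenes] = $2
    count[$2]++
    # Accumulate a running sum per (gene, sample column).
    for (i = 3; i <= ncol; i++) sum[$2, i] += $i
}
END {
    # One output row per gene: mean = column sum / number of probes.
    for (g = 1; g <= ngenes; g++) {
        gene = order[g]
        printf "%s", gene
        for (i = 3; i <= ncol; i++) printf "\t%.6f", sum[gene, i] / count[gene]
        printf "\n"
    }
}' expr.txt > gene_means.txt

cat gene_means.txt
```

The output is tab-separated with one row per gene, so a real table of 50,000 probes would collapse to about 20,000 rows. Missing values (for example `NA`) are not handled here: awk would treat them as 0, so they would need to be filtered or counted separately.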