
I have a data frame that looks a lot like:

Sample   Start  End  Chr  Gene  Data
1        1      50   1    A     -0.2
1        60     84   1    B     -0.4
2        60     84   1    B     0.3
3        1      50   1    A     0.1

I'd really like to create a data frame that displays this data in an easier-to-manage form, with one row per sample and one column per gene, such as ...

Sample   GeneA   GeneB   GeneC ...
1        -0.2    -0.4    NA
2        NA      0.3     NA
3        0.1     NA      NA 

Is there a function that can help me do this? So far I've managed to make an empty data frame with sample names and gene names listed as rows, but I'm struggling to work out how to correctly feed the data into this table.

Thanks for any advice for this query.

Natalie

  • `reshape2::dcast( data, Sample ~ Gene, value.var = 'Data', fun = function( x ) x, fill = NA_real_ )` or `data.table::dcast( data, Sample ~ Gene, value.var = 'Data', fun = function( x ) x, fill = NA_real_ )` – Sathish Feb 24 '17 at 12:11
  • Just as an aside, you might want to consider leaving the data as it is since it's already in _tidy_ form. i.e. each column is for a different variable, and each row for a different observation, as per ftp://cran.r-project.org/pub/R/web/packages/tidyr/vignettes/tidy-data.html and if it takes a while for you to get used to managing data like these, just ask more questions here if you need to :-) – amccnnll Feb 27 '17 at 07:19
  • Thanks @amccnnll - the link you posted is incredibly helpful! My reason for wanting to transpose the data is to perform principal component analysis, which, from the examples I have seen online, requires a matrix similar to what I described in my question. Is there a better way of performing such an analysis? – NatalieStephenson Feb 28 '17 at 10:29
  • Also @Jaap please could you link the existing question I have duplicated - I've searched but can't find anything giving me the information I need. Thanks! – NatalieStephenson Feb 28 '17 at 10:29
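
For reference, here is a minimal base-R sketch of the long-to-wide reshaping asked about in the question. It is an alternative to the `dcast` calls suggested in the first comment, not a replacement for them. The data frame name `df` and the result name `wide` are assumptions made for illustration, and the sketch assumes each Sample/Gene pair occurs at most once, as in the example data.

```r
# Example data copied from the question ('df' is a hypothetical name).
df <- data.frame(
  Sample = c(1, 1, 2, 3),
  Start  = c(1, 60, 60, 1),
  End    = c(50, 84, 84, 50),
  Chr    = c(1, 1, 1, 1),
  Gene   = c("A", "B", "B", "A"),
  Data   = c(-0.2, -0.4, 0.3, 0.1)
)

# Keep only the columns needed, then spread Gene into columns.
# Sample/Gene combinations that are absent become NA.
wide <- reshape(
  df[, c("Sample", "Gene", "Data")],
  idvar     = "Sample",
  timevar   = "Gene",
  direction = "wide"
)

# reshape() names the new columns Data.A, Data.B, ...;
# rename them to GeneA, GeneB, ... to match the layout in the question.
names(wide) <- sub("^Data\\.", "Gene", names(wide))

print(wide)
```

If a numeric matrix is needed afterwards, `as.matrix(wide[, -1])` drops the `Sample` column and converts the rest. Genes that never appear in the data (such as `GeneC` in the desired output) get no column with this approach.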
