I have a dataset where one row sometimes corresponds to two or more data points, as indicated by comma-separated values in one column. For example:

identifier         pos  name
ENSG00000208234    1    foo   
ENSG00000199674    5,8  bar    
ENSG00000221622    4    foobar

I want to expand it as follows:

identifier         pos  name
ENSG00000208234    1    foo   
ENSG00000199674    5    bar
ENSG00000199674    8    bar    
ENSG00000221622    4    foobar 

Is there a way that does not involve iterating through each row and creating a new data.frame?

Thanks

    Try: http://stackoverflow.com/questions/14226575/unpacking-a-factor-list-from-a-data-frame and http://stackoverflow.com/questions/14268908/expand-data-frame-with-a-split-in-r – Blue Magister Apr 30 '13 at 22:57

1 Answer

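Assuming X is your data.frame:

library(data.table)
DT <- data.table(X)

# If pos is a factor, convert it to character first; strsplit() only accepts character vectors
DT[, pos := as.character(pos)]

# Split pos on commas into a list column, then unlist it within each (identifier, name) group
DT2 <- DT[, c(.SD, list(posv=strsplit(pos, ",")))]
DT2[, list(pos=unlist(posv)), by=list(identifier, name)]

Note that pos in the result is a character column, and the grouping columns (identifier, name) come first. Wrap unlist(posv) in as.integer() if you need numeric positions.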
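
For comparison, here is a minimal base R sketch of the same expansion, using strsplit() plus rep() rather than the data.table approach above. It rebuilds the example data from the question so it can be run on its own. The names parts and expanded are illustrative only, and lengths() assumes R 3.2.0 or later.

```r
# Example data from the question, rebuilt so the snippet is self-contained
X <- data.frame(
  identifier = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622"),
  pos        = c("1", "5,8", "4"),
  name       = c("foo", "bar", "foobar"),
  stringsAsFactors = FALSE
)

# Split each pos entry on commas; as.character() guards against factor input
parts <- strsplit(as.character(X$pos), ",", fixed = TRUE)

# Repeat the other columns once per split value, then flatten the positions
expanded <- data.frame(
  identifier = rep(X$identifier, lengths(parts)),
  pos        = as.integer(unlist(parts)),
  name       = rep(X$name, lengths(parts)),
  stringsAsFactors = FALSE
)

print(expanded)
```

This is vectorized over rows, so it avoids an explicit loop, but every column other than pos has to be repeated by hand. With many columns, the data.table grouping above is likely less tedious.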

Ricardo Saporta