I am trying to expand a list column within a data frame so that each list element gets its own row.

I have a table where one column (GO.Terms) contains up to 2 values separated by "//" (some rows have one or none). I split them using strsplit, which gives a list column (Goids). Now I want to 'melt' this list so that each value is in its own row.

    > meta2go.test
       Pathways                 GO.Terms                    Goids
    10 PWY-2464 GO:0005641 // GO:0005783 GO:0005641 ,  GO:0005783
    11 PWY-2463 GO:0005641 // GO:0005783 GO:0005641 ,  GO:0005783
    12 PWY-7556
    13 PWY-5954               GO:0005829               GO:0005829

    > dput(meta2go.test)
    structure(list(Pathways = structure(c(2L, 1L, 4L, 3L), .Label = c("PWY-2463", 
    "PWY-2464", "PWY-5954", "PWY-7556"), class = "factor"), GO.Terms = structure(c(2L, 
    2L, 1L, 3L), .Label = c("", "GO:0005641 // GO:0005783", "GO:0005829"
    ), class = "factor"), Goids = list(c("GO:0005641 ", " GO:0005783"
    ), c("GO:0005641 ", " GO:0005783"), character(0), "GO:0005829")), row.names = 10:13, class = "data.frame")

This is how I created the Goids list column:

    meta2go$Goids<-strsplit(as.character(meta2go$GO.Terms),"//")

This is how I tried to rearrange it:

    meta2go.tab<-do.call(rbind, 
            apply(meta2go.test, 1, 
                  function(r) do.call(expand.grid, 
                                      c(unlist(r[-4]), 
                                        strsplit(as.character(r[4]), ", ")))))

However, the results are a bit messy: the quotes and the c(...) concatenation are still there. Any suggestions on how to do this more cleanly in the first place, or how to tidy it up? Thanks.

    > head(meta2go.tab)
           pwys            GOid
    10 PWY-2464 c("GO:0005641 "
    11 PWY-2464  " GO:0005783")
    12 PWY-2463 c("GO:0005641 "
    13 PWY-2463  " GO:0005783")
    16 PWY-6789 c("GO:0005576 "
    17 PWY-6789  " GO:0005829")
user2814482
  • Also [This post talks about converting data frames from long to wide and vice versa](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format) – Oliver Jul 26 '19 at 16:07
  • Thanks for those links. I understand the basics of merge and long versus wide. However, my case is a bit different, as I want to manipulate one column within a data frame, and that column is a list. – user2814482 Jul 26 '19 at 16:29
  • Ah, that makes sense. I'd suggest adding a small `reproducible example`, such as the first couple of rows of your original data (similar to your last output). – Oliver Jul 26 '19 at 16:45
  • You can try this: `x <- cbind(x,str_split(x$GO.Terms,'GO:[0]{3}|//| ',simplify = T)); x %>% select(-Goids) %>% gather(k,v,-Pathways,-GO.Terms) %>% filter(v!='') %>% mutate(Goids=paste0('GO:000',v)) %>% select(-k,-v)` – kstew Jul 26 '19 at 22:58
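
As an alternative to the tidyverse one-liner in the comment above, here is a minimal base R sketch of one way the list could be expanded into rows. It rebuilds the example from the `dput()` output in the question (as plain character columns rather than factors), trims the stray spaces left around each ID by the split on "//", and repeats each pathway once per ID. The name `meta2go.long` and the column name `Goid` are made up for illustration, and the sketch assumes that pathways with no GO term (such as PWY-7556) should simply be dropped.

```r
# Example data rebuilt from the question's dput() output,
# using character columns instead of factors.
meta2go.test <- data.frame(
  Pathways = c("PWY-2464", "PWY-2463", "PWY-7556", "PWY-5954"),
  GO.Terms = c("GO:0005641 // GO:0005783", "GO:0005641 // GO:0005783",
               "", "GO:0005829"),
  stringsAsFactors = FALSE
)

# Split each entry on the literal "//" and trim surrounding whitespace.
# An empty string yields character(0), i.e. a list element of length 0.
goids <- lapply(strsplit(meta2go.test$GO.Terms, "//", fixed = TRUE), trimws)

# Repeat each pathway once per GO id and flatten the list.
# Rows whose list element is empty are repeated 0 times, so they drop out.
meta2go.long <- data.frame(
  Pathways = rep(meta2go.test$Pathways, lengths(goids)),
  Goid     = unlist(goids),
  stringsAsFactors = FALSE
)

print(meta2go.long)
```

If the pathways without a GO term need to be kept, one option would be to replace the empty list elements with `NA` before the `rep()`/`unlist()` step.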