I have this df:

   KEGGnumber         Cor             Colors
X1 C00095            -2.623973e-01    RED
X2 C17714, C00044    -2.241113e-01    RED
X3 C00033            -3.066684e-01    RED

and would like to reshape it into a two-column data frame in which each individual KEGGnumber is matched with its Color. It would look something like this:

KEGGnumber  Colors
C00095      RED
C17714      RED
C00044      RED
C00033      RED

Essentially, the new data frame takes the rows of the old data frame that contain more than one KEGGnumber and splits them into one row per KEGGnumber, keeping the same Color for each.
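
For anyone who wants to reproduce this, the input can be rebuilt roughly as below. This is only a sketch: the column types are assumed (plain character and numeric columns rather than factors), and the row names X1 to X3 are copied from the printout above.

```r
# Rebuild the example data frame shown in the question.
# Assumption: KEGGnumber and Colors are character columns, Cor is numeric.
df <- data.frame(
  KEGGnumber = c("C00095", "C17714, C00044", "C00033"),
  Cor        = c(-2.623973e-01, -2.241113e-01, -3.066684e-01),
  Colors     = c("RED", "RED", "RED"),
  row.names  = c("X1", "X2", "X3"),
  stringsAsFactors = FALSE
)

df
```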

Zach Eisner

2 Answers

This may or may not be a duplicate, but a very similar question can be found here: Splitting a string into new rows in R.

A simple adaptation of this example to your case would be:

library(splitstackshape)
library(data.table)
df2 <- as.data.frame(cSplit(as.data.frame(df), "KEGGnumber",
                                     sep = ",", direction = "long"))

df2
  KEGGnumber        Cor Colors
1     C00095 -0.2623973    RED
2     C17714 -0.2241113    RED
3     C00044 -0.2241113    RED
4     C00033 -0.3066684    RED
Mike H.

tidyr makes this quite easy:

library(tidyr)

df %>% separate_rows(KEGGnumber)
##          Cor Colors KEGGnumber
## 1 -0.2623973    RED     C00095
## 2 -0.2241113    RED     C17714
## 3 -0.2241113    RED     C00044
## 4 -0.3066684    RED     C00033

Chop off the Cor column if you like.
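
One way to do that chopping, as a minimal sketch: split the rows with separate_rows(), then keep only the two wanted columns with base subsetting. The explicit sep pattern and the names long and result are choices made here for illustration, and df is rebuilt with assumed character columns so the snippet runs on its own.

```r
library(tidyr)

# Example data, rebuilt from the question (column types are an assumption).
df <- data.frame(
  KEGGnumber = c("C00095", "C17714, C00044", "C00033"),
  Cor        = c(-0.2623973, -0.2241113, -0.3066684),
  Colors     = c("RED", "RED", "RED"),
  stringsAsFactors = FALSE
)

# One row per ID: split on a comma followed by optional whitespace.
long <- separate_rows(df, KEGGnumber, sep = ",\\s*")

# Keep only the two columns asked for, in the requested order.
result <- as.data.frame(long[, c("KEGGnumber", "Colors")])
result
```

Selecting by name also fixes the column order, which can otherwise differ between tidyr versions.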

A less-pretty base option:

do.call(rbind, 
        Map(function(x, y){data.frame(KEGGnumber = x, Colors = y)}, 
            strsplit(as.character(df$KEGGnumber), ', '), 
            df$Colors))
##   KEGGnumber Colors
## 1     C00095    RED
## 2     C17714    RED
## 3     C00044    RED
## 4     C00033    RED
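
The same base idea can probably be written without do.call(rbind, ...) by repeating each Color once per ID in its row. A sketch, assuming the df from the question; parts and result are names made up here:

```r
# Example data, rebuilt from the question (column types are an assumption).
df <- data.frame(
  KEGGnumber = c("C00095", "C17714, C00044", "C00033"),
  Cor        = c(-0.2623973, -0.2241113, -0.3066684),
  Colors     = c("RED", "RED", "RED"),
  stringsAsFactors = FALSE
)

# List with one character vector of IDs per original row.
parts <- strsplit(as.character(df$KEGGnumber), ",\\s*")

# Flatten the IDs and repeat each row's Color once per ID in that row.
result <- data.frame(
  KEGGnumber = unlist(parts),
  Colors     = rep(df$Colors, lengths(parts)),
  stringsAsFactors = FALSE
)
result
```

This builds the result in one step instead of binding many one-row-group data frames, which tends to matter only for large inputs.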
alistaire