Format inhomogeneous data frame with multiple separators

Question

I have a data frame (df) read in with read.csv which looks like this:

  Reaction GID
1 A1       11
2 A2       21 / 22 / 23 / 24
3 A3       31 / 32
4 A4       41
5 A5       51 / 52 / 53

The data frame has a header, 2 columns and n rows, but the "GID" column contains strings with several entries separated by " / ". As you can see, the number of entries differs from row to row. I want to split the column at each "/" and then melt the result to long format.

Thus it should look like:

  Reaction GID
1 A1       11
2 A2       21 
3 A2       22 
4 A2       23
5 A2       24
6 A3       31

and so on. I first applied the code from here: Multiple Separators for the same file input R

df2 <-do.call(rbind.data.frame,strsplit(df$GID," / "))

However, the rows that have only one entry in "GID" also get multiplied, whereas the duplicated entries should be left out.

  GID
1 11, 11, 11, 11
2 21, 22, 23, 24
3 31, 32, 31, 32

Thus, this approach recycles the entries, and the first column, "Reaction", with the identifiers is dropped, which makes merging or matching back impossible.

This does not look like the correct way to me. What would be the correct method to reach my goal?
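
A small sketch of why the recycling happens, assuming the example data from above is stored as character strings (not factors). The names `parts`, `n_parts` and `m` are only illustrative. Plain `rbind` is used here instead of `rbind.data.frame` to keep the output easy to read; the padding effect is the same as in the output shown above.

```r
# Example data as in the question (strings, not factors)
df <- data.frame(
  Reaction = c("A1", "A2", "A3", "A4", "A5"),
  GID = c("11", "21 / 22 / 23 / 24", "31 / 32", "41", "51 / 52 / 53"),
  stringsAsFactors = FALSE
)

# Split every GID string at the literal separator " / "
parts <- strsplit(df$GID, " / ", fixed = TRUE)

# The rows yield different numbers of pieces
n_parts <- lengths(parts)
n_parts
# 1 4 2 1 3

# rbind() needs equally long rows, so shorter vectors are recycled
# up to the longest one (4 here); it warns for the row with 3 pieces,
# which is suppressed in this sketch
m <- suppressWarnings(do.call(rbind, parts))
m[1, ]  # "11" repeated four times
m[3, ]  # "31" "32" "31" "32"
```

Because only the split pieces are passed to `rbind`, the "Reaction" identifiers never enter the result, which is why they are missing afterwards.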

Here is a base R version in the spirit of your attempt: `stack(setNames(strsplit(df$GID, " / "), df$Reaction))` – user20650, Jul 05 '16 at 21:29

score 1 · Accepted Answer · answered Jul 05 '16 at 16:08

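We can use cSplit from the splitstackshape package:

library(splitstackshape)
# split "GID" at " / " and stack the pieces into long format
cSplit(df, "GID", sep = " / ", direction = "long")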
#     Reaction GID
# 1:       A1  11
# 2:       A2  21
# 3:       A2  22
# 4:       A2  23
# 5:       A2  24
# 6:       A3  31
# 7:       A3  32
# 8:       A4  41
# 9:       A5  51
#10:       A5  52
#11:       A5  53

data

df <- structure(list(Reaction = c("A1", "A2", "A3", "A4", "A5"),
                     GID = c("11", "21 / 22 / 23 / 24", "31 / 32", "41",
                             "51 / 52 / 53")),
                .Names = c("Reaction", "GID"), class = "data.frame",
                row.names = c("1", "2", "3", "4", "5"))
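
If installing a package is not an option, the same long format can probably be built in base R as well. This is only a sketch under the assumption that "GID" is a character column; the names `parts` and `long` are illustrative, and the `as.integer()` conversion is my choice to mirror the numeric "GID" column in the output above.

```r
# Example data (character columns)
df <- data.frame(
  Reaction = c("A1", "A2", "A3", "A4", "A5"),
  GID = c("11", "21 / 22 / 23 / 24", "31 / 32", "41", "51 / 52 / 53"),
  stringsAsFactors = FALSE
)

# One character vector of pieces per row
parts <- strsplit(df$GID, " / ", fixed = TRUE)

# Repeat each Reaction once per piece and flatten the pieces,
# so identifiers and values stay aligned
long <- data.frame(
  Reaction = rep(df$Reaction, lengths(parts)),
  GID = as.integer(unlist(parts)),
  stringsAsFactors = FALSE
)
long
```

`rep()` with `lengths(parts)` as the repeat counts keeps the "Reaction" identifiers attached to every split value, so nothing has to be merged back afterwards.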

akrun

Wow, that is a really simple command. Thank you very much. I was just about to use two for-loops, several ifs and some rbinds. ;-) – Rockbar Jul 05 '16 at 19:22

Have done that, sorry, I forgot it. It worked out as intended. Thanks again! – Rockbar Jul 06 '16 at 08:25
