I have a data frame with one field that is a string containing a comma-separated list of names. I want to expand the data frame so that I have multiple rows from each original row, the number of rows being the number of names in the list. So, I want to change something like

df <- data.frame(f1=c("a","b"), f2=c("b","e"), f3=c("a,b,c", "a,d"))
df
f1  f2  f3
a   b   a,b,c
b   e   a,d

into

df
f1  f2  f3
a   b   a
a   b   b
a   b   c
b   e   a
b   e   d

I suspect that dplyr and/or reshape2 are the tools for the job, but I'm not sure how to apply them in this case.

Gregory

1 Answer

Here's an approach with apply:

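# For each row, split every field on "," and take all combinations of the
# pieces with expand.grid; then stack the per-row results with rbind.
as.data.frame(do.call(rbind, apply(df, 1, function(x) {
  do.call(expand.grid, strsplit(x, ","))
})))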
#   f1 f2 f3
# 1  a  b  a
# 2  a  b  b
# 3  a  b  c
# 4  b  e  a
# 5  b  e  d
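
A vectorised alternative in base R, offered as a minimal sketch rather than as part of the original answer: split f3 once, repeat each row index once per name, and overwrite f3 with the pieces. The name `long` and the use of `stringsAsFactors = FALSE` are choices made here for illustration, and `lengths()` assumes R 3.2.0 or later.

```r
df <- data.frame(f1 = c("a", "b"), f2 = c("b", "e"),
                 f3 = c("a,b,c", "a,d"), stringsAsFactors = FALSE)

# Split f3 on commas: one character vector per original row.
parts <- strsplit(df$f3, ",", fixed = TRUE)

# Repeat each row index once per name, then overwrite f3 with the pieces.
long <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
long$f3 <- unlist(parts)
rownames(long) <- NULL  # drop the "1", "1.1", ... names left by the indexing
long
#   f1 f2 f3
# 1  a  b  a
# 2  a  b  b
# 3  a  b  c
# 4  b  e  a
# 5  b  e  d
```

Because this avoids calling a function per row, it should scale better on large data frames, though that has not been benchmarked here. If the tidyr package is installed, `separate_rows(df, f3, sep = ",")` appears to do the same job.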
Sven Hohenstein
  • Perhaps you could add your answer to the duplicate question so we can continue to point others there for the same question. – MrFlick Aug 25 '14 at 19:25
  • @MrFlick OK, I added my answer to the other question too. – Sven Hohenstein Aug 25 '14 at 19:37
  • Hey, this is a really great solution, and much faster than the splitstackshape package functions, which also gave NAs when the entries in the f3 column above have different lengths. Now, is there a way to create a progress bar for this? I have a really large working dataset. Thanks a lot. – sidpat Aug 28 '14 at 06:20
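
On the progress-bar question in the last comment, one possible sketch, not taken from the thread: process the rows one at a time with `lapply` and update a text progress bar from the utils package (`txtProgressBar`, `setTxtProgressBar`). The helper name `split_rows_with_progress` and its arguments are hypothetical, and the sketch assumes the data frame has at least one row.

```r
# Hypothetical helper (not from the thread): expand `col` row by row
# while reporting progress on the console.
split_rows_with_progress <- function(df, col, sep = ",") {
  n <- nrow(df)                       # assumed to be >= 1
  pb <- txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))                  # tidy up the bar even on error
  pieces <- lapply(seq_len(n), function(i) {
    parts <- strsplit(as.character(df[[col]][i]), sep, fixed = TRUE)[[1]]
    out <- df[rep(i, length(parts)), , drop = FALSE]  # one copy per piece
    out[[col]] <- parts
    setTxtProgressBar(pb, i)
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

df <- data.frame(f1 = c("a", "b"), f2 = c("b", "e"),
                 f3 = c("a,b,c", "a,d"), stringsAsFactors = FALSE)
res <- split_rows_with_progress(df, "f3")
```

Binding many small data frames with `rbind` can itself be slow on very large inputs, so the bar mainly helps with monitoring and does nothing for speed.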