I have an entry with commas. How to split into separate rows?

Question

I have a data frame similar to the one below:

df <- data.frame(var1=c('apple, bob, cat', 'b', 'c'), var2=c('d', 'e', 'f'))
df
             var1 var2
1 apple, bob, cat    d
2               b    e
3               c    f

I need to split the first entry of var1 (df$var1[1]) into separate rows, like this:

   var1 var2
1 apple    d
2   bob    d
3   cat    d
4     b    e
5     c    f

The value of var2 should be duplicated for each new row. I know how to duplicate rows, but I'm unsure whether there is a clean way to split the first entry of var1 into 3 rows. My actual df has many rows like the "apple, bob, cat" one above, some with as many as 20 different terms!

score 0 · Accepted Answer · answered Jun 15 '16 at 18:25

With tidyr 0.5.0 or later, you can use separate_rows:

df <- data.frame(var1=c('apple, bob, cat', 'b', 'c'), var2=c('d', 'e', 'f'))
tidyr::separate_rows(df, var1)
#     var2  var1
#   (fctr) (chr)
# 1      d apple
# 2      d   bob
# 3      d   cat
# 4      e     b
# 5      f     c
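One thing to watch, as far as I know: the default sep in separate_rows is "[^[:alnum:].]+", which splits on any run of non-alphanumeric characters, so an entry containing a space would be split on the space as well. Here is a minimal sketch that passes sep explicitly. It assumes tidyr is installed, and df2 is made-up data for illustration, not the data from the question.

```r
library(tidyr)

# Illustrative data (not from the question): one entry contains a space
df2 <- data.frame(var1 = c("New York, Boston", "b"),
                  var2 = c("d", "e"),
                  stringsAsFactors = FALSE)

# Split only on a comma followed by optional whitespace,
# so "New York" stays together as one term
out <- separate_rows(df2, var1, sep = ",\\s*")
out
```

With the explicit pattern, only commas act as separators, which is probably what you want when the terms themselves can contain spaces or punctuation.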

score 0 · Answer 2 · answered Jun 15 '16 at 18:35

Here is a base R method using the data.frame provided by @lukeA.

# split the variable by commas into a list
temp <- strsplit(as.character(df$var1), split=", ")
# form new data.frame
dfNew <- data.frame(var1=unlist(temp), var2=rep(df$var2, sapply(temp, length)))

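The strsplit function splits each element of a character vector, here on ", ", and returns a list with one character vector per original row. unlist flattens that list into a single vector, keeping the elements in their original order. rep then repeats each value of var2 as many times as there are elements in the corresponding entry of temp, so the two columns line up.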

Here is the output:

> dfNew
   var1 var2
1 apple    d
2   bob    d
3   cat    d
4     b    e
5     c    f
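If the real data frame has more columns than var2, the same idea can be wrapped in a small function that repeats whole rows instead of a single column. This is only a sketch: split_rows and its arguments are hypothetical names I made up, and the pattern assumes terms are separated by a comma with optional whitespace after it.

```r
# Hypothetical helper (name and arguments are illustrative only)
split_rows <- function(data, col, pattern = ",\\s*") {
  # one character vector of terms per original row
  parts <- strsplit(as.character(data[[col]]), pattern)
  # number of terms found in each original row
  n <- lengths(parts)
  # repeat each row index once per term, keeping all other columns
  out <- data[rep(seq_len(nrow(data)), n), , drop = FALSE]
  # overwrite the split column with the individual terms
  out[[col]] <- trimws(unlist(parts))
  rownames(out) <- NULL
  out
}

df <- data.frame(var1 = c("apple, bob, cat", "b", "c"),
                 var2 = c("d", "e", "f"),
                 stringsAsFactors = FALSE)
res <- split_rows(df, "var1")
res
```

Indexing by repeated row numbers means every other column is carried along automatically, and lengths(temp) is a shorter equivalent of sapply(temp, length) (it needs R 3.2.0 or later).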
