Looping across data frame to create new data frame

Question

I'm working on a solution to clean up some data, but am not 100% sure what the best solution is. I've found a working solution, but would like to know if there was an easier approach (especially in trying to scale it). What I would like to do is separate all the elements in a dataframe (separated by semicolons), apply each of those components to an element of the dataframe, then combine the results into a new dataframe. Example below:

test <- data.frame(class=c("a1", "a2","a3","a4"),
        person=c("p1;p3;p4","p2;p4","p4;p5;p6","p1;p5"),
        stringsAsFactors = F)

test1 <- c()
test2 <- c()

for (i in 1:nrow(test)){
    test1 <-append(test1, strsplit(test[i,2],";")[[1]])
    test2 <- append(test2, rep(test[i,1],length(strsplit(test[i,2],";")[[1]])))
}

score 1 · Answer 1 · answered Jun 09 '16 at 02:07

This slightly verbose oneliner will do it, if I understand exactly what you're trying to do:

do.call("rbind",apply(test, 1, function(x) expand.grid(x[1], unlist(strsplit(x[2], split=";")))))

   Var1 Var2
1    a1   p1
2    a1   p3
3    a1   p4
4    a2   p2
5    a2   p4
6    a3   p4
7    a3   p5
8    a3   p6
9    a4   p1
10   a4   p5

I am using expand.grid on each row of your original dataframe, including the first variable and the second variable split by ; as the inputs. As the result of apply is a list then I use do.call with rbind to put it back into a dataframe.

akrun · Accepted Answer · 2016-06-09T02:29:17.540

We can use cSplit from splitstackshape (here i mentioned one package and used one and not included another) to split the 'person' column by the delimiter ; and specify the direction as long to reshape to 'long' format after the split.

library(splitstackshape)
cSplit(test, 'person', ';', 'long')
#    class person
# 1:    a1     p1
# 2:    a1     p3
# 3:    a1     p4
# 4:    a2     p2
# 5:    a2     p4
# 6:    a3     p4
# 7:    a3     p5
# 8:    a3     p6
# 9:    a4     p1
#10:    a4     p5

Or another one-liner from base R (without using any packages)

stack(setNames(strsplit(test$person, ";"), test$class))[2:1]

score 1 · Answer 3 · answered Jun 09 '16 at 02:25

With tidyr:

library(dplyr)
library(tidyr)

         # separate person into multiple columns
test %>% separate(person, into = paste0('person', 1:5), fill = 'right') %>% 
    # gather from wide to long
    gather(key = id, value = person, -class, na.rm = TRUE) %>% 
    # clean up extra column
    select(-id)

#    class person
# 1     a1     p1
# 2     a2     p2
# 3     a3     p4
# 4     a4     p1
# 5     a1     p3
# 6     a2     p4
# 7     a3     p5
# 8     a4     p5
# 9     a1     p4
# 11    a3     p6

Looping across data frame to create new data frame

3 Answers3