
Please can someone help me with this problem? Any suggestion is greatly appreciated!

I started with:

A <- data.frame(stringsAsFactors = F)
A <- edit(A)

Then I filled in some values for A so it looks like this:

A
  var1  var2
1    a x,y,z
2    b   p,q
3    c   g,h

My goal is to get a data frame in this form:

  var1  var2
1    a     x
2    a     y
3    a     z
4    b     p
5    b     q  
6    c     g
7    c     h

This is how I tried to implement it:

A2 <- data.frame(stringsAsFactors = F)
for(i in 1:nrow(A)){
  if(grepl(",", A[i,2])){
    split <- unlist(strsplit(A[i,2], ","))

    for(j in 1:length(split)){
        newrow <- c(A[i,1],split[j])
        A2 <- rbind(A2, newrow)
    }
  }else{
    A2 <- rbind(A2, A[i,])
  }
}

But I get these warning messages:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "y") :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "z") :
  invalid factor level, NA generated
3: In `[<-.factor`(`*tmp*`, ri, value = "b") :
  invalid factor level, NA generated
4: In `[<-.factor`(`*tmp*`, ri, value = "p") :
  invalid factor level, NA generated
5: In `[<-.factor`(`*tmp*`, ri, value = "b") :
  invalid factor level, NA generated
6: In `[<-.factor`(`*tmp*`, ri, value = "q") :
  invalid factor level, NA generated
7: In `[<-.factor`(`*tmp*`, ri, value = "c") :
  invalid factor level, NA generated
8: In `[<-.factor`(`*tmp*`, ri, value = "g") :
  invalid factor level, NA generated
9: In `[<-.factor`(`*tmp*`, ri, value = "c") :
  invalid factor level, NA generated
10: In `[<-.factor`(`*tmp*`, ri, value = "h") :
  invalid factor level, NA generated
scat5218
  • There are several recent Q&A which provide different ways to reach your goal: e.g. [**here**](http://stackoverflow.com/questions/24562786/separate-values-in-a-column-of-a-dataframe-and-melt/24562914#24562914), [**here**](http://stackoverflow.com/questions/24249351/can-i-split-this-dataframe-up-so-that-theres-a-new-row-for-each-item-in-some-co/24255502#24255502), and [**here**](http://stackoverflow.com/questions/24595421/how-to-strsplit-data-frame-column-and-replicate-rows-accordingly/24595552#24595552) – Henrik Jul 07 '14 at 23:16

1 Answer


Your problem is that stringsAsFactors isn't a property that a data.frame remembers. That's only used during the initial data.frame creation and applies to all the values you pass as parameters. It does not at all affect future values you may add.
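As a small illustration of the point above (a sketch only; the names `empty`, `f` and `chr` are made up for this example), you can check that the empty data.frame carries no trace of the argument, and that a factor rejects unknown levels with the same warning you are seeing:

```r
# data.frame() consumes stringsAsFactors at creation time;
# the setting is not stored on the resulting object.
empty <- data.frame(stringsAsFactors = FALSE)
names(attributes(empty))   # no "stringsAsFactors" entry here

# A factor only accepts values that are already among its levels ...
f <- factor("x")
f[2] <- "y"                # warning: invalid factor level, NA generated
is.na(f[2])                # the new element ended up as NA

# ... while a character vector accepts any string.
chr <- "x"
chr[2] <- "y"
chr
```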

Also, you're going to have problems rbinding to a data.frame with no columns. R likes to make sure that column names match up when using rbind, and clearly this will not be the case here. Plus, when you rbind something to a data.frame, R converts the object to a data.frame first and then tries to add the values; at that point you can't set stringsAsFactors, so it uses the default (TRUE in R versions before 4.0.0). You would need to create your own data.frame explicitly with character columns. Here's one way you could rewrite your loop:

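# Start from an empty data.frame whose columns are already character
A2 <- data.frame(var1 = character(), var2 = character(), stringsAsFactors = FALSE)
for (i in seq_len(nrow(A))) {
  if (grepl(",", A[i, 2])) {
    # Split the comma-separated values of row i
    parts <- unlist(strsplit(A[i, 2], ","))

    for (j in seq_along(parts)) {
      newrow <- c(var1 = A[i, 1], var2 = parts[j])
      # Turn the named vector into a one-row data.frame before binding
      A2 <- rbind(A2, data.frame(as.list(newrow), stringsAsFactors = FALSE))
    }
  } else {
    # No comma: keep the row unchanged
    A2 <- rbind(A2, A[i, ])
  }
}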

That being said, the cSplit helper function is useful for this type of stuff if you don't mind the dependency on data.table.
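A rough sketch of what that could look like, assuming the cSplit meant here is the function of that name exported by the splitstackshape package (which builds on data.table). The sample `A` is rebuilt from the data printed in the question, and `A_long` is just an illustrative name. The call is guarded so the snippet does nothing if the package is not installed:

```r
# Sample data reconstructed from the question
A <- data.frame(var1 = c("a", "b", "c"),
                var2 = c("x,y,z", "p,q", "g,h"),
                stringsAsFactors = FALSE)

# Assumption: splitstackshape provides cSplit(); skip quietly if it is missing
if (requireNamespace("splitstackshape", quietly = TRUE)) {
  # direction = "long" puts each split value on its own row
  A_long <- splitstackshape::cSplit(A, "var2", sep = ",", direction = "long")
  print(A_long)
}
```

Note that the result is a data.table rather than a plain data.frame.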

You might also do something like

A2 <- do.call(rbind, with(A, Map(expand.grid, 
     var1 = var1, 
     var2 = strsplit(var2, ",")
)))

This uses only base functions to do the splitting and binding, without needing an explicit loop.
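Another loop-free variant, offered as a sketch rather than as the method above: split once, then repeat each var1 value as many times as its row has pieces. It swaps expand.grid for rep() and lengths() (available since R 3.2.0) and keeps both columns as character. The sample `A` is rebuilt from the data printed in the question, and `parts` is just an illustrative name.

```r
# Sample data reconstructed from the question
A <- data.frame(var1 = c("a", "b", "c"),
                var2 = c("x,y,z", "p,q", "g,h"),
                stringsAsFactors = FALSE)

# One character vector of pieces per row
parts <- strsplit(A$var2, ",", fixed = TRUE)

# Repeat each var1 value once per piece, then flatten the pieces
A2 <- data.frame(var1 = rep(A$var1, lengths(parts)),
                 var2 = unlist(parts),
                 stringsAsFactors = FALSE)
A2
```

Rows without a comma need no special case here, because strsplit returns a single piece for them.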

MrFlick