I was wondering if there was any way in R to split strings from multiple columns into multiple rows respectively. For example:

Split something like this:

V1 V2    V3
1  A,B,C C,D,F
2  X,Y,Z V,U,

into this:

V1 V2 V3
1  A  C
1  B  D
1  C  F
2  X  V
....
2  Z  NA

and so on. I was able to do it for the first column, but the second column just prints duplicates of what is in the first column. I am using R, so I can use either R syntax or SQLite syntax. Thank you!

Here is what I have so far:

split<- strsplit(as.character(Start), as.character(End), split= ";")

split1<-data.frame(id = rep(dataset$id, sapply(split, length)), End = unlist(split), End=unlist(split))
Chioma

1 Answer

We can use `separate_rows` from tidyr, which splits several delimited columns into rows in parallel:

library(tidyr)
separate_rows(df1, V2, V3)
#   V1 V2 V3
#1  1  A  C
#2  1  B  D
#3  1  C  F
#4  2  X  V
#5  2  Y  U
#6  2  Z  T

separate_rows(df2, V2, V3)
#   V1 V2 V3
#1  1  A  C
#2  1  B  D
#3  1  C  F
#4  2  X  V
#5  2  Y  U
#6  2  Z   
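The blank in the last row is not an `NA`. As a minimal sketch, assuming that blank cell is an empty string, it can be converted to `NA` afterwards. The result is rebuilt by hand here so the snippet runs without tidyr; the names `res` and `split_cols` are made up for illustration.

```r
# Result as printed above, rebuilt by hand so the sketch runs without tidyr.
# Assumption: the blank cell in row 6 is an empty string ("").
res <- data.frame(V1 = c(1L, 1L, 1L, 2L, 2L, 2L),
                  V2 = c("A", "B", "C", "X", "Y", "Z"),
                  V3 = c("C", "D", "F", "V", "U", ""),
                  stringsAsFactors = FALSE)

# Replace empty strings with NA in the split columns only.
split_cols <- c("V2", "V3")
res[split_cols] <- lapply(res[split_cols], function(x) {
  x[!is.na(x) & x == ""] <- NA_character_
  x
})

res
```

Restricting the replacement to the split columns leaves `V1` and any other untouched columns exactly as they were.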

Another option is `cSplit` from splitstackshape, here applied to every column from the second onward:

library(splitstackshape)
cSplit(df2, 2:ncol(df2), ",", "long")

data

df2 <- structure(list(V1 = 1:2, V2 = c("A,B,C", "X,Y,Z"), V3 = c("C,D,F", 
"V,U,")), .Names = c("V1", "V2", "V3"), class = "data.frame", 
 row.names = c(NA, -2L))
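For comparison, here is a rough base-R sketch of the same idea with no packages. It is not the method used by tidyr or splitstackshape, and `split_to_rows` is a hypothetical helper written for this example. It relies on `strsplit` dropping the empty piece after a trailing delimiter, so "V,U," and "V,U" behave the same, and shorter pieces are padded with `NA`.

```r
# Example data from the answer (V3 in row 2 has a trailing comma).
df2 <- data.frame(V1 = 1:2,
                  V2 = c("A,B,C", "X,Y,Z"),
                  V3 = c("C,D,F", "V,U,"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: split the columns in `cols` on `sep` and stack the
# pieces into rows, padding shorter pieces with NA.
split_to_rows <- function(df, cols, sep = ",") {
  pieces <- lapply(df[cols], function(x) {
    strsplit(as.character(x), sep, fixed = TRUE)
  })
  # Output rows per input row: the longest split among the chosen columns.
  n <- do.call(pmax, unname(lapply(pieces, lengths)))
  # Repeat the untouched columns n times per input row.
  out <- df[rep(seq_len(nrow(df)), n), setdiff(names(df), cols), drop = FALSE]
  for (col in cols) {
    # Indexing past the end of a vector yields NA, which does the padding.
    out[[col]] <- unlist(Map(function(p, k) p[seq_len(k)], pieces[[col]], n))
  }
  rownames(out) <- NULL
  out[names(df)]  # restore the original column order
}

res <- split_to_rows(df2, c("V2", "V3"))
res
```

Because the padding happens per input row, a row whose columns split into different numbers of pieces gets `NA` in the shorter column instead of recycled values.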
akrun
  • @Chioma I am using `tidyr_0.6.0`. Please check if you have the updated tidyr or not. If not, update the package – akrun Jan 29 '17 at 05:29
  • Make sure you have the latest version of tidyr. `separate_rows` was introduced in 0.5.0 – G. Grothendieck Jan 29 '17 at 05:29
  • @akrun Okay thanks! How do I make it so that when the columns do not match, like in the last example I printed, that R prints "NA" or is just empty? – Chioma Jan 29 '17 at 05:38
  • @Chioma It gives me a blank value by default using the new example you showed – akrun Jan 29 '17 at 05:42
  • @Chioma I am not getting any error. Please check my update – akrun Jan 29 '17 at 05:44
  • @akrun Could it be because, in the example I gave, I included a comma after the last input (V,U,)? In reality there would not be a comma there; it would just be V,U. Could that be why I am getting the error? – Chioma Jan 29 '17 at 05:49
  • @Chioma In that case, use `cSplit`; it works with or without commas at the end – akrun Jan 29 '17 at 05:52
  • @akrun Thank you! For some reason, R is printing extra rows where both column 2 and column 3 are NA. I'm not sure why, but my number of observations has jumped from around 3500 to 33000. I really appreciate all your help! I am in the process of learning, and every step seems to be followed by twice the errors. – Chioma Jan 29 '17 at 06:09
  • @Chioma Based on the examples you provided, it works fine for me. – akrun Jan 29 '17 at 06:11