
Given the following sample dataset:

col1 <- c("X1","X2","X3|X4|X5","X6|X7")
col2 <- c("5","8","1","4")
dat <- data.frame(col1,col2)

How can I split col1 on | and put each piece in its own row, duplicating the corresponding col2 value? Here's the data frame that I'd like to end up with:

col1 col2
  X1    5
  X2    8
  X3    1
  X4    1
  X5    1
  X6    4
  X7    4

I need a solution that can accommodate multiple columns like col2 that also need to be duplicated.

Amal Murali
Rnoob
  • Welcome to Stack Overflow! As you are new on SO, please take some time to read [about Stackoverflow](http://stackoverflow.com/about) and [how to ask](http://meta.stackoverflow.com/help/how-to-ask). It is great that you have provided a [minimal, reproducible data set](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). However, it is also important that you show us what you have tried. I am sure many people out there (e.g. me) are much more willing to help if you share the code you have tried and explain where it went wrong. Thanks! – Henrik Sep 30 '13 at 21:44
  • You can try `concat.split.multiple` from my "splitstackshape" package: `library(splitstackshape); concat.split.multiple(dat, "col1", "|", "long")`. – A5C1D2H2I1M1N2O1R2T1 Oct 02 '13 at 05:36

1 Answer


Split the character strings, then repeat the values in the other column once for every piece each split produced.

y <- strsplit(as.character(dat[, 1]), "|", fixed = TRUE)
data.frame(col1 = unlist(y), col2 = rep(dat[, 2], sapply(y, length)))
  col1 col2
1   X1    5
2   X2    8
3   X3    1
4   X4    1
5   X5    1
6   X6    4
7   X7    4
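
To see why this works, it can help to look at the intermediate values. The following is a small sketch on the same sample data; the name `counts` is introduced here only for illustration.

```r
col1 <- c("X1", "X2", "X3|X4|X5", "X6|X7")
col2 <- c("5", "8", "1", "4")
dat <- data.frame(col1, col2)

# fixed = TRUE treats "|" as a literal character. Without it, "|" is
# interpreted as a regular expression and the split is not the intended one.
y <- strsplit(as.character(dat$col1), "|", fixed = TRUE)

# Number of pieces each original row was split into.
counts <- sapply(y, length)
print(counts)

# Each col2 value is repeated once per piece that came from its row.
repeated <- rep(as.character(dat$col2), counts)
print(repeated)
```

Because `unlist(y)` and `repeated` have the same length and the same row order, they can be placed side by side in a new data frame.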

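If you need to repeat every column except the first, repeat the row indices instead:

# drop = FALSE keeps a data frame even when only one column remains
data.frame(col1 = unlist(y), dat[rep(seq_len(nrow(dat)), sapply(y, length)), -1, drop = FALSE])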
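
As a rough sketch of how the same idea might be wrapped up for data with several extra columns: the helper name `split_rows` and the extra column `col3` below are assumptions made for this example, not part of the original question or of base R.

```r
# Sample data with a hypothetical extra column, col3, added for illustration.
dat <- data.frame(
  col1 = c("X1", "X2", "X3|X4|X5", "X6|X7"),
  col2 = c("5", "8", "1", "4"),
  col3 = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# split_rows is a hypothetical helper name, not a base R function.
split_rows <- function(df, col, sep = "|") {
  pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  # Row index of df, repeated once per piece produced by the split.
  idx <- rep(seq_len(nrow(df)), sapply(pieces, length))
  out <- df[idx, , drop = FALSE]
  # Overwrite the split column with the individual pieces.
  out[[col]] <- unlist(pieces)
  rownames(out) <- NULL
  out
}

result <- split_rows(dat, "col1")
print(result)
```

Indexing whole rows rather than dropping the first column keeps the original column order and works regardless of where the split column sits in the data frame.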
Chris S.