0

I have a dataset in the form of a data frame in R as follows

 col 1    col 2                 col3                            col 4
 941      3605                  c(0.035,0.298)                 20/08/2013 00:00:00
 929      3575                  c(0.026,0.078,0.292)           20/08/2013 00:00:00

I would like to split the list in column 3 and add it to the main data frame resulting in the following

 col 1    col 2       col3               col 4
 941      3605         0.035           20/08/2013 00:00:00
 941      3605         0.298           20/08/2013 00:00:00
 929      3575         0.026           20/08/2013 00:00:00
 929      3575         0.078           20/08/2013 00:00:00
 929      3575         0.0292          20/08/2013 00:00:00

Could somebody help with this?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
ArunK
  • 1,731
  • 16
  • 35
  • Check out `splitstackshape::listCol_l` – David Arenburg Mar 01 '16 at 11:54
  • Hi Danas, I had tried to use the process described in this post http://stackoverflow.com/questions/15930880/unlist-all-list-elements-in-a-dataframe it splits the dataset to columns, I couldn't get my head around how to re-jig it to the result I wanted. – ArunK Mar 01 '16 at 11:56
  • @DavidArenburg, Thanks! that's exactly what I was looking for! Cheers! – ArunK Mar 01 '16 at 12:03

2 Answers2

0

Something like this may help you

I understand col3 is a list... if not first do:

yourdata$col3 <- strsplit (yourdata$col3, ",")

Then find the length of each of the elements in col3

l <- sapply (yourdata$col3, length)

And the number of rows

N <- nrow (yourdata)

Create an "index" for your new dataset

my.new.rows <- rep (1:N, times = l) 

Create your new dataset

yornewdata <- yourdata[my.new.rows,]

And add the new col3

yornewdata[,"new3"] <- unlist (yourdata$col3)
dmontaner
  • 2,076
  • 1
  • 14
  • 17
0

Here is a solution using @DavidArenburg's suggested package {splitstackshape} and package {dplyr} to tidy up the transformation:

```

df <- data.frame(col_1 = c(941, 929), 
                 col_2 = c(3605,3575),
                 col_3 = I(list(c(0.035, 0.298),c(0.026, 0.078, 0.292))),
                 col_4 = c("0/08/2013 00:00:00", "20/08/2013 00:00:00"))

res <- splitstackshape::listCol_l(df, listcol = "col_3", drop = TRUE)


res <- dplyr::select(res, col_1, col_2, col_3_ul, col_4)
names(res)[3] <- "col_3"
print(res)

##   col_1 col_2 col_3               col_4
##1:   941  3605 0.035  0/08/2013 00:00:00
##2:   941  3605 0.298  0/08/2013 00:00:00
##3:   929  3575 0.026 20/08/2013 00:00:00
##4:   929  3575 0.078 20/08/2013 00:00:00
##5:   929  3575 0.292 20/08/2013 00:00:00

```