How to flatten R data frame that contains lists?

Question

I want to find the best "R way" to flatten a dataframe that looks like this:

  CAT    COUNT     TREAT
   A     1,2,3     Treat-a, Treat-b
   B     4,5       Treat-c,Treat-d,Treat-e

So it will be structured like this:

   CAT   COUNT1  COUNT2 COUNT3  TREAT1   TREAT2   TREAT3
    A    1       2      3       Treat-a  Treat-b  NA 
    B    4       5      NA      Treat-c  Treat-d  Treat-e

Example code to generate the source dataframe:

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5) 
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

I believe I need a combination of rbind and unlist? Any help would be greatly appreciated. - Tim

cSplit() from the splitstackshape package would be a good option. — jazzurro, Dec 10 '15 at 16:02
With your example `df[2:3]<-lapply(df[,2:3], function(x) do.call(rbind,lapply(x,"[",1:3)))` seems a nice start. — nicola, Dec 10 '15 at 16:04
Very small data set so performance is not an issue - thanks! — Tim, Dec 10 '15 at 16:52
@AnandaMahto I had a quick look of the question and found that the question was asking a totally different thing next morning. It isn't good to be involved in SO just before going to bed... — jazzurro, Dec 18 '15 at 08:30
@jazzurro, I'll have to work on allowing multiple columns to be converted with `listCol_w` :-) — A5C1D2H2I1M1N2O1R2T1, Dec 18 '15 at 08:32
@AnandaMahto Yes, I actually thought that your package does have a function for this kind of issue, and yes that is the function! If you have spare time, please work on this project. :) — jazzurro, Dec 18 '15 at 09:29

score 11 · Answer 1 · edited May 23 '17 at 12:24

Here is a solution using base R, accepting vectors of any length inside your list and no need to specify which columns of the dataframe you want to collapse. Part of the solution was generated using this answer.

df2 <- do.call(cbind,lapply(df,function(x){
  #check if it is a list, otherwise just return as is
  if(is.list(x)){
    return(data.frame(t(sapply(x,'[',seq(max(sapply(x,length)))))))
  } else{
  return(x)
  }
}))

As of R 3.2 there is lengths to replace sapply(x, length) as well,

df3 <- do.call(cbind.data.frame, lapply(df, function(x) {
  # check if it is a list, otherwise just return as is
  if (is.list(x)) {
    data.frame(t(sapply(x,'[', seq(max(lengths(x))))))
  } else {
   x
 }
}))

data used:

df <- structure(list(CAT = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    COUNT = list(1:3, 4:5), TREAT = list(c("Treat-a", "Treat-b"
    ), c("Treat-c", "Treat-d", "Treat-e"))), .Names = c("CAT", 
"COUNT", "TREAT"), row.names = c(NA, -2L), class = "data.frame")

rawr · Accepted Answer · 2015-12-10T16:50:00.843

Here is another way in base r

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5)
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

Create a helper function to do the work

f <- function(l) {
  if (!is.list(l)) return(l)
  do.call('rbind', lapply(l, function(x) `length<-`(x, max(lengths(l)))))
}

Always test your code

f(df$TREAT)

#           [,1]      [,2]      [,3]     
# [1,] "Treat-a" "Treat-b" NA       
# [2,] "Treat-c" "Treat-d" "Treat-e"

Apply it

df[] <- lapply(df, f)
df

#     CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
#   1   A       1       2       3 Treat-a Treat-b    <NA>
#   2   B       4       5      NA Treat-c Treat-d Treat-e

And, then add one more `do.call(data.frame, ...)` on this. Their `list` has now been "flattened" to a `matrix`, but the number of columns is still 3. — A5C1D2H2I1M1N2O1R2T1, Dec 11 '15 at 10:18

A5C1D2H2I1M1N2O1R2T1 · Answer 3 · 2015-12-18T09:00:06.517

There's a deleted answer here that indicates that "splitstackshape" could be used for this. It can, but the deleted answer used the wrong function. Instead, it should use the listCol_w function. Unfortunately, in its present form, this function is not vectorized across columns, so you would need to nest the calls to listCol_w for each column that needs to be flattened.

Here's the approach:

library(splitstackshape)
listCol_w(listCol_w(df, "COUNT", fill = NA), "TREAT", fill = NA)
##    CAT COUNT_fl_1 COUNT_fl_2 COUNT_fl_3 TREAT_fl_1 TREAT_fl_2 TREAT_fl_3
## 1:   A          1          2          3    Treat-a    Treat-b         NA
## 2:   B          4          5         NA    Treat-c    Treat-d    Treat-e

Note that fill = NA has been specified because it defaults to fill = NA_character_, which would otherwise coerce all the values to character.

Another alternative would be to use transpose from "data.table". Here's a possible implementation (looks scary, but using the function is easy). Benefits are that (1) you can specify the columns to flatten, (2) you can decide whether you want to drop the original column or not, and (3) it's fast.

flatten <- function(indt, cols, drop = FALSE) {
  require(data.table)
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  x <- unlist(indt[, lapply(.SD, function(x) max(lengths(x))), .SDcols = cols])
  nams <- paste(rep(cols, x), sequence(x), sep = "_")
  indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), .SDcols = cols]
  if (isTRUE(drop)) {
    indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), 
         .SDcols = cols][, (cols) := NULL]
  }
  indt[]
}

Usage would be...

Keeping original columns:

flatten(df, c("COUNT", "TREAT"))
#    CAT COUNT                   TREAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
# 1:   A 1,2,3         Treat-a,Treat-b       1       2       3 Treat-a Treat-b      NA
# 2:   B   4,5 Treat-c,Treat-d,Treat-e       4       5      NA Treat-c Treat-d Treat-e

Dropping original columns:

flatten(df, c("COUNT", "TREAT"), TRUE)
#    CAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
# 1:   A       1       2       3 Treat-a Treat-b      NA
# 2:   B       4       5      NA Treat-c Treat-d Treat-e

See this gist for a comparison with the other solutions proposed.

How to flatten R data frame that contains lists?

3 Answers3

Linked