15

I want to find the best "R way" to flatten a dataframe that looks like this:

  CAT    COUNT     TREAT
   A     1,2,3     Treat-a, Treat-b
   B     4,5       Treat-c,Treat-d,Treat-e

So it will be structured like this:

   CAT   COUNT1  COUNT2 COUNT3  TREAT1   TREAT2   TREAT3
    A    1       2      3       Treat-a  Treat-b  NA 
    B    4       5      NA      Treat-c  Treat-d  Treat-e 

Example code to generate the source dataframe:

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5) 
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

I believe I need a combination of rbind and unlist? Any help would be greatly appreciated. - Tim

Tim
  • 929
  • 12
  • 30

3 Answers3

11

Here is a solution using base R, accepting vectors of any length inside your list and no need to specify which columns of the dataframe you want to collapse. Part of the solution was generated using this answer.

df2 <- do.call(cbind,lapply(df,function(x){
  #check if it is a list, otherwise just return as is
  if(is.list(x)){
    return(data.frame(t(sapply(x,'[',seq(max(sapply(x,length)))))))
  } else{
  return(x)
  }
}))

As of R 3.2 there is lengths to replace sapply(x, length) as well,

df3 <- do.call(cbind.data.frame, lapply(df, function(x) {
  # check if it is a list, otherwise just return as is
  if (is.list(x)) {
    data.frame(t(sapply(x,'[', seq(max(lengths(x))))))
  } else {
   x
 }
}))

data used:

df <- structure(list(CAT = structure(1:2, .Label = c("A", "B"), class = "factor"), 
    COUNT = list(1:3, 4:5), TREAT = list(c("Treat-a", "Treat-b"
    ), c("Treat-c", "Treat-d", "Treat-e"))), .Names = c("CAT", 
"COUNT", "TREAT"), row.names = c(NA, -2L), class = "data.frame")
Community
  • 1
  • 1
Heroka
  • 12,889
  • 1
  • 28
  • 38
10

Here is another way in base r

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5)
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

Create a helper function to do the work

f <- function(l) {
  if (!is.list(l)) return(l)
  do.call('rbind', lapply(l, function(x) `length<-`(x, max(lengths(l)))))
}

Always test your code

f(df$TREAT)

#           [,1]      [,2]      [,3]     
# [1,] "Treat-a" "Treat-b" NA       
# [2,] "Treat-c" "Treat-d" "Treat-e"

Apply it

df[] <- lapply(df, f)
df

#     CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
#   1   A       1       2       3 Treat-a Treat-b    <NA>
#   2   B       4       5      NA Treat-c Treat-d Treat-e
rawr
  • 20,481
  • 4
  • 44
  • 78
  • And, then add one more `do.call(data.frame, ...)` on this. Their `list` has now been "flattened" to a `matrix`, but the number of columns is still 3. – A5C1D2H2I1M1N2O1R2T1 Dec 11 '15 at 10:18
4

There's a deleted answer here that indicates that "splitstackshape" could be used for this. It can, but the deleted answer used the wrong function. Instead, it should use the listCol_w function. Unfortunately, in its present form, this function is not vectorized across columns, so you would need to nest the calls to listCol_w for each column that needs to be flattened.

Here's the approach:

library(splitstackshape)
listCol_w(listCol_w(df, "COUNT", fill = NA), "TREAT", fill = NA)
##    CAT COUNT_fl_1 COUNT_fl_2 COUNT_fl_3 TREAT_fl_1 TREAT_fl_2 TREAT_fl_3
## 1:   A          1          2          3    Treat-a    Treat-b         NA
## 2:   B          4          5         NA    Treat-c    Treat-d    Treat-e

Note that fill = NA has been specified because it defaults to fill = NA_character_, which would otherwise coerce all the values to character.


Another alternative would be to use transpose from "data.table". Here's a possible implementation (looks scary, but using the function is easy). Benefits are that (1) you can specify the columns to flatten, (2) you can decide whether you want to drop the original column or not, and (3) it's fast.

flatten <- function(indt, cols, drop = FALSE) {
  require(data.table)
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  x <- unlist(indt[, lapply(.SD, function(x) max(lengths(x))), .SDcols = cols])
  nams <- paste(rep(cols, x), sequence(x), sep = "_")
  indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), .SDcols = cols]
  if (isTRUE(drop)) {
    indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), 
         .SDcols = cols][, (cols) := NULL]
  }
  indt[]
}

Usage would be...

Keeping original columns:

flatten(df, c("COUNT", "TREAT"))
#    CAT COUNT                   TREAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
# 1:   A 1,2,3         Treat-a,Treat-b       1       2       3 Treat-a Treat-b      NA
# 2:   B   4,5 Treat-c,Treat-d,Treat-e       4       5      NA Treat-c Treat-d Treat-e

Dropping original columns:

flatten(df, c("COUNT", "TREAT"), TRUE)
#    CAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
# 1:   A       1       2       3 Treat-a Treat-b      NA
# 2:   B       4       5      NA Treat-c Treat-d Treat-e

See this gist for a comparison with the other solutions proposed.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485