3

In a separate question earlier today I asked how to flatten nested lists into a row in a dataframe. I wish to further understand how to manipulate my lists within a dataframe, this time by expanding the list vertically within the dataframe, adding new rows to accommodate the the data.

In this case I wish to go from this structure:

CAT    COUNT            TREAT
A      1,2,3            Treat-a, Treat-b
B      4,5              Treat-c,Treat-d,Treat-e

To this structure:

CAT   COUNT  TREAT
A     1      Treat-a   
A     2      Treat-b
A     3      NA 
B     4      Treat-c 
B     5      Treat-d
B     NA     Treat-e

Code to generate the test (source) data:

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5) # as numbers
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

I tried using CBIND in an approach similar to the answer supplied to my earlier question but it failed due to the different number of values between the multiple lists. Thank you for your patience as I attempt to grasp these basic manipulation tasks.

Community
  • 1
  • 1
Tim
  • 929
  • 12
  • 30
  • Tim, I know you're not after performance, but I'd be interested to know how the functions I've shared worked for your needs. Both the accepted answers to these related questions do things you might not want. For instance, the accepted answer here converts all columns to factors. The accepted answer to your other question simply converts the `list`s to matrices (check with `ncol` on the result to see how many columns you actually end up with. In other words, even though it looks like it had created a `data.frame` with 7 columns, the resulting `data.frame` only has 3 columns). Thanks! – A5C1D2H2I1M1N2O1R2T1 Dec 18 '15 at 19:28

2 Answers2

3

Here's what I came up with, using a helper function cbind.fill: (cbind a df with an empty df (cbind.fill?))

cbind.fill <- function(...){
  nm <- list(...) 
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow)) 
  do.call(cbind, lapply(nm, function (x) 
    rbind(x, matrix(, n-nrow(x), ncol(x))))) 
}

#Split df by CAT
df.split <- split(df, df$CAT)

#Apply cbind.fill to make a matrix filled with NA where needed
rawlist <- lapply(df.split, function(x) cbind(as.character(x$CAT), cbind.fill(unlist(x$COUNT), unlist(x$TREAT) ) ))  

#Bind rows and convert matrix to data.frame
df.new <- as.data.frame(do.call(rbind, rawlist))

#Column names
colnames(df.new) <- names(df)

df.new
  CAT COUNT   TREAT
1   A     1 Treat-a
2   A     2 Treat-b
3   A     3    <NA>
4   B     4 Treat-c
5   B     5 Treat-d
6   B  <NA> Treat-e
Community
  • 1
  • 1
col. slade
  • 451
  • 4
  • 13
1

Extending my answer from your previous question, you can create another function, let's call this one flattenLong that combines flatten and melt from "data.table" to get your desired output.

The function looks like this:

flattenLong <- function(indt, cols) {
  ob <- setdiff(names(indt), cols)
  x <- flatten(indt, cols, TRUE)
  mv <- lapply(cols, function(y) grep(sprintf("^%s_", y), names(x)))
  setorderv(melt(x, measure.vars = mv, value.name = cols), ob)[]
}

Usage is simply:

flattenLong(df, c("COUNT", "TREAT"))
##    CAT variable COUNT   TREAT
## 1:   A        1     1 Treat-a
## 2:   A        2     2 Treat-b
## 3:   A        3     3      NA
## 4:   B        1     4 Treat-c
## 5:   B        2     5 Treat-d
## 6:   B        3    NA Treat-e

For convenience, here's the flatten function again:

flatten <- function(indt, cols, drop = FALSE) {
  require(data.table)
  if (!is.data.table(indt)) indt <- as.data.table(indt)
  x <- unlist(indt[, lapply(.SD, function(x) max(lengths(x))), .SDcols = cols])
  nams <- paste(rep(cols, x), sequence(x), sep = "_")
  indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), .SDcols = cols]
  if (isTRUE(drop)) {
    indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), 
         .SDcols = cols][, (cols) := NULL]
  }
  indt[]
}
Community
  • 1
  • 1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485