R: Use lapply with a function acting on part of a split dataframe

Question

The R code below sets up an example of the issue I am trying to resolve. The data set measures a release of particles over non-uniform time intervals. The particle release is integrated over time using the trapezoid rule.

library(caTools)

test.data.frame <- data.frame( 
    sample = c('sample 1','sample 1','sample 1','sample 1',
               'sample 2','sample 2','sample 2','sample 2'))
test.data.frame$time <- c(1,2,4,6,1,4,5,6)
test.data.frame$material.released.g <- c(5,3,2,1,2,4,5,1)

split.test <- split(test.data.frame, test.data.frame$sample)

integrate.test <- function(x){
   dataframe.segment <- do.call(rbind.data.frame,x)
   return(trapz(dataframe.segment$time,dataframe.segment$material.released.g))
}

So far the integrate.test function appears to work on a single element of a list.

   > integrate.test(split.test[1])
   [1] 12

   > integrate.test(split.test[2])
   [1] 16.5

The lapply function gives zeros in the output.

  > lapply(split.test, integrate.test)
  $`sample 1`
  [1] 0

  $`sample 2`
  [1] 0

The output I am looking for is a data frame equivalent to:

expected.output <- data.frame(
    sample = c('sample 1','sample 2'), 
    total.material.released = c(12 , 16.5))

Is anyone able to help resolve this problem? Thanks!

This works: `lapply(split.test, function(x) integrate.test(list(x)))`, but you should understand the difference between `[` and `[[`. For instance, `integrate.test(split.test[[1]])` doesn't work (do you see why?). `lapply` subsets with `[[`. — nicola, Mar 07 '17 at 22:26
http://stackoverflow.com/questions/1169456/the-difference-between-and-notations-for-accessing-the-elements-of-a-lis ... for those stumped like me. — Agriculturist, Mar 07 '17 at 22:43

eipi10 · Accepted Answer · 2017-03-08T17:15:02.757

It's the difference between split.test[1], which is a one-element list containing a data frame, and split.test[[1]], which is the data frame stored in list element [[1]].
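A quick way to see the distinction is to check the class of each result. This is a minimal sketch using a small stand-in list; the name demo.list is made up for illustration and is not from the question.

```r
# Hypothetical stand-in for split.test: a named list holding one data frame.
demo.list <- list(`sample 1` = data.frame(time = c(1, 2, 4, 6),
                                          material.released.g = c(5, 3, 2, 1)))

single.bracket <- demo.list[1]    # still a list, of length 1
double.bracket <- demo.list[[1]]  # the data frame itself

class(single.bracket)             # "list"
class(double.bracket)             # "data.frame"

# lapply hands each element to the function the way [[ does,
# so the function sees a data frame, not a one-element list.
classes.seen <- lapply(demo.list, class)
classes.seen
```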

Your function, by calling do.call(rbind.data.frame, x), is expecting that x will be a list. But lapply(split.test, integrate.test) actually feeds it a data frame. Here's what happens when you feed integrate.test a data frame rather than a (generic) list:

x = do.call(rbind.data.frame, split.test[[1]])
x

                    c.1..1..5. c.1..2..3. c.1..4..2. c.1..6..1.
sample                       1          1          1          1
time                         1          2          4          6
material.released.g          5          3          2          1

do.call calls a function with the elements of a list as its arguments. If you feed it a generic list (like split.test[1], which is a one-element list), it passes each list element to rbind.data.frame. If the list contained several data frames, they would be stacked into a single data frame. But there's only one element (the data frame contained in element 1 of split.test), so that's what gets returned.

However, when you run do.call(rbind.data.frame, split.test[[1]]) you're giving do.call a data frame to operate on. A data frame is a special kind of list in which each column is a list element. So do.call passes each column of your original data frame to rbind.data.frame as a separate argument, and each column becomes a row of the result. The integration returns 0 because the columns it wants to operate on (time and material.released.g) no longer exist. Referencing those non-existent columns with $ returns NULL instead of the data you were expecting, and trapz(NULL, NULL) is zero.
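The same effect can be reproduced in base R without caTools. The sketch below uses a made-up data frame named df (an assumption for illustration) to show that a data frame is a list of columns and that the columns disappear after the do.call step.

```r
# Hypothetical data frame mirroring one piece of split.test.
df <- data.frame(time = c(1, 2, 4, 6),
                 material.released.g = c(5, 3, 2, 1))

# A data frame is a list whose elements are its columns.
is.list(df)   # TRUE
length(df)    # 2, one element per column

# do.call passes each column as a separate argument to rbind.data.frame,
# so each original column ends up as a row of the result.
mangled <- do.call(rbind.data.frame, df)
nrow(mangled)  # 2, one row per original column

# The original column names are now row names, so $ finds nothing.
is.null(mangled$time)                 # TRUE
is.null(mangled$material.released.g)  # TRUE
```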

The function will work if you use the data frame directly and skip the do.call step:

integrate.test <- function(x){
  #dataframe.segment <- do.call(rbind.data.frame,x)
  dataframe.segment = x
  return(trapz(dataframe.segment$time,dataframe.segment$material.released.g))
}

lapply(split.test, integrate.test)

$`sample 1`
[1] 12

$`sample 2`
[1] 16.5

Of course this can be shortened to:

integrate.test <- function(x){
  return(trapz(x$time,x$material.released.g))
}

Or you can skip the named function and call trapz inside an anonymous function passed to lapply.

For completeness: e.o <- data.frame(sample = levels(test.data.frame$sample), total.material.released.g = unlist(lapply(split.test, function(x) trapz(x$time,x$material)))) — Agriculturist, Mar 07 '17 at 23:05
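A variant of that one-liner that should also work in R 4.0.0 and later, where data.frame() no longer converts strings to factors by default and levels(test.data.frame$sample) therefore returns NULL. This is only a sketch: the helper trapezoid is a hypothetical base-R stand-in for caTools::trapz so the example runs without extra packages, and the column name is spelled out in full rather than relying on partial matching of x$material.

```r
# Hypothetical base-R stand-in for caTools::trapz (trapezoid rule).
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

test.data.frame <- data.frame(
  sample = rep(c("sample 1", "sample 2"), each = 4),
  time = c(1, 2, 4, 6, 1, 4, 5, 6),
  material.released.g = c(5, 3, 2, 1, 2, 4, 5, 1)
)
split.test <- split(test.data.frame, test.data.frame$sample)

# sapply returns a named numeric vector, one value per sample.
totals <- sapply(split.test,
                 function(x) trapezoid(x$time, x$material.released.g))

# The names of the split list give the sample labels whether `sample`
# is a factor or a character column.
result <- data.frame(
  sample = names(totals),
  total.material.released = unname(totals)
)
result
```

With caTools loaded, trapz can be substituted for trapezoid in the anonymous function.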
