
I have a large list that stores measurements (the product of other lapply() runs). I now want to gather these measurements and calculate the median, mean, sd, etc., but I don't know how to access them. The list is structured like this:

foo[[i]][[j]][[k]][[1]]
foo[[i]][[j]][[k]][[2]]$bar

I can't figure out a function that would return, e.g., the mean of $bar (but not of $x) while keeping track of the corresponding indices i, j, k.
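Given this nesting, a single value is reached by chaining [[ ]] down to the leaf, and all bar values for one fixed (i, j) can be collected with vapply() over the k level and then summarized. A minimal sketch on a hypothetical toy list foo (2 x 3 x 4, with bar simply set to k) that only mimics the structure above:

```r
# Hypothetical toy list with the question's nesting:
# foo[[i]][[j]][[k]] is list(value, list(bar = ...)); here bar equals k
foo <- lapply(1:2, function(i)
  lapply(1:3, function(j)
    lapply(1:4, function(k) list(as.numeric(k), list(bar = as.numeric(k))))))

# a single value: chain [[ ]] down to the leaf, then take $bar
foo[[2]][[3]][[4]][[2]]$bar

# all bar values for (i = 2, j = 3): one number per k
bars_23 <- vapply(foo[[2]][[3]], function(leaf) leaf[[2]]$bar, numeric(1))

# summary statistics over k for that (i, j)
stats_23 <- c(mean = mean(bars_23), median = median(bars_23), sd = sd(bars_23))
stats_23
```

vapply() is used instead of sapply() so that a leaf without a single numeric bar fails loudly instead of silently returning a list.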

A sample list can be generated with the following R code:

library(purrr)

metrics <- function(y){
  # keep the input as the first element and its median as $bar
  tt10r <- median(y)
  list(y, flatten(list(bar = tt10r)))
}


# example_list[[i]][[j]] is the vector 1:10, for i and j in 1:10
example_list <- list()
for (i in 1:10)
{
  v <- list()
  for (j in 1:10)
  {
    w <- 1:10
    v[j] <- list(w)
  }
  example_list[[i]] <- v
}
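The nested loop above just repeats the vector 1:10, so, assuming nothing else depends on the loop variables, the same list can be built with rep(). A small self-contained check (the loop is copied so the block runs on its own):

```r
# Loop version, copied from the question
example_list_loop <- list()
for (i in 1:10) {
  v <- list()
  for (j in 1:10) v[j] <- list(1:10)
  example_list_loop[[i]] <- v
}

# Loop-free version: ten copies of a list holding ten copies of 1:10
example_list_rep <- rep(list(rep(list(1:10), 10)), 10)

identical(example_list_loop, example_list_rep)
# [1] TRUE
```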

# foo[[i]][[j]][[k]] is metrics() applied to the mean of example_list[[i]][[j]][k]
foo <- list()
for (i in seq_along(example_list))
{
  u <- list()
  values <- list()
  for (j in seq_along(example_list[[i]]))
  {
    u[[j]] <- lapply(example_list[[i]][[j]], function(x) mean(x))
    values[[j]] <- lapply(u[[j]], function(x) metrics(x))
  }
  foo[[i]] <- values
}
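With this structure, the bar values can be collected into a long data frame with one row per (i, j, k) by building one small data frame per (i, j) and binding them once at the end, rather than growing an object inside a loop. A sketch, assuming every leaf stores the metric at [[2]]$bar; the foo below is a hypothetical stand-in with the same 10 x 10 x 10 shape, built with plain list() instead of purrr::flatten():

```r
# Hypothetical stand-in for the question's foo:
# leaf [[1]] holds the value, leaf [[2]]$bar its median
foo <- lapply(1:10, function(i)
  lapply(1:10, function(j)
    lapply(1:10, function(k) list(as.numeric(k), list(bar = median(as.numeric(k)))))))

# one data frame per i, each made of one small data frame per (i, j)
per_i <- lapply(seq_along(foo), function(i) {
  per_j <- lapply(seq_along(foo[[i]]), function(j) {
    leaves <- foo[[i]][[j]]
    data.frame(
      i = i,
      j = j,
      k = seq_along(leaves),
      bar = vapply(leaves, function(leaf) leaf[[2]]$bar, numeric(1))
    )
  })
  do.call(rbind, per_j)
})

# a single rbind across i gives the long table
long <- do.call(rbind, per_i)
head(long)
```

Keeping k as an explicit column means the relation to the original indices survives any later sorting or filtering.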
Geo Vogler
  • Can you provide a small sample of your list of lists that would demonstrate this problem? – acylam Jun 18 '18 at 17:56
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jun 18 '18 at 18:51
  • I added some code that would generate such a list. My original data is 4GB and therefore a tad too big to present. – Geo Vogler Jun 19 '18 at 18:55
  • The desired output would be a dataframe, with columns for i, j, k and the mean for all k per combination of i and j. – Geo Vogler Jun 20 '18 at 20:45
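Reading the desired output as one row per (i, j) with statistics taken over k, base R's aggregate() can do the summarizing once the values are in a long table. The table below is made up (bar = i + j + k) purely to show the call; with real data it would presumably be the long i/j/k/bar table extracted from foo:

```r
# Hypothetical long table: one row per (i, j, k), bar invented as i + j + k
long <- expand.grid(k = 1:4, j = 1:3, i = 1:2)
long$bar <- long$i + long$j + long$k

# one row per (i, j); FUN returns three statistics over k
summary_ij <- aggregate(bar ~ i + j, data = long,
                        FUN = function(x) c(mean = mean(x), median = median(x), sd = sd(x)))

# split the matrix column into bar.mean, bar.median and bar.sd
summary_ij <- do.call(data.frame, summary_ij)
summary_ij
```

Because FUN returns three numbers, aggregate() stores them as a single matrix column; do.call(data.frame, ...) turns that into ordinary columns.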

1 Answer


The following code works, but I am not sure whether it is efficient (loops!). It gives the anticipated result, a data frame with the columns i, j and bar:

# rows contributed by each (i, j): every k except the last two elements
n_per_j <- unlist(lapply(foo, function(x) lengths(x) - 2))

# preallocate the result; ncol = 3 assumes a single metric (bar) next to i and j
final <- data.frame(matrix(nrow = sum(n_per_j), ncol = 3))

# offset of each i in the flat sequence of (i, j) pairs
all_js <- c(0, cumsum(lengths(foo)))

# first and last result row of each (i, j) pair
starts <- c(0, cumsum(n_per_j)) + 1
ends <- c(0, cumsum(n_per_j))

for (i in seq_along(foo))
{
  a <- foo[[i]]

  for (j in seq_along(a))
  {
    b <- a[[j]]

    # named metric vector of k = 1, e.g. c(bar = ...)
    data <- unlist(lapply(lapply(b[1], '[', 2), '[[', 1))

    # stack the metrics of k = 2, ..., length(b) - 2 underneath
    for (k in 2:(length(b) - 2))
    {
      data <- rbind(data, unlist(lapply(lapply(b[k], '[', 2), '[[', 1)))
    }
    row.names(data) <- NULL
    colnames(final) <- c("i", "j", colnames(data))

    first <- starts[all_js[i] + j]
    last <- ends[all_js[i] + j + 1]

    final[first:last, ] <- data.frame(cbind(i = i, j = j, data))
  }
}
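On the efficiency question: the cumsum() row bookkeeping and the repeated rbind() can be replaced by extracting all bar values in one pass and generating the index columns with rep() and lengths(). A sketch on a hypothetical toy foo (3 x 2 x 4, bar = k); unlike the code above it keeps every k, so if the last two elements per (i, j) are not measurements in the real data (as the "- 2" suggests), drop them afterwards with subset():

```r
# Hypothetical toy foo with the question's nesting; bar equals k
foo <- lapply(1:3, function(i)
  lapply(1:2, function(j)
    lapply(1:4, function(k) list(as.numeric(k), list(bar = as.numeric(k))))))

n_j <- lengths(foo)                   # number of j elements under each i
n_k <- unlist(lapply(foo, lengths))   # number of k elements under each (i, j)

# every bar value, in i, then j, then k order
bar <- unlist(lapply(foo, function(fi)
  lapply(fi, function(fij)
    vapply(fij, function(leaf) leaf[[2]]$bar, numeric(1)))))

# index columns built to match that order
long <- data.frame(
  i = rep(rep(seq_along(foo), n_j), n_k),
  j = rep(unlist(lapply(n_j, seq_len)), n_k),
  k = unlist(lapply(n_k, seq_len)),
  bar = bar
)
head(long)
```

Nothing is written into a preallocated frame row by row, so there is no start/end arithmetic to get wrong; the index vectors are correct as long as every leaf yields exactly one bar value.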
Geo Vogler