1

I feel like this should have a really simple/elegant solution but I just can't find it. (I'm relatively new to R so that's no surprise.)

I have a (large) nested list containing data.frames that I'm trying to add together. Here is code to create some sample data:

#Create data frames nested in a list
for (i in 1:6) {
  for (j in 1:4) {
    assign(paste0("v", j), sample.int(100,4))
  }
  assign(paste0("df", i), list(cbind(v1, v2, v3, v4)))
}

inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)

outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)

I need to add all the data frames labeled data1 together and all the data2's together. If they weren't in this nested list format, I'd do this:

data1.tot <- df1 + df3 + df5
data2.tot <- df2 + df4 + df6

Because they are in a list, I thought there might be an lapply solution and tried:

grp <- c("group1", "group2", "group3") #vector of groups to sum across
datas <- lapply(outer, "[[", "data1") #select "data1" from all groups
tot.datas <- lapply(datas[grp], "+") #to sum across selected data
#I know these last two steps can be combined into one but it helps me keep everything straight to separate them

But it returns Error in FUN(left): invalid argument to unary operator because I'm passing the list of datas as x.

I've also looked at other solutions like this one: Adding selected data frames together, from a list of data frames

But the nested structure of my data makes me unsure of how to translate that solution to my problem.

And just to note, the data I'm working with are GHCN Daily data, so the structure is not my design. Any help would be greatly appreciated.

UPDATE: I've partially figured out a fix using the suggestion of Reduce by @Parfait, but now I need to automate it. I'm working on a solution using a for loop because that gives me more control over the elements I'm accessing, but I'm open to other ideas. Here is the manual solution that works:

get.df <- function(x, y, z) {
# function to pull out the desired data.frame from the list
# x included as argument to make function applicable to my real data
  output <- x[[y]][[z]]
  output[[1]]
}

output1 <- get.df(x = outer, y = "group1", z = "data1")
output2 <- get.df(x = outer, y = "group2", z = "data1")
data1 <- list(output1, output2)
data1.tot <- Reduce(`+`, data1)

Using my sample data, I'd like to loop this over 2 data types ("data1" and "data2") and 3 groups ("group1", "group2", "group3"). I'm working on a for loop solution, but struggling with how to save output1 and output2 in a list. My loop looks like this right now:

dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")

for(i in 1:length(dat)) {
  for(j in 1:length(grp)) {
    assign(paste0("out", j), get.df(x = outer, y = grp[j], z = dat[i]))
  }
list(??? #clearly this is where I'm stuck!
}

Any suggestions either on the for loop problem, or for a better method?

ESELIA
  • how about `data1.tot <- df1[[1]] + df3[[1]] + df5[[1]]`. Is that what you are looking for? – Shree Sep 25 '18 at 23:31
  • Thanks. But there are 158 x 14 df's in total, so I'm looking for a solution where I don't have to type them all out. – ESELIA Sep 26 '18 at 02:36
  • I've made some progress using @Parfait's suggestion of `Reduce` but still am having some issues. I posted an update above. – ESELIA Sep 26 '18 at 04:26
  • To me it is still unclear what the final output should look like. Is it a list? Is it single sums? Please provide a hard coded output so that we know what you are aiming for. – Joe Sep 26 '18 at 07:32

3 Answers

1

Consider Reduce, which works on lists. This higher-order function is a compact way to run nested calls: ((df1 + df2) + df3) + ....

data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))

data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
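The one-liners above assume that each `data1`/`data2` element is itself a data frame or matrix. In the question's sample data each element is instead a one-element list wrapping a matrix (`list(cbind(...))`), which would explain the "non-numeric argument to binary operator" error reported in the comments. A minimal sketch, assuming that structure, that unwraps the inner element before reducing; `make_df` and `sum_across` are hypothetical helper names, not part of the original post:

```r
# Hypothetical helper: a 4 x 4 matrix wrapped in a list, as in the question.
make_df <- function() {
  list(matrix(sample.int(100, 16), nrow = 4,
              dimnames = list(NULL, paste0("v", 1:4))))
}

set.seed(1)
outer <- list(
  group1 = list(data1 = make_df(), data2 = make_df()),
  group2 = list(data1 = make_df(), data2 = make_df()),
  group3 = list(data1 = make_df(), data2 = make_df())
)

# Hypothetical helper: pull element `name` from every group, unwrap the
# matrix stored in position 1, then add the matrices pairwise.
sum_across <- function(x, name) {
  pieces <- lapply(x, function(g) g[[name]][[1]])
  Reduce(`+`, pieces)
}

data1.tot <- sum_across(outer, "data1")
data2.tot <- sum_across(outer, "data2")
```

If the real data hold plain data frames at that level, the `[[1]]` step should be dropped, since it would then select the first column instead.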

To demonstrate with random data

Data

set.seed(9262018)

dfList <- setNames(replicate(6, data.frame(NUM1=runif(50),
                                           NUM2=runif(50),
                                           NUM3=runif(50)), simplify = FALSE),
                   paste0("df", 1:6))

list2env(dfList, .GlobalEnv)

inner1 <- list(data1 = df1, data2 = df2)
inner2 <- list(data1 = df3, data2 = df4)
inner3 <- list(data1 = df5, data2 = df6)

outer <- list(group1 = inner1, group2 = inner2, group3 = inner3)

Output

data1.tot <- Reduce(`+`, lapply(outer, "[[", "data1"))
head(data1.tot, 10)
#         NUM1      NUM2      NUM3
# 1  2.0533870 1.3821609 1.0702992
# 2  2.6046584 1.7260646 1.9699774
# 3  2.2510810 1.6690353 1.4495476
# 4  1.7636879 1.2357098 1.9483906
# 5  1.0189969 2.1191041 1.7466040
# 6  1.3933982 0.7541027 1.0971724
# 7  1.8058803 2.4608417 0.7291335
# 8  1.0763517 1.2494739 1.0480818
# 9  0.7069873 1.5496575 1.2264486
# 10 0.9522526 2.1407523 1.2597422

data2.tot <- Reduce(`+`, lapply(outer, "[[", "data2"))
head(data2.tot, 10)    
#         NUM1      NUM2      NUM3
# 1  1.7568578 0.9322930 1.5579897
# 2  0.9455063 0.9211592 1.7067779
# 3  1.2698614 0.4623059 0.9426310
# 4  1.6791964 1.4304953 1.2435480
# 5  0.8088625 2.6107952 1.2308862
# 6  1.8202400 2.3511104 1.5676112
# 7  0.9765578 0.8870206 0.6725699
# 8  2.6448770 1.8931751 1.8188512
# 9  1.6114870 1.8632245 0.7452924
# 10 0.9710550 1.8367305 2.0994788

Equality Test

all.equal(data1.tot, df1 + df3 + df5)
# [1] TRUE
all.equal(data2.tot, df2 + df4 + df6)
# [1] TRUE

identical(data1.tot, df1 + df3 + df5)
# [1] TRUE
identical(data2.tot, df2 + df4 + df6)
# [1] TRUE
Parfait
  • I am not familiar with `Reduce`. This seems to be in the right direction, however, when I try it on my sample data, I'm getting the error `Error in f(init, x[[i]]): non-numeric argument to binary operator`. The output from lapply is a list of the different data frames. Seems it needs another command to pull out the actual data in the dfs within data1. I'm really bad at accessing things in nested lists. – ESELIA Sep 26 '18 at 02:59
  • Once again, to use the plus operator, `+`, with `Reduce`, the data frames must have the same structure (same number of columns and rows) and contain only numeric types. See the edit with a reproducible example showing that my answer works and returns exactly what you get by adding the dfs together. – Parfait Sep 26 '18 at 16:05
  • Your error actually indicates list structure of dfs do not align with your post. Please show your `Reduce` attempt AND output of `dput(head(outer))`, so we can reproduce your actual data. – Parfait Sep 26 '18 at 16:05
  • I tested your solution on the sample data I provided the code for in my original question. I'm not sure why that failed as all the df's were created the same way. However, I was able to get it to work with your sample data. I thought my actual data had the same dimensions but they don't, so I'm going to fix that, try your solution on my actual data and get back to you if it doesn't work. Thanks again! – ESELIA Sep 27 '18 at 21:14
  • This solution does seem to work. The problem I'm running into now is `+` doesn't have a way to ignore NA's that I can find. Is there a way to do this and still use `Reduce`? – ESELIA Oct 01 '18 at 03:27
  • You would need to filter out NAs in each data frame prior to calling `Reduce`. But do note: because NAs are placed sporadically, dropping them can leave the dfs with different numbers of rows, which breaks `Reduce`. Maybe [replace NAs with zero](https://stackoverflow.com/q/8161836/1422451)? – Parfait Oct 01 '18 at 14:30
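The zero-replacement idea from the last comment could look roughly like this. It is only a sketch on made-up data frames `a` and `b`; `na_to_zero` is a hypothetical helper name, and treating a missing value as 0 is an assumption that may not suit every dataset:

```r
# Two small numeric data frames with NAs in different positions.
a <- data.frame(x = c(1, NA, 3), y = c(10, 20, NA))
b <- data.frame(x = c(4, 5, NA), y = c(NA, 1, NA))

# Hypothetical helper: replace NA with 0 so `+` treats it as "no contribution".
na_to_zero <- function(df) {
  df[is.na(df)] <- 0
  df
}

# Dimensions are unchanged, so Reduce can still add cell by cell.
total <- Reduce(`+`, lapply(list(a, b), na_to_zero))
```

One consequence of this choice: a cell that is NA in every input (the last `y` value here) comes out as 0 rather than NA.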
0

Here is a solution that works fine if each inner list contains only a few data frames:

sum_df1 <- sum(unlist(lapply(outer, "[[", 1)))
sum_df2 <- sum(unlist(lapply(outer, "[[", 2)))

If each inner list contains, e.g., 1000 data frames, use:

dfs <- seq_len(1000)
lapply(dfs, function(x) sum(unlist(lapply(outer, "[[", x))))

This will give you a list where each element is a single number: the sum of all values in the data frames at that position across the groups.

Joe
  • Thanks. This appears to sum all the values in the inner data frames together, returning a single value for each set because of the `sum` function. I need the output to be a data frame that is the same dimensions as the input data frames. And values are from data frames in the same position inside different lists (i.e. group1, group2). – ESELIA Sep 26 '18 at 02:51
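The difference described in this comment can be seen on two tiny matrices. This is an illustrative sketch with made-up values, not the asker's data:

```r
m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)
mats <- list(m1, m2)

grand_total <- sum(unlist(mats))  # one number: every cell added together
cellwise    <- Reduce(`+`, mats)  # 2 x 2 matrix: cell-by-cell sums
```

`grand_total` collapses everything to a scalar, while `cellwise` keeps the dimensions of the inputs, which is the shape of output the question asks for.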
0

Is this what you want?

sapply(
  X = names(outer[[1]]),
  FUN = function(d) {
    # flatten one level of nesting before summing
    Reduce(f = `+`, x = unlist(lapply(outer, "[[", d), recursive = FALSE))
  },
  simplify = FALSE,
  USE.NAMES = TRUE
)
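For comparison, the loop from the question's update can collect its results in a list instead of using `assign()`. The following is a sketch under the assumption that every leaf is a one-element list wrapping a matrix, as in the question's sample data; `leaf`, `pieces`, and `totals` are hypothetical names:

```r
set.seed(42)
# Sample data shaped like the question's: each leaf is list(matrix).
leaf <- function() list(matrix(sample.int(100, 16), nrow = 4))
outer <- list(
  group1 = list(data1 = leaf(), data2 = leaf()),
  group2 = list(data1 = leaf(), data2 = leaf()),
  group3 = list(data1 = leaf(), data2 = leaf())
)

dat <- c("data1", "data2")
grp <- c("group1", "group2", "group3")

# Pre-allocate a named list and fill it by name instead of using assign().
totals <- setNames(vector("list", length(dat)), dat)
for (d in dat) {
  pieces <- vector("list", length(grp))
  for (j in seq_along(grp)) {
    pieces[[j]] <- outer[[grp[j]]][[d]][[1]]  # unwrap the matrix
  }
  totals[[d]] <- Reduce(`+`, pieces)
}
```

`totals$data1` and `totals$data2` then hold the summed matrices, and restricting `grp` to a subset sums over only those groups.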
Tino