2

Sorry for what might be a basic / redundant question (with an awful title to boot). I have been struggling with calculating means of columns within data frames in a list. I've tried a variety of approaches mentioned in similar questions but can never get it to work. I'm relatively new to r and am a bit out of my depth.

I have a list of data frames similar to:

df1 <- data.frame(c("Jan", "Jan", "Jan"), c("21:14:33", "21:14:33", "21:14:33"), c(1, 2, 3), c(11, 12, 13))
df2 <- data.frame(c("Feb", "Feb", "Feb"), c("22:14:33", "22:14:33", "22:14:33"), c(2, 3, 4), c(12, 13, 14))
df3 <- data.frame(c("Mar", "Mar", "Mar"), c("23:14:33", "23:14:33", "23:14:33"), c(3, 4, 5), c(13, 14, 15))
mylist <- list(df1, df2, df3)

My goal is to create a vector for each data frame that contains the month, time, mean.column3, mean.column4. For example "Jan, 21:14:33, 2, 12" for the first data frame. (Ultimately I want to combine all these vectors into a new data frame, but I can do this once I have the vectors using rbind).

I have gotten the closest using for loops to calculate the mean, but using the below code it only gives me the mean for the last data frame (df3):

for(i in seq_along(mylist)){
output <- sapply(mylist[[i]][3:4], MARGIN = 2, FUN = mean)
}

I have also tried using lapply (as suggested here), abind (as suggested here), and map (as suggested here), which makes me think I'm the problem and must be missing something.

None of these approaches begins to address the need to include month and time in the resulting vector. I've tried to do it for a single data frame using code such as this, but it gives me all the months and times, when I really just need them once.

output1 <- c(mylist[[1]][1,1:2],sapply(mylist[[1]][3:4], MARGIN = 2, FUN = mean))

Help?

Cam Bullen
  • 48
  • 6

2 Answers2

3

I think your plan to calculate the means and then combine into one data frame is backwards - your data frames all have the same columns, so go ahead and combine them already! Then doing grouped means is easy.

I'll use data.table here because it has nice syntax for grouped means and it's rbindlist will ignore the different (terrible) column names in your example:

library(data.table)
mydt = rbindlist(mylist)

# get better column names
setnames(mydt, c("month", "time", "x1", "x2"))

# means by group
mydt[, .(mx1 = mean(x1), mx2 = mean(x2)), by = .(month, time)]
#    month     time mx1 mx2
# 1:   Jan 21:14:33   2  12
# 2:   Feb 22:14:33   3  13
# 3:   Mar 23:14:33   4  14

# (if you have more columns and you don't want to type out all the means)
mydt[, lapply(.SD, mean), by = .(month, time)]
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • And if you need the output as a character string with values separated by commas you can do `apply(mydt[, lapply(.SD, mean), by = .(month, time)], 1, toString)` – IceCreamToucan Oct 27 '17 at 20:23
  • This is a much better way of going about the issue. Thanks for your help @Gregor , and for explaining what the steps are doing. Much appreciated! – Cam Bullen Oct 28 '17 at 15:04
  • @RobJensen your comment here (as well as your previous answer) is helpful too because it gets around some oddities in my real data. Thanks! – Cam Bullen Oct 28 '17 at 15:06
1
with(do.call(rbind, lapply(mylist, function(x)
    setNames(x, paste0("X",1:NCOL(x))))),
    aggregate(list(C3 = X3, C4 = X4), list(C1 = X1, C2 = X2), mean))
#   C1       C2 C3 C4
#1 Jan 21:14:33  2 12
#2 Feb 22:14:33  3 13
#3 Mar 23:14:33  4 14
d.b
  • 32,245
  • 6
  • 36
  • 77