Average over hundreds of columns with summarize?

Question

I have a dataset with a range of IDs and activities, and many columns of observations for each combination of ID and activity. I'd like to take the average of each observation column within each combination, but since there are hundreds and hundreds of them, I'm unclear how to proceed.

Example data:

id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987

What I've tried so far:

library(dplyr)
example <- read.table("clipboard", sep=",", header=TRUE)
group <- group_by(example,id,activity)
summarize(group, mobs1=mean(obs1), mobs2=mean(obs2), mobs3=mean(obs3))

Which gets me the right form, but how can I go about the summarize() without typing mobsN=mean(obsN) hundreds of times? I feel like an apply function will go in here but I'm not sure which...

Jaap · Accepted Answer · 2016-03-31T10:35:35.497

3

This should give you the desired result:

library(dplyr)
means.wide <- example %>% 
  group_by(id,activity) %>% 
  summarise_each(funs(mean))
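
summarise_each() and funs() have since been deprecated; from dplyr 1.0.0 onward, across() covers the same use. A minimal sketch of the equivalent call, assuming a recent dplyr, with the question's data read inline (rather than from the clipboard) so that it runs on its own:

```r
library(dplyr)

# Sample data from the question, read inline instead of from the clipboard
example <- read.csv(text = "id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987")

# across(everything(), mean) applies mean() to every non-grouping column
means.wide <- example %>%
  group_by(id, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")
```

The grouping columns are skipped automatically, so this scales to hundreds of observation columns without naming any of them.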

You could also convert example to long format and then calculate the means:

library(dplyr)
library(tidyr)

means.long <- example %>% 
  gather(obs, val, -c(id,activity)) %>% 
  group_by(id,activity,obs) %>% 
  summarise(mean_val=mean(val))
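
gather() still works, but tidyr 1.0.0 introduced pivot_longer() as its successor. A sketch of the same long-format calculation with it, assuming tidyr 1.0.0 or later and again reading the question's data inline:

```r
library(dplyr)
library(tidyr)

# Sample data from the question, read inline instead of from the clipboard
example <- read.csv(text = "id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987")

# Stack every column except id and activity into obs/val pairs,
# then average within each id/activity/obs combination
means.long <- example %>%
  pivot_longer(-c(id, activity), names_to = "obs", values_to = "val") %>%
  group_by(id, activity, obs) %>%
  summarise(mean_val = mean(val), .groups = "drop")
```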

You could also do this with the data.table package:

# comparable to the wide dplyr version
library(data.table)
setDT(example)[, lapply(.SD, mean), by=list(id,activity)]

# comparable to the long dplyr version
library(data.table)
melt(setDT(example),id.vars=c("id","activity"))[, mean(value), by=list(id,activity,variable)]
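
Note that setDT() converts example to a data.table by reference, and that the long version returns the mean in a column with the default name V1. If either is unwanted, a variant along these lines may help; the names dt and mean_val are illustrative choices, not part of the answer above:

```r
library(data.table)

# Sample data from the question, read inline instead of from the clipboard
example <- read.csv(text = "id,activity,obs1,obs2,obs3
1,1,325,6432,5432
1,2,321,214,2143
1,3,3652,123,123
2,1,5321,123,643
2,2,4312,4321,432
2,3,522,123,321
1,1,532,765,8976
1,2,142,865,5445
1,3,643,654,53
2,1,756,765,7865
2,2,876,654,976
2,3,6754,765,987")

# as.data.table() makes a copy, so `example` stays a plain data.frame
dt <- as.data.table(example)

# Wide: one mean per observation column
means.wide <- dt[, lapply(.SD, mean), by = .(id, activity)]

# Long: name the result column explicitly instead of the default V1
means.long <- melt(dt, id.vars = c("id", "activity"))[
  , .(mean_val = mean(value)), by = .(id, activity, variable)]
```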

And don't forget about good old base R:

aggregate(. ~ id + activity, example, FUN = mean)

edited Mar 31 '16 at 10:35

answered Jul 22 '15 at 19:30


If you've used `gather`, you should just be left with three columns (id, obs and val) so can't you just use `summarise(mean_val = mean(val))`? Or you could use `summarise_each` without using `gather` first. – Nick Kennedy Jul 22 '15 at 19:33
@NickKennedy you're correct, I made a mistake; see the updated answer – Jaap Jul 22 '15 at 19:40
