0

I have a data.frame that looks somewhat like this.

k <- data.frame(id = c(1,2,2,1,2,1,2,2,1,2), act = c('a','b','d','c','d','c','a','b','a','b'), var1 = 25:34, var2= 74:83)

I have to group the data into separate levels by first 2 columns and write the mean of the the next 2 columns(var1 and var2). It should look like this

     id act varmean1 varmean2

  1   1   a         
  2   1   c        
  3   2   a        
  4   2   b        
  5   2   b        
  6   2   d       

The values of respective means are filled in varmean1 and varmean2.

My actual dataframe has 88 columns where I have to group the data into separate levels by the first 2 columns and find the respective means of the remaining. Please help me figure this out as soon as possible. Please try to use 'dplyr' package for the solution if possible. Thanks.

itsAmK
  • 11
  • 2

1 Answers1

2

You have several options:

base R:

aggregate(. ~ id + act, k, mean)

or

aggregate(cbind(var1, var2) ~ id + act, k, mean)

The first option aggregates all the column by id and act, the second option only the column you specify. In this case both give the same result, but it is good to know for when you have more columns and only want to aggregate some of them.

dplyr:

library(dplyr)
k %>%
  group_by(id, act) %>% 
  summarise_each(funs(mean))

If you want to specify the columns for which to calculate the mean, you can use summarise instead of summarise_each:

k %>%
  group_by(id, act) %>% 
  summarise(var1mean = mean(var1), var2mean = mean(var2))

data.table:

library(data.table)
setDT(k)[, lapply(.SD, mean), by = .(id, act)]

If you want to specify the columns for which to calculate the mean, you can add .SDcols like:

setDT(k)[, lapply(.SD, mean), by = .(id, act), .SDcols=c("var1", "var2")]
Jaap
  • 81,064
  • 34
  • 182
  • 193