I have a very large dataset like the following:

myd <- data.frame(id = paste("id_", rep(1:500000, each = 3), sep = ""),
                  yvar = rep(1:500000, each = 3), xvar = rep(1:500000, each = 3))

I would like to calculate the mean of each variable for every id. I am trying the following, but it is taking a long time.

myd1 <- aggregate(myd, list(myd$id), mean)

Is there a quicker way to do this?

jon
  • Did you read the answer below? It's a bajillion times faster than yours (which has a typo) on my computer. I guess yours should be `aggregate(myd[, -1], list(myd$id), mean)` – Frank Apr 11 '17 at 14:49
  • Thank you for reading the question carefully and answering it. – jon Apr 11 '17 at 14:52
  • For a comprehensive speed comparison, see Ari Friedman's answer to the "Average data by group" question linked above. – Frank Apr 11 '17 at 15:01
  • Thank you for pointing out the answer. – jon Apr 11 '17 at 15:08

1 Answer

With data.table:

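library(data.table)
setDT(myd)  # convert myd to a data.table by reference (no copy)
# mean of yvar and xvar within each id; the unnamed results come back as V1 and V2
myd[, .(mean(yvar), mean(xvar)), by = id]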
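If there are many value columns, listing each one by hand gets tedious. A minimal sketch of a variant using `.SD`, shown on a small stand-in table (`small` is a hypothetical name, not part of the question's data):

```r
library(data.table)

# Small stand-in for myd (5 ids instead of 500000), for illustration only
small <- data.table(id   = paste0("id_", rep(1:5, each = 3)),
                    yvar = rep(1:5, each = 3),
                    xvar = rep(1:5, each = 3))

# .SD holds the non-grouping columns of each group;
# lapply() applies mean() to every one of them and keeps the column names
res <- small[, lapply(.SD, mean), by = id]
res
```

This keeps the original column names (`yvar`, `xvar`) instead of `V1` and `V2`. Adding `.SDcols = c("yvar", "xvar")` would restrict the calculation to selected columns.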

Performance comparison as follows:

system.time(myd1 <-aggregate(myd[, -1], list(myd$id), mean)) 
user  system elapsed 
19.56    0.08   19.72 

system.time(mydt1 <- myd[, .(mean(yvar), mean(xvar)), by = id])
user  system elapsed 
0.07    0.00    0.06
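
Before relying on the faster version, it may be worth checking that both approaches return the same means. A rough sketch on a small stand-in data frame (`small`, `base_res` and `dt_res` are hypothetical names used only here):

```r
library(data.table)

# Small stand-in for myd, for illustration only
small <- data.frame(id   = paste0("id_", rep(1:5, each = 3)),
                    yvar = rep(1:5, each = 3),
                    xvar = rep(1:5, each = 3))

# Base R: aggregate the value columns by id
base_res <- aggregate(small[, -1], list(id = small$id), mean)

# data.table: the same means, with named result columns
dt_res <- as.data.table(small)[, .(yvar = mean(yvar), xvar = mean(xvar)), by = id]

# Put both results in the same id order before comparing
base_res <- base_res[order(base_res$id), ]
dt_res   <- dt_res[order(id)]

all.equal(base_res$yvar, dt_res$yvar)
all.equal(base_res$xvar, dt_res$xvar)
```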
Erdem Akkas