Getting the mean value in groups of an id

Question

Possible Duplicate:
faster way to create variable that aggregates a column by id

So the thing is that i have following data loaded from a CSV file:

id      value2  value3
1.000   0.010   14        
1.000   0.019   15        
0.995   0.024   13        
0.995   0.031   20        
0.990   0.012   13        
.....

I want to calculate the mean/median etc. value of value2 and value3 in groups of id. Afterwards the plan was to be able to sort the result by either value2 or value3.

Is there a way to do such task?

Thanks in advance.

The duplicate question might not be a perfect match but this is quite basic and has been asked _many_ times here before. `aggregate`, `tapply`, `split` along with `lapply`/`sapply` and/or `ddply` from the plyr package can all accomplish what you want to do. — Dason, Oct 14 '12 at 20:57

score 6 · Answer 1 · answered Oct 14 '12 at 20:41

It's data.table Sunday! This will scale well for big data -- fast and efficient.

> library(data.table)
> DT <- as.data.table(df)
> DT[, list(val2=mean(value2), val3=mean(value3)), by=id]
      id   val2 val3
1: 1.000 0.0145 14.5
2: 0.995 0.0275 16.5
3: 0.990 0.0120 13.0

score 3 · Accepted Answer · answered Oct 14 '12 at 20:38

3

library(plyr)
result <- ddply(df, .(id), function(x) { 
    data.frame(mv2 = mean(x$value2), mv3 = mean(x$value3))
    })

# order by mean value2
arrange(result, mv2)
# and for value 3
arrange(result, mv3)

answered Oct 14 '12 at 20:38

Maiasaura

32,226
27
104
108

score 0 · Answer 3 · answered Oct 14 '12 at 20:38

0

Assuming you have the data in a data frame called df you can do the following:

sapply(split(df[-1], df$id), sapply, mean)

answered Oct 14 '12 at 20:38

Backlin

14,612
2
49
81

Getting the mean value in groups of an id

3 Answers3