In this (made up) data frame df, I want to show the average age grouped by ID.
Output right now:
ID age
1 20
1 25
20 40
20 35
20 30
356 27
4001 22
4001 49
4001 46
4001 75
55555 50
55555 54
Desired output:
ID age meanage
1 20 22.5
1 25 22.5
20 40 35.0
20 35 35.0
20 30 35.0
356 27 27.0
4001 22 48.0
4001 49 48.0
4001 46 48.0
4001 75 48.0
55555 50 52.0
55555 54 52.0
One solution that works is
for (i in min(df$ID):max(df$ID))
{
df$meanage[df$ID == i] <- mean(df$age[df$ID == i])
}
However, this is really slow, especially for a very large data set. Is there a more efficient way to do this?