0

In this (made up) data frame df, I want to show the average age grouped by ID.

Output right now:

   ID age
    1  20
    1  25
   20  40
   20  35
   20  30
  356  27
 4001  22
 4001  49
 4001  46
 4001  75
55555  50
55555  54

Desired output:

   ID age  meanage
    1  20   22.5
    1  25   22.5
   20  40   35.0
   20  35   35.0
   20  30   35.0
  356  27   27.0
 4001  22   48.0
 4001  49   48.0
 4001  46   48.0
 4001  75   48.0
55555  50   52.0
55555  54   52.0

One solution that works is

for (i in min(df$ID):max(df$ID))
{
  df$meanage[df$ID == i] <- mean(df$age[df$ID == i])
}

However, this is really slow, especially for a very large data set. Is there a more efficient way to do this?

Christopher Bottoms
  • 11,218
  • 8
  • 50
  • 99
HSchmitz
  • 3
  • 2

2 Answers2

2

Here is a data.table method to find the average "age" by "ID"

library(data.table)
setDT(df)[,meanage:=mean(age)  , by = ID][]

Or a base R approach

df$meanage <- with(df, ave(age, ID))

Or using dplyr (contributed by @David Arenburg)

library(dplyr)
df %>% 
   group_by(ID) %>%
   mutate(meanage = mean(age))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

You could also use plyr

library(plyr)
x <- ddply(df,.(ID),summarise,ave=mean(age))
#same as (I think):
x <- ddply(df,~ID,summarise,meanage=mean(age))

If you want the average duplicated (as in your example) you can merge the new result onto it (there may be a way to avoid that step).

sjgknight
  • 393
  • 1
  • 5
  • 19