0

I have dataset with a parentID variable and a childIQ variable which represents the IQ of the children of that specific parent:

df <- data.frame("parentID"=c(101,101,101,204,204,465,465),
  "childIQ"=c(98,90,81,96,87,71,65))

parentID, childIQ
101, 98
101, 90
101, 81
204, 96
204, 87
465, 71
465, 65

I ran an aggregate() function so there is only 1 row per parent, and the childIQ value becomes the mean IQ of that parent's children:

df_agg <- aggregate(childIQ ~ parentID , data = df, mean)

parentID, avg_childIQ
101, 89.67
204, 91.5
465, 68

However, I want to add another column that represents the number of children for that parent, like this:

parentID, avg_childIQ, num_children
101, 90.67, 3
204, 91.5, 2
465, 68, 2

I'm not sure how to do this using data.table once I have already created df_agg?

codemachino
  • 33
  • 1
  • 7

2 Answers2

3

It is possible to supply several functions to aggregate by using a function(x) c(...) code.

df_agg <- aggregate(childIQ ~ parentID , data = df,
                    function(x) c(mean = mean(x), 
                                  n = length(x)))

#>   parentID childIQ.mean childIQ.n
#> 1      101     89.66667   3.00000
#> 2      204     91.50000   2.00000
#> 3      465     68.00000   2.00000
Cedric
  • 71
  • 1
  • 5
2

Using dplyr:

library(dplyr)
df %>% group_by(parentID) %>% summarise(avg_childID = mean(childIQ), num_children = n())
# A tibble: 3 x 3
  parentID avg_childID num_children
     <dbl>       <dbl>        <int>
1      101        89.7            3
2      204        91.5            2
3      465        68              2

Using data.table:

library(data.table)
setDT(df)[,list(avg_childID = mean(childIQ), num_children = .N), by=parentID]
   parentID avg_childID num_children
1:      101    89.66667            3
2:      204    91.50000            2
3:      465    68.00000            2
Karthik S
  • 11,348
  • 2
  • 11
  • 25