merge or mutate a summary (dplyr)

Question

I am never quite sure how to bring a summary back into the original data with dplyr.

Suppose I have a dataset of individuals grouped into households.

library(dplyr)

# One row per person: household id, person id within the household, age
dta = rbind(c(1, 1, 45),
  c(1, 2, 47),
  c(2, 1, 24),
  c(2, 2, 26),
  c(3, 1, 67),
  c(4, 1, 20),
  c(4, 2, 21),
  c(4, 3, 7)
 )
dta = as.data.frame(dta)
colnames(dta) = c('householdid', 'id', 'age')

 householdid id age
           1  1  45
           1  2  47
           2  1  24
           2  2  26
           3  1  67
           4  1  20
           4  2  21
           4  3   7

Imagine I want to calculate the number of persons in each household and the mean age per household, and then attach this information to every row of the original dataset.

dta %>%
  group_by(householdid) %>%
  summarise(nhouse = n(), meanAgeHouse = mean(age)) %>%
  merge(., dta, all = TRUE)   # join the household summary back onto every person
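If you do want to keep the summarise-then-combine route, dplyr's own `left_join()` can stand in for base `merge()`. This is a minimal sketch, assuming dplyr 1.0 or later is installed (for the `.groups` argument); the data is rebuilt inline so the block runs on its own, and `hh_summary` and `dta_joined` are illustrative names. Putting `dta` on the left keeps its original row order, whereas `merge()` sorts by the key. Whether it is faster than `merge()` on your data is worth benchmarking rather than assuming.

```r
library(dplyr)

# Same data as in the question (household 4 has three members)
dta <- data.frame(
  householdid = c(1, 1, 2, 2, 3, 4, 4, 4),
  id          = c(1, 2, 1, 2, 1, 1, 2, 3),
  age         = c(45, 47, 24, 26, 67, 20, 21, 7)
)

# One row per household: household size and mean age
hh_summary <- dta %>%
  group_by(householdid) %>%
  summarise(nhouse = n(), meanAgeHouse = mean(age), .groups = "drop")

# Attach the household-level columns to every individual;
# left_join() keeps the row order of dta, unlike merge(), which sorts by the key
dta_joined <- dta %>%
  left_join(hh_summary, by = "householdid")

print(dta_joined)
```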

I often use `merge`, but it can be slow when the dataset is huge. Is it possible to use `mutate` instead of `merge`?

Yes, just do `dta %>% group_by(householdid) %>% mutate( nhouse = n(), meanAgeHouse = mean(age) )` — David Arenburg, Jun 08 '15 at 14:59
I would also suggest looking into the _data.table_ package. These things are pretty straightforward and very fast in data.table. It has the concept of recycling values, which will be helpful here. — nehiljain, Jun 08 '15 at 15:02
The solution provided by @DavidArenburg is excellent; if you want to keep the result, just assign it: `data <- code by David`. This seems reasonable, since dplyr is smart and fast enough not to reallocate memory but just to point to the old object with the new elements added. — SabDeM, Jun 08 '15 at 15:51
@DavidArenburg thank you very much! Please post it as an answer. — giac, Jun 08 '15 at 16:17
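For the data.table route suggested in the comments, here is a minimal sketch, assuming the data.table package is installed; `dt` is an illustrative name. The `:=` operator adds the grouped columns by reference, so no summary table or join is needed, and `.N` is the number of rows in each group.

```r
library(data.table)

# Same data as in the question, as a data.table
dt <- data.table(
  householdid = c(1, 1, 2, 2, 3, 4, 4, 4),
  id          = c(1, 2, 1, 2, 1, 1, 2, 3),
  age         = c(45, 47, 24, 26, 67, 20, 21, 7)
)

# Add household size and mean age to every row, modifying dt in place;
# each group's single value is recycled over that group's rows
dt[, `:=`(nhouse = .N, meanAgeHouse = mean(age)), by = householdid]

print(dt)
```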

Answer 1 (score 0)

dta %>% group_by(householdid) %>% mutate( nhouse = n(), meanAgeHouse = mean(age) )

answered Oct 17 '17 at 15:25 by 3pitt; edited Dec 13 '17 at 06:50 by PKumar
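A self-contained version of the answer, assuming dplyr 1.0 or later is installed. The result is assigned back to `dta`, as suggested in the comments, and `ungroup()` is added so the grouping does not silently affect later operations; the block also checks that the values agree with the `merge()` approach from the question. `dta_merged` is an illustrative name.

```r
library(dplyr)

dta <- data.frame(
  householdid = c(1, 1, 2, 2, 3, 4, 4, 4),
  id          = c(1, 2, 1, 2, 1, 1, 2, 3),
  age         = c(45, 47, 24, 26, 67, 20, 21, 7)
)

# The merge-based approach from the question, kept for comparison
dta_merged <- dta %>%
  group_by(householdid) %>%
  summarise(nhouse = n(), meanAgeHouse = mean(age), .groups = "drop") %>%
  merge(dta, all = TRUE)

# The answer: a grouped mutate() computes n() and mean(age) per household
# and repeats each value on every row of that household, so no join is needed
dta <- dta %>%
  group_by(householdid) %>%
  mutate(nhouse = n(), meanAgeHouse = mean(age)) %>%
  ungroup()

# Both approaches give the same per-person values once rows are in the same order
dta_merged <- arrange(dta_merged, householdid, id)
stopifnot(identical(dta$nhouse, dta_merged$nhouse))
stopifnot(isTRUE(all.equal(dta$meanAgeHouse, dta_merged$meanAgeHouse)))

print(dta)
```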
