
I want to use the size of a group as part of a groupwise operation in dplyr::summarise.

E.g., calculate the proportion of manuals by cylinder by grouping the cars data by cyl and dividing the number of manuals by the size of the group:

mtcars %>%
  group_by(cyl) %>%
  summarise(zz = sum(am)/group_size(.))

But (I think) because group_size expects a grouped tbl_df and . is ungrouped, this returns:

Error in mutate_impl(.data, dots) : basic_string::resize

Is there a way to do this?
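A quick check of what the denominator actually receives, assuming a reasonably recent dplyr: inside the pipe, `.` is bound to the whole grouped data frame handed to summarise(), so group_size(.) most likely returns the sizes of all three groups at once, whereas n() returns only the size of the group currently being summarised. The object names grouped, sizes and check are just illustrative.

```r
library(dplyr)

grouped <- mtcars %>% group_by(cyl)

# group_size() returns one size per group (cyl = 4, 6, 8): 11, 7, 14
sizes <- group_size(grouped)
print(sizes)

# n() is evaluated separately for each group inside summarise()
check <- grouped %>% summarise(n_rows = n())
print(check)
```

So sum(am)/group_size(.) divides each group's count of manuals by a length-3 vector, which does not give a single value per group.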

Scransom
We marked this as a duplicate, but it has one nuance that the title doesn't communicate, which is also why you can't simply use `count()`/`tally()`: OP wants the fraction of group size. So use `n()` instead of `group_size(.)` in the denominator, and keep the same numerator as currently. – smci Jul 16 '19 at 21:43

2 Answers


You can use n() to get the number of rows in each group:

library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise(zz = sum(am)/n())

#    cyl    zz
#  <dbl> <dbl>
#1  4.00 0.727
#2  6.00 0.429
#3  8.00 0.143
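To make the numerator and denominator visible, the same n()-based calculation can be spelled out in one summarise() call. This is a sketch; the column names n_manual and n_cars are illustrative, and it relies on summarise() letting later expressions refer to columns created earlier in the same call.

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_manual = sum(am),          # am is 1 for manual, 0 for automatic
    n_cars   = n(),              # number of rows in the current cyl group
    zz       = n_manual / n_cars # proportion of manuals in the group
  )

print(res)
```

For mtcars this gives 8 of 11, 3 of 7 and 2 of 14 manuals for 4, 6 and 8 cylinders, matching the zz column above.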
Ronak Shah

It is just a grouped mean: since am is 0/1, the mean of am within each group is the proportion of manuals.

library(dplyr)
mtcars %>%
    group_by(cyl) %>% 
    summarise(zz = mean(am))
# A tibble: 3 x 2
#    cyl    zz
#  <dbl> <dbl>
#1     4 0.727
#2     6 0.429
#3     8 0.143
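Because am is coded 0 (automatic) and 1 (manual), the mean of am within a group equals sum(am) divided by the group's row count. A small check comparing the two formulations (the object names by_mean and by_count are illustrative):

```r
library(dplyr)

# proportion via the mean of a 0/1 indicator
by_mean <- mtcars %>% group_by(cyl) %>% summarise(zz = mean(am))

# proportion via an explicit group count in the denominator
by_count <- mtcars %>% group_by(cyl) %>% summarise(zz = sum(am) / n())

# both should give the same proportion of manuals per cylinder group
print(all.equal(by_mean$zz, by_count$zz))
```

all.equal() is used rather than identical() because the two routes could, in principle, differ in the last floating-point bit.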

If you need to use group_size(), apply it to each group's data separately, for example after nest():

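library(tidyverse)
mtcars %>%
   group_by(cyl) %>% 
   nest() %>%   # one row per cyl, with that group's rows in the list-column `data`
   # each .x is an ungrouped tibble, so group_size(.x) is just its row count
   mutate(zz = map_dbl(data, ~ sum(.x$am)/group_size(.x))) %>%
   arrange(cyl) %>%
   select(-data)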
# A tibble: 3 x 2
#    cyl    zz
#  <dbl> <dbl>
#1     4 0.727
#2     6 0.429
#3     8 0.143

Or, using do():

mtcars %>%
    group_by(cyl) %>% 
    do(data.frame(zz = sum(.$am)/group_size(.)))
# A tibble: 3 x 2
# Groups:   cyl [3]
#    cyl    zz
#  <dbl> <dbl>
#1     4 0.727
#2     6 0.429
#3     8 0.143
akrun