When I first started programming in R I would often use dplyr count().

library(tidyverse)    
mtcars %>% count(cyl)

Once I started using apply functions, I began running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems went away.

I don't have a reproducible example to show. But can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or after any group_by()? Of course, I'm assuming I no longer need the data grouped after it's counted or summarized.

mtcars %>% count(cyl) %>% ungroup()
stackinator

1 Answer

The issues you used to run into were from an old behavior of count(). Up to dplyr 0.5.0, if you did:

mtcars %>%
  count(cyl, wt)

The result would still be grouped by the cyl column. This means, for example, that if you followed it with something like summarize(mean(wt)), you would have gotten one row for each cyl when you may have expected one row overall. Putting %>% ungroup() after the count fixed the issue.
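The old behavior can't be reproduced directly on a current dplyr, but a rough sketch can imitate it by regrouping the counted table by cyl by hand. The names counts, still_grouped, per_cyl, and overall are only illustrative:

```r
library(dplyr)

counts <- mtcars %>% count(cyl, wt)

# Imitate the old count() result, which stayed grouped by cyl.
still_grouped <- counts %>% group_by(cyl)

# With the leftover grouping, summarize() works within each cyl group.
per_cyl <- still_grouped %>% summarize(mean_wt = mean(wt))
nrow(per_cyl)   # one row per distinct cyl value

# After ungroup(), the same summarize() collapses the whole table.
overall <- still_grouped %>% ungroup() %>% summarize(mean_wt = mean(wt))
nrow(overall)   # a single row
```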

This behavior was changed in dplyr 0.7.0 (released in June 2017), such that count() preserves the grouping of its input (meaning mtcars %>% count(wt, cyl) now returns an ungrouped table). This is likely why you're no longer able to reproduce the problems, and it means you no longer need to do ungroup() after a count().
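One way to check this on your own installation is to inspect the grouping with group_vars(). This is a small sketch assuming dplyr 0.7.0 or later; the variable names are made up for illustration:

```r
library(dplyr)

# Ungrouped input: the counted result carries no grouping.
ungrouped_counts <- mtcars %>% count(cyl, wt)
group_vars(ungrouped_counts)

# Grouped input: count() hands back the grouping it was given.
grouped_counts <- mtcars %>% group_by(cyl) %>% count(wt)
group_vars(grouped_counts)
```

The first call should report no grouping variables and the second should report cyl, so ungroup() is only needed after count() if the input was already grouped.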


Note that you may still need to do ungroup() after a group_by() and summarize():

mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n())

returns a tibble still grouped by cyl:

# A tibble: 30 x 3
# Groups:   cyl [?]
     cyl    wt     n
   <dbl> <dbl> <int>
 1     4  1.51     1
 2     4  1.62     1
 3     4  1.84     1
 4     4  1.94     1
 5     4  2.14     1
 6     4  2.2      1
 7     4  2.32     1
 8     4  2.46     1
 9     4  2.78     1
10     4  3.15     1
# ... with 20 more rows
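If you don't want the leftover grouping, here is a sketch of two ways to drop it. The second relies on the .groups argument of summarize(), which assumes dplyr 1.0.0 or later; the variable names are illustrative:

```r
library(dplyr)

# Default: summarize() peels off only the last grouping level (wt),
# so the result is still grouped by cyl.
by_default <- mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n(), .groups = "drop_last")
group_vars(by_default)

# Option 1: remove the remaining grouping explicitly.
with_ungroup <- by_default %>% ungroup()
group_vars(with_ungroup)

# Option 2 (assumes dplyr >= 1.0.0): drop all grouping inside summarize().
with_drop <- mtcars %>%
  group_by(cyl, wt) %>%
  summarize(n = n(), .groups = "drop")
group_vars(with_drop)
```

Calling ungroup() on a table that is already ungrouped simply returns it unchanged, so adding it defensively should not change results.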
David Robinson
  • `group_by(cyl, wt) ` returns a tibble grouped by `cyl` _**and**_ `wt`. – skoh Jan 02 '19 at 15:44
  • @skoh Each call of `summarize` drops one grouping level. Please see the first example here: https://dplyr.tidyverse.org/reference/summarise.html#examples – ba_ul Sep 07 '19 at 22:23