134

I am using the mtcars dataset. I want to find the number of records for a particular combination of data. Something very similar to the count(*) group by clause in SQL. ddply() from plyr is working for me

library(plyr)
ddply(mtcars, .(cyl,gear),nrow)

has output

  cyl gear V1
1   4    3  1
2   4    4  8
3   4    5  2
4   6    3  2
5   6    4  4
6   6    5  1
7   8    3 12
8   8    5  2

Using this code

library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))

has output

  length(cyl)
1          32

I found various functions to pass in to summarise() but none seem to work for me. One function I found is sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found

Tried using n(), which returned

Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?

Henrik
  • 65,555
  • 14
  • 143
  • 159
charmee
  • 1,501
  • 2
  • 9
  • 9
  • I can't reproduce this. I get the same output as from `ddply`. What version of dplyr are you on? Try updating? – joran Mar 31 '14 at 17:15
  • I have the latest version 0.1.3. Do you have 0.1.2? – charmee Mar 31 '14 at 17:54
  • Nope. Your example works just fine for me with 0.1.3. – joran Mar 31 '14 at 17:56
  • 1
    What version of R do you have? Could that be causing the difference in behavior? I also tried this on a computer at home which is using Ubuntu, same thing.. – charmee Mar 31 '14 at 18:03
  • I'm on 3.0.2, but I'd be surprised if that makes a difference, unless your version of R is very, very old. – joran Mar 31 '14 at 18:04
  • I am on 3.0.2 as well. I have scoured through countless blogs and tutorials since last week before posting. If this does not work, is there another way I can count number of rows for a combination? – charmee Mar 31 '14 at 18:09
  • Start from the beginning, try it in a clean R session with no other packages loaded. BTW, your `sum(G)` example doesn't work because presumably `G` doesn't exist. Maybe you meant `g`? And your example also works fine for me using `n()`. – joran Mar 31 '14 at 18:14
  • The result you claim to be getting would only make sense if `g` was actually not grouped. So I would look for a reason why you might be somehow ungrouping your data frame. – joran Mar 31 '14 at 18:15
  • It works for me too... You can try `mtcars %.% group_by(cyl, gear) %.% summarise(length(gear))`, it gives the same result. – Victorp Mar 31 '14 at 18:23
  • 2
    Thanks!!! That solved it. If I have both plyr and dplyr packages attached, summarise does not work as expected. As soon as I restarted the session (and did not attach all normal packages by default) I was able to make it work. Phew. – charmee Mar 31 '14 at 18:23
  • n() also works once I detached plyr package. So the main culprit plyr package that interferes with dplyr package. Thanks for the help again!!! – charmee Mar 31 '14 at 18:27
  • FYI there's a warning on the package's [github](https://github.com/hadley/dplyr) page that addresses this. – joran Mar 31 '14 at 19:11
  • Thanks, you are right. Once I follow the recommended order it is working okay. – charmee Apr 01 '14 at 12:38

5 Answers5

187

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

But dplyr also offers a handy count function which does exactly the same with less typing:

count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2
talat
  • 68,970
  • 21
  • 126
  • 157
23

I think what you are looking for is as follows.

cars_by_cylinders_gears <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(count = n())

This is using the dplyr package. This is essentially the longhand version of the count () solution provided by docendo discimus.

tb.
  • 723
  • 7
  • 16
21

Another approach is to use the double colons as this will help avoid potential conflicts with functions with similar names from other packages.

mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(length(gear))
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
user3026255
  • 227
  • 1
  • 2
  • Edited to add why using colons would be useful although I think the main point of this answer is the use of `length` – NelsonGon Apr 13 '23 at 08:36
3

Another option, not necesarily more elegant, but does not require to refer to a specific column:

mtcars %>% 
  group_by(cyl, gear) %>%
  do(data.frame(nrow=nrow(.)))

This is equivalent to using count():

library(dplyr, warn.conflicts = FALSE)
all.equal(mtcars %>% 
            group_by(cyl, gear) %>%
            do(data.frame(n=nrow(.))) %>% 
            ungroup(),
          count(mtcars, cyl, gear), check.attributes=FALSE)
#> [1] TRUE
Matifou
  • 7,968
  • 3
  • 47
  • 52
1

Another option is using the function tally from dplyr. Here is a reproducible example:

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  tally()
#> # A tibble: 8 × 3
#> # Groups:   cyl [3]
#>     cyl  gear     n
#>   <dbl> <dbl> <int>
#> 1     4     3     1
#> 2     4     4     8
#> 3     4     5     2
#> 4     6     3     2
#> 5     6     4     4
#> 6     6     5     1
#> 7     8     3    12
#> 8     8     5     2

Created on 2022-09-11 with reprex v2.0.2

Quinten
  • 35,235
  • 5
  • 20
  • 53