How to do group by count in R

Question

I want the count of booking IDs at the Month-Source level.

Month   Source  Booking_ID
Oct     A       100
Nov     B       101
Oct     A       106
Jan     B       109
Nov     A       110
Nov     B       111


data <- structure(list(Month = c("October", "November", "October", "January", 
"November", "November"), Source = c("A", "B", "A", "B", "A", 
"B"), Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L)), .Names = c("Month", 
"Source", "Booking_ID"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L))

score 2 · Accepted Answer · answered Oct 29 '15 at 10:04

Maybe this could help:

table(data$Month, data$Booking_ID)

#     100 101 106 109 110 111
# Jan   0   0   0   1   0   0
# Nov   0   1   0   0   1   1
# Oct   1   0   1   0   0   0


table(data$Month, data$Source)

#     A B
# Jan 0 1
# Nov 1 2
# Oct 2 0
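
If a long result with one row per Month-Source pair is preferred over the cross-tab, the `table()` output above can be reshaped with `as.data.frame()`. This is only a minimal sketch: it rebuilds the question's data as a plain data frame, and the names `counts` and `Count` are illustrative choices, not from the original posts.

```r
# Rebuild the question's data as a plain data frame (same values as the dput output)
data <- data.frame(
  Month = c("October", "November", "October", "January", "November", "November"),
  Source = c("A", "B", "A", "B", "A", "B"),
  Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then reshape the table into one row per Month-Source combination
counts <- as.data.frame(
  table(Month = data$Month, Source = data$Source),
  responseName = "Count",
  stringsAsFactors = FALSE
)

# Drop combinations that never occur in the data (e.g. January / A)
counts <- counts[counts$Count > 0, ]
print(counts)
```

Note that `table()` ignores `NA` values by default; passing `useNA = "ifany"` keeps them as their own category.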

Not Working.. I am just getting a row of Months – Akshit Oct 29 '15 at 10:15
@Akshit Can you `dput` your data? – Oct 29 '15 at 10:40
It worked Thanks....I had some Null values earlier – Akshit Oct 29 '15 at 11:02

mpalanco · Answer 2 · 2015-10-30T07:08:13.107

Two alternatives:

1. aggregate

aggregate(Booking_ID ~ Month + Source, data, FUN = "length")

Output:

     Month Source Booking_ID
1 November      A          1
2  October      A          2
3  January      B          1
4 November      B          2

2. sqldf

library(sqldf)
sqldf("SELECT  Month, Source, COUNT(*) AS Count FROM data GROUP BY Month, Source")

Output:

     Month Source Count
1  January      B     1
2 November      A     1
3 November      B     2
4  October      A     2

akrun · Answer 3 · 2015-10-29T17:36:10.857

We can use dplyr. We group by 'Month' and 'Source' and take the n_distinct of 'Booking_ID', i.e. the number of unique 'Booking_ID' values in each group. If we need the total number of rows per group instead, we use n().

library(dplyr)
data %>%
  group_by(Month, Source) %>%
  summarise(n = n_distinct(Booking_ID))
  # if we wanted the total count instead of the unique count:
  # summarise(n = n())

#    Month Source     n
#     (chr)  (chr) (int)
#1  January      B     1
#2 November      A     1
#3 November      B     2
#4  October      A     2
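
As a possible shortcut for the total-count case, dplyr's `count()` combines the `group_by()` and `summarise(n = n())` steps. The sketch below assumes dplyr is installed and rebuilds the question's data as a plain data frame; the name `counts` is illustrative.

```r
library(dplyr)

# Rebuild the question's data as a plain data frame (same values as the dput output)
data <- data.frame(
  Month = c("October", "November", "October", "January", "November", "November"),
  Source = c("A", "B", "A", "B", "A", "B"),
  Booking_ID = c(100L, 101L, 106L, 109L, 110L, 111L),
  stringsAsFactors = FALSE
)

# count() groups by the given columns and returns the row count per group in column n
counts <- data %>% count(Month, Source)
print(counts)
```

Because every Booking_ID in this example is unique, the row count and the `n_distinct()` count agree; they would differ if a booking appeared more than once within a group.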

edited Oct 29 '15 at 17:36

answered Oct 29 '15 at 09:56

The `Source` level is missing, isn't it? – Oct 29 '15 at 09:58
Not Working.. Error: unsupported type for column 'booking_com$booking_month' (NILSXP, classes = NULL) – Akshit Oct 29 '15 at 10:10
@Akshit Please show a dput output of the example dataset. It works for me. – akrun Oct 29 '15 at 10:14
> table(booking_com$booking_month, booking_com$source_meaning) < table of extent 7 x 0 > – Akshit Oct 29 '15 at 10:18
@Akshit As I mentioned earlier, use `dput(df1)` and show the output in your post. – akrun Oct 29 '15 at 10:20
