
I have a data frame in which each id belongs to exactly one group. I want to create a summary table that gives the number of observations for each id and the group it belongs to.

library(dplyr)

dat <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4),
                  group = c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))
count <- dat %>% group_by(id) %>% tally()
count
# A tibble: 4 x 2
     id     n
  <dbl> <int>
1     1     3
2     2     4
3     3     1
4     4     5

With the code above I can count the number of observations, but I have no idea how to add a third column for group. The desired result is:

# A tibble: 4 x 3
     id     n group
  <dbl> <int> <dbl>
1     1     3     1
2     2     4     0
3     3     1     1
4     4     5     0

When I do

dat %>% group_by(id) %>% summarise(n=count(id), group = unique(group))

I get an error:

Error in quickdf(.data[names(cols)]) : length(rows) == 1 is not TRUE

However, when I do

dat %>% group_by(id) %>% summarise( group = unique(group))

it works. I was confused about why summarise could not take multiple arguments.

Update: the error was caused by another package, plyr, being loaded. summarise works as expected once I detach plyr.

Huang Rui

2 Answers


We can use count(), which groups by the given columns and counts the rows in each group:

library(dplyr)
dat %>%
   count(id, group)
# A tibble: 4 x 3
#     id group     n
#  <dbl> <dbl> <int>
#1     1     1     3
#2     2     0     4
#3     3     1     1
#4     4     0     5
akrun
  • Thank you akrun. But I already know how to add a column of counts. My question is how to create a column which tells me which group each id belongs to. – Huang Rui Jul 18 '19 at 13:44
  • @HuangRui Can you show the expected output in your post – akrun Jul 18 '19 at 13:44
  • It won't give me the desired result. An error is reported. – Huang Rui Jul 18 '19 at 13:49
  • @HuangRui Can you try `dat %>% dplyr::count(id, group)` – akrun Jul 18 '19 at 13:49
  • > dat %>% + count(id, group) Error in UseMethod("as.quoted") : no applicable method for 'as.quoted' applied to an object of class "function" – Huang Rui Jul 18 '19 at 13:50
  • 5
    As I mentioned, if you have loaded `plyr` along with `dplyr`, there is a `plyr;:count` which can mask. That is the reasson i suggesteed to use `dat %>% dplyr;:count(id, group)` – akrun Jul 18 '19 at 13:52
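
The masking problem discussed in these comments can also be sidestepped in the question's original summarise approach by qualifying each verb with its namespace. The following is only a minimal sketch using the question's data; the name `res` and the choice of `first(group)` (which assumes each id really has a single group) are illustrative assumptions, not part of the original posts.

```r
library(dplyr)

dat <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4),
  group = c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)

# Qualify the verbs with dplyr:: so that loading plyr afterwards
# cannot mask them with its own summarise().
# first(group) assumes every id belongs to exactly one group.
res <- dat %>%
  dplyr::group_by(id) %>%
  dplyr::summarise(n = dplyr::n(), group = dplyr::first(group))

res
```

This keeps the column order asked for in the question (id, n, group), whereas count(id, group) places n last.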

akrun's answer is more elegant, but as an alternative you can simply add the group variable to your group_by() call:

library(dplyr)

dat <- tibble(id = c(1, 1, 1, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4), 
              group = c(1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))

dat %>%
  group_by(id, group) %>%
  tally()

# A tibble: 4 x 3
# Groups:   id [4]
     id group     n
  <dbl> <dbl> <int>
1     1     1     3
2     2     0     4
3     3     1     1
4     4     0     5

Note that if an id can belong to more than one group, unlike in your example (id = 1 -> group = 1, id = 2 -> group = 0, and so on), this generates one row for each id/group combination (which is often exactly what you want). For example,

dat2 <- tibble(id = c(1, 1, 1, 2, 2), group = c(1, 0, 0, 1, 0))

dat2 %>%
  group_by(id, group) %>%
  tally()

# A tibble: 4 x 3
# Groups:   id [2]
     id group     n
  <dbl> <dbl> <int>
1     1     0     2
2     1     1     1
3     2     0     1
4     2     1     1
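
If you would rather verify up front that every id maps to a single group, one possible check is to count the distinct group values per id. This is a rough sketch on the dat2 example above; the names `check`, `n_groups`, and `ambiguous_ids` are made up for illustration.

```r
library(dplyr)

dat2 <- tibble(id = c(1, 1, 1, 2, 2), group = c(1, 0, 0, 1, 0))

# Count the distinct group values per id; any value above 1 means
# that id is spread over several groups.
check <- dat2 %>%
  group_by(id) %>%
  summarise(n_groups = n_distinct(group))

# ids that appear in more than one group
ambiguous_ids <- check$id[check$n_groups > 1]
ambiguous_ids
```

An empty `ambiguous_ids` would indicate that grouping by id alone and by id plus group give the same rows.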
Gabriel M. Silva