How to combine count() and group_by() to count responses with a certain value, grouped by respondent?

Question

I have a set of data where the response to a series of repeated questions is the outcome of interest. Because of this, I'd like to count the number of "I don't know" responses for each respondent ID and append that count as a new column. So basically, I have data that look like this:

ID	response
1	Yes
1	I don't know
2	No
2	I don't know

And I want them to look like this:

ID	response	idkcount
1	Yes	1
1	I don't know	1
2	No	1
2	I don't know	1

This is the code I've most recently written:

df$idkcount <- group_by(as_tibble(df$ID)) %>% count(df$response == "I don't know")

But I seem to get an error message no matter what I try with these two commands. What am I missing?
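One likely reason for the errors is that `group_by()` expects a data frame followed by the names of the grouping columns. `group_by(as_tibble(df$ID))` instead receives a one-column tibble and no grouping column. The `df$response` inside `count()` also refers to the outer data rather than to a column of the piped tibble. On top of that, `count()` returns one row per distinct value, which does not line up with the rows of `df` when assigned to `df$idkcount`. The following is a minimal sketch of a `count()`-based route, assuming dplyr 1.0 or later: count per `ID` with a logical weight, then join the totals back. The data are the question's example plus a hypothetical ID 3 with no "I don't know" answers.

```r
library(dplyr)

# The question's example data, plus a hypothetical ID 3 with no IDK answers
df <- data.frame(
  ID = c(1L, 1L, 2L, 2L, 3L),
  response = c("Yes", "I don't know", "No", "I don't know", "No")
)

# count() groups by ID itself; the logical weight means only the
# "I don't know" rows add to each total, so an ID with none gets 0
idk <- df %>%
  count(ID, wt = response == "I don't know", name = "idkcount")

# Join the per-ID totals back onto every row of the original data
result <- df %>%
  left_join(idk, by = "ID")

result
```

`left_join()` keeps the rows of `df` in their original order. Using the weight instead of filtering first is what keeps ID 3 in `idk` with a count of 0.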

stefan · Accepted Answer · score 2 · 2021-12-15T10:17:35.627

Using group_by and mutate you could do:

Note: I slightly altered your example data to make it a more general case.

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

library(dplyr)

df %>% 
  group_by(ID) %>% 
  mutate(idkcount = sum(response == "I don't know", na.rm = TRUE)) %>% 
  ungroup()
#> # A tibble: 6 × 3
#>      ID response     idkcount
#>   <int> <chr>           <int>
#> 1     1 Yes                 3
#> 2     1 I don't know        3
#> 3     1 I don't know        3
#> 4     1 I don't know        3
#> 5     2 No                  1
#> 6     2 I don't know        1
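dplyr's `add_count()` performs the grouping and appends the column in one call. It counts per `ID` and adds the total to every row without collapsing them. The sketch below uses the same data and assumes dplyr 1.0 or later, so that `wt` accepts a logical expression (rows where it is `TRUE` count as 1, the rest as 0).

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

# add_count() groups by ID, sums the logical weight within each group,
# and appends the total as a new column while keeping every row
result <- df %>%
  add_count(ID, wt = response == "I don't know", name = "idkcount")

result
```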

edited Dec 15 '21 at 10:17

answered Dec 15 '21 at 10:04

stefan


Thank you for this solution! It seems like it should work, but for some reason the column it creates is populated entirely with the total number of 'I don't know' responses in the entire data frame. Any ideas what might need tweaking with my data? – nlplearner Dec 15 '21 at 10:13
@nlplearner. Hm. Hard to tell what's going wrong. I just made an edit and checked a more general case. As you can see, the code should add a column with the count of IDK responses per ID. But maybe I'm missing something about your desired result. – stefan Dec 15 '21 at 10:19
It's very strange. I think maybe group_by() isn't working with my ID column for some reason? Your solution should work, but the grouping isn't. – nlplearner Dec 15 '21 at 10:26
Just to make sure: You use `sum(response ..` not `sum(df$response ...`? – stefan Dec 15 '21 at 10:33
Yes, I copy/pasted the solution – nlplearner Dec 15 '21 at 10:36

One more question: Have you attached the `plyr` package besides `dplyr`? – stefan Dec 15 '21 at 10:39
Your problem is very weird, but yes, I thought of that too; maybe the issue is that you didn't load dplyr with library(dplyr). – DataM Dec 15 '21 at 10:40

I just found a post mentioning the problem with loading plyr after dplyr: https://stackoverflow.com/questions/26923862/why-are-my-dplyr-group-by-summarize-not-working-properly-name-collision-with Your solution works perfectly now, thank you! – nlplearner Dec 15 '21 at 10:42
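The symptom described above (every row showing the total for the whole data frame) fits `plyr::mutate()` masking `dplyr::mutate()`, because plyr's version does not use the groups set by `group_by()`. The sketch below guards against this, assuming both packages may be attached: it calls the dplyr verbs with the `dplyr::` prefix so that the load order no longer matters.

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

# Fully qualified calls: even if library(plyr) is run after library(dplyr),
# these resolve to dplyr's group-aware versions
result <- df %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(idkcount = sum(response == "I don't know", na.rm = TRUE)) %>%
  dplyr::ungroup()

result
```

`search()` lists the attached packages with the most recently attached first. A package earlier in that list masks same-named functions from the packages after it.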

score 1 · Answer 2 · answered Dec 15 '21 at 10:05


library(dplyr)

my_df <- data.frame("id" = c(1, 1, 2, 2, 3),
                    "response" = c("I don't know", "I don't know", "no", "I don't know", "maybe"),
                    stringsAsFactors = FALSE)

my_df <- my_df %>% group_by(id) %>% mutate(count = length(which(response == "I don't know")))

answered Dec 15 '21 at 10:05

DataM


This solution gives me the following error; do you know what I need to do to fix it? Error in FUN(left, right) : comparison of these types is not implemented – nlplearner Dec 15 '21 at 10:19
I could tell if you showed me your data frame and said whether you copy-pasted the code above exactly or made any modifications. – DataM Dec 15 '21 at 10:21
I copy/pasted the exact code, and my data frame's dimensions are 14496 x 9. – nlplearner Dec 15 '21 at 10:29
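If a package conflict is suspected, a base R sketch sidesteps plyr and dplyr entirely. `ave()` applies `sum` within each `id` and returns a vector aligned with the original rows. The sketch reuses the example data above and assumes the real data use the same column names, `id` and `response`.

```r
my_df <- data.frame(
  id = c(1, 1, 2, 2, 3),
  response = c("I don't know", "I don't know", "no", "I don't know", "maybe"),
  stringsAsFactors = FALSE
)

# 1 for an "I don't know" row, 0 otherwise; as.character() guards
# against the column having been read in as a factor
is_idk <- as.integer(as.character(my_df$response) == "I don't know")

# ave() sums is_idk within each id and repeats that sum on every row of the id
my_df$count <- ave(is_idk, my_df$id, FUN = sum)

my_df
```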

PaulS · Answer 3 · score 0 · 2021-12-15T10:20:33.250

A possible solution, which tabulates every response per ID (I am using @stefan's dataset):

library(tidyverse)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

df %>% 
  count(ID, response, name = "idkcount")

#>   ID     response idkcount
#> 1  1 I don't know        3
#> 2  1          Yes        1
#> 3  2 I don't know        1
#> 4  2           No        1

edited Dec 15 '21 at 10:20

answered Dec 15 '21 at 10:07

PaulS


I get this error with your solution; do you know why? Error in count(., ID, response, name = "idkcount") : unused argument (name = "idkcount") – nlplearner Dec 15 '21 at 10:34
Maybe you have not installed `tidyverse`. – PaulS Dec 15 '21 at 10:37
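The `unused argument (name = ...)` error is consistent with `plyr::count()` being called instead of dplyr's version. Its signature is `count(df, vars = NULL, wt_var = NULL)` and has no `name` argument, which matches the plyr masking identified under the accepted answer. The sketch below pins the call to dplyr, assuming dplyr 1.0 or later, where `count()` accepts `name`.

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

# dplyr::count() is named explicitly, so plyr::count() cannot be picked up
# instead, even if plyr was attached after dplyr
result <- df %>%
  dplyr::count(ID, response, name = "idkcount")

result
```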
