How do I summarise() correctly in R? [specific case]

Question

I have a table of survey respondents, their answers, and the companies they belong to. I want to know how many respondents from each company answered.

Data:

dat <- structure(list(respondent = c("a", "a", "a2", "a", "b", "b", 
"c", "c", "c", "d", "d3", "d", "d", "d2", "d", "e", "e", "e", 
"f", "g"), question = c("q1", "q2", "q1", "q2", "q1", "q2", "q1", 
"q2", "q1", "q2", "q1", "q2", "q1", "q2", "q1", "q2", "q1", "q2", 
"q1", "q2"), answer = c(1, 1, 0, 1, 1, 0, -1, 1, -1, -1, 1, 0, 
0, -1, -1, -1, 1, 1, -1, 1), name = c("AU", "AU", "GU", "AU", 
"AU", "AU", "BU", "BU", "CU", "DU", "BU", "DU", "DU", "EU", "DU", 
"EU", "EU", "EU", "FU", "GU")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I am trying to use tidyverse functions, specifically summarise(). I wrote the following code:

a <- dat %>%
  group_by(respondent, name) %>%
  summarise(q = n())

But the output is not what I need. I want a data frame with one column for name and another for the number of unique respondents associated with that name in the original dataset. I realize that something is wrong with my summarise(), but I can't work out what.

I want to get something like this:

name    N
AU      2
BU      2
CU      1
DU      1
EU      1
GU      2

Look at `dplyr::n_distinct`, such as `n_distinct(respondent)` — camille, Sep 30 '20 at 14:30
are you sure? There are 11 answers there, and I've tried out at least 4 of them that get exactly the output you're asking for, including variations on the answer you've accepted. Is there something else going on that's not in your question maybe? — camille, Sep 30 '20 at 14:45

score 1 · Accepted Answer · answered Sep 30 '20 at 14:31

You can try this to get the number of unique respondents per name:

a <- dat %>%
  group_by(name) %>%
  summarise(q = length(unique(respondent)))
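
A self-contained sketch of why this works, assuming dplyr is installed and using only the two relevant columns of the question's data. Grouping by respondent and name gives one row per respondent-company pair, so n() counts answer rows; grouping by name alone gives one row per company. The column name n_resp and the distinct() plus count() variant are illustrative additions, not part of the original answer.

```r
library(dplyr)

# The respondent and name columns from the question's data
dat <- data.frame(
  respondent = c("a", "a", "a2", "a", "b", "b", "c", "c", "c", "d",
                 "d3", "d", "d", "d2", "d", "e", "e", "e", "f", "g"),
  name = c("AU", "AU", "GU", "AU", "AU", "AU", "BU", "BU", "CU", "DU",
           "BU", "DU", "DU", "EU", "DU", "EU", "EU", "EU", "FU", "GU"),
  stringsAsFactors = FALSE
)

# One row per company; count the distinct respondents within each group
res <- dat %>%
  group_by(name) %>%
  summarise(n_resp = length(unique(respondent)))

# Equivalent: drop duplicate (name, respondent) pairs, then count rows per name
# (count() puts the result in a column called n)
res2 <- dat %>%
  distinct(name, respondent) %>%
  count(name)

print(res)
```

With this data EU comes out as 2 (respondents d2 and e) and FU appears with 1, which differs slightly from the approximate table in the question.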

ssaha

Eureka! Thank you, seems like it will help me! – rg4s Sep 30 '20 at 14:33

s_baldur · Answer 2 · 2020-09-30T14:37:08.087

df %>%
  group_by(name) %>% 
  summarise(n_resp = n_distinct(respondent))
#   name  n_resp
#   <chr>  <int>
# 1 AU         2
# 2 BU         2
# 3 CU         1
# 4 DU         1
# 5 EU         2
# 6 FU         1
# 7 GU         2

Or more concisely:

aggregate(respondent ~ name, df, n_distinct)
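
n_distinct comes from dplyr, so the aggregate() call above still needs dplyr loaded. A base-R-only sketch, assuming the same two columns as in the question are stored in a data frame called df; the anonymous function stands in for n_distinct and is an illustrative substitute:

```r
# The respondent and name columns from the question's data
df <- data.frame(
  respondent = c("a", "a", "a2", "a", "b", "b", "c", "c", "c", "d",
                 "d3", "d", "d", "d2", "d", "e", "e", "e", "f", "g"),
  name = c("AU", "AU", "GU", "AU", "AU", "AU", "BU", "BU", "CU", "DU",
           "BU", "DU", "DU", "EU", "DU", "EU", "EU", "EU", "FU", "GU"),
  stringsAsFactors = FALSE
)

# Count distinct respondents per name without loading any packages
res <- aggregate(respondent ~ name, data = df,
                 FUN = function(x) length(unique(x)))

print(res)
```

The count ends up in a column named respondent, because aggregate() keeps the name of the left-hand side of the formula.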

edited Sep 30 '20 at 14:37

answered Sep 30 '20 at 14:32

