I have a table of survey respondents, their answers, and the companies they belong to. I want to know, for each company, how many respondents answered.

Data:

dat <- structure(list(respondent = c("a", "a", "a2", "a", "b", "b", 
"c", "c", "c", "d", "d3", "d", "d", "d2", "d", "e", "e", "e", 
"f", "g"), question = c("q1", "q2", "q1", "q2", "q1", "q2", "q1", 
"q2", "q1", "q2", "q1", "q2", "q1", "q2", "q1", "q2", "q1", "q2", 
"q1", "q2"), answer = c(1, 1, 0, 1, 1, 0, -1, 1, -1, -1, 1, 0, 
0, -1, -1, -1, 1, 1, -1, 1), name = c("AU", "AU", "GU", "AU", 
"AU", "AU", "BU", "BU", "CU", "DU", "BU", "DU", "DU", "EU", "DU", 
"EU", "EU", "EU", "FU", "GU")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I am trying to use tidyverse functions, specifically summarise(). I wrote the following code:

library(dplyr)

a <- dat %>%
  group_by(respondent, name) %>%
  summarise(q = n())

But the output is not what I need. I want a data frame with one column for name and another column with the number of unique respondents associated with that name in the original dataset. I realize that something is wrong with my summarise(), but I can't figure out what.

I want to get something like this:

name    N
AU      2
BU      2
CU      1
DU      1
EU      1
GU      2
rg4s
  • Look at `dplyr::n_distinct`, such as `n_distinct(respondent)` – camille Sep 30 '20 at 14:30
  • @camille I think not at all – rg4s Sep 30 '20 at 14:35
  • are you sure? There are 11 answers there, and I've tried out at least 4 of them that get exactly the output you're asking for, including variations on the answer you've accepted. Is there something else going on that's not in your question maybe? – camille Sep 30 '20 at 14:45

2 Answers

You can try this to count the unique respondents for each name:

a <- dat %>%
  group_by(name) %>%
  summarise(q = length(unique(respondent)))
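
As a rough sketch of why grouping by name alone is the fix: grouping by both respondent and name, as in the question, produces one row per (respondent, name) pair, so a second step is still needed to count those pairs per name. The example below assumes dplyr is installed and rebuilds only the two relevant columns of the question's data; the names `pairs` and `res` are illustrative.

```r
library(dplyr)

# Same respondent/name columns as in the question (question/answer omitted).
dat <- data.frame(
  respondent = c("a", "a", "a2", "a", "b", "b", "c", "c", "c", "d",
                 "d3", "d", "d", "d2", "d", "e", "e", "e", "f", "g"),
  name = c("AU", "AU", "GU", "AU", "AU", "AU", "BU", "BU", "CU", "DU",
           "BU", "DU", "DU", "EU", "DU", "EU", "EU", "EU", "FU", "GU")
)

# Step 1: keep one row per unique (respondent, name) pair.
pairs <- dat %>% distinct(respondent, name)

# Step 2: count the remaining rows per name, i.e. unique respondents per name.
res <- pairs %>%
  group_by(name) %>%
  summarise(N = n())

print(res)
```

This two-step form should give the same counts as `length(unique(respondent))` within each name group; it only makes the intermediate pairs visible.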
ssaha

df %>%
  group_by(name) %>% 
  summarise(n_resp = n_distinct(respondent))
#   name  n_resp
#   <chr>  <int>
# 1 AU         2
# 2 BU         2
# 3 CU         1
# 4 DU         1
# 5 EU         2
# 6 FU         1
# 7 GU         2

Or more concisely:

aggregate(respondent ~ name, df, n_distinct)
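
Note that `n_distinct` still comes from dplyr, so the call above needs that package loaded. If a dependency-free version is preferred, a minimal base R sketch could look like the following; the anonymous function and the column rename to `N` are illustrative choices, and the data frame repeats only the respondent and name columns from the question.

```r
# Same respondent/name columns as in the question (question/answer omitted).
dat <- data.frame(
  respondent = c("a", "a", "a2", "a", "b", "b", "c", "c", "c", "d",
                 "d3", "d", "d", "d2", "d", "e", "e", "e", "f", "g"),
  name = c("AU", "AU", "GU", "AU", "AU", "AU", "BU", "BU", "CU", "DU",
           "BU", "DU", "DU", "EU", "DU", "EU", "EU", "EU", "FU", "GU")
)

# For each name, count the distinct respondent values using base R only.
res <- aggregate(respondent ~ name, data = dat,
                 FUN = function(x) length(unique(x)))

# Rename the count column to match the layout requested in the question.
names(res)[2] <- "N"

print(res)
```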
s_baldur