Hi, I have the dataset below:

ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID,diagnosis)

ID diagnosis
1  A
1  A
1  B 
2  C
2  C
3  B
3  A
3  A
4  C 
4  C
4  B

I would like to count how many people had each type of diagnosis. Some people have the same diagnosis multiple times, in which case I would like them to be counted only once.

i.e. only two people would have diagnosis "A" (ID 1 and ID 3).

i.e. only two people would have diagnosis "C" (ID 2 and ID 4).

i.e. three people would have diagnosis "B" (ID 1, ID 3 and ID 4).

I'm wondering if there's a way of summarizing the above into a table.

I would appreciate all the help there is! Thanks!!!

Bruh

3 Answers

You could group_by diagnosis and summarise with n_distinct to count the distinct IDs per group, like this:

library(dplyr)
df %>%
  group_by(diagnosis) %>%
  summarise(n = n_distinct(ID))
#> # A tibble: 3 × 2
#>   diagnosis     n
#>   <chr>     <int>
#> 1 A             2
#> 2 B             3
#> 3 C             2

Created on 2023-03-31 with reprex v2.0.2
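
As a possible variant of the same idea (a sketch, not part of the original answer), you could first drop repeated ID/diagnosis pairs with distinct() and then count the remaining rows per diagnosis with count(). The name `result` is just an illustrative choice; the data are rebuilt from the question so the snippet runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID, diagnosis)

# Keep one row per (ID, diagnosis) pair, then count rows per diagnosis.
# `result` is an illustrative name, not from the original answer.
result <- df %>%
  distinct(ID, diagnosis) %>%
  count(diagnosis)

result
```

This should give the same counts as the n_distinct version; which one reads better is mostly a matter of taste.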

Quinten
cols <- c("ID", "diagnosis")

table(unique(df[cols])$diagnosis)

# A B C 
# 2 3 2 
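
To spell out the two steps in this approach and get a data frame rather than a printed table, something like the following might work. It is only a sketch: the names `pairs` and `counts` and the column name `n` are illustrative choices, not part of the original answer.

```r
# Rebuild the example data from the question
ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID, diagnosis)

# Step 1: keep each (ID, diagnosis) combination only once
pairs <- unique(df[c("ID", "diagnosis")])

# Step 2: tabulate the de-duplicated diagnoses and convert to a data frame;
# responseName sets the name of the count column (illustrative name "n")
counts <- as.data.frame(table(diagnosis = pairs$diagnosis), responseName = "n")

counts
```

Note that as.data.frame() on a table returns the diagnosis column as a factor by default.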
s_baldur

Try table + colSums:

> colSums(table(df) > 0)
A B C
2 3 2
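
For readers unfamiliar with this idiom, here is the one-liner broken into steps. This is a sketch for illustration; `tab`, `present` and `n_people` are hypothetical intermediate names that do not appear in the original answer.

```r
# Rebuild the example data from the question
ID <- c(1,1,1,2,2,3,3,3,4,4,4)
diagnosis <- c("A","A","B","C","C","B","A","A","C","C","B")
df <- data.frame(ID, diagnosis)

# Cross-tabulate: one row per ID, one column per diagnosis,
# each cell holding how many times that ID received that diagnosis
tab <- table(df)

# TRUE where an ID has the diagnosis at least once
present <- tab > 0

# Summing TRUE values down each column counts the distinct IDs per diagnosis
n_people <- colSums(present)

n_people
```

Because the comparison collapses repeated diagnoses for the same ID into a single TRUE, each person is counted at most once per diagnosis.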
ThomasIsCoding