Aggregating data by counts of variation?

Question

I am trying to aggregate my data to account for the different number of courses a teacher has in their schedule.

Basically my data looks like this:

Id | Subject
123| algebra
123| geometry
123| algebra II
456| calc
456| calc
789| geometry
789| geometry
789| calc

and I need it to look like this:

Id | Subject count
123| 3
456| 1
789| 2

I have no idea where to start, because I don't want to simply count the number of courses they teach; I want the number of DIFFERENT courses. Please help!

score 0 · Accepted Answer · answered Dec 10 '19 at 23:32

We can group by 'Id' and get the distinct count of 'Subject' with `n_distinct` inside `summarise`:

library(dplyr)
df1 %>%
  group_by(Id) %>%
  summarise(Subject_Count = n_distinct(Subject))
# A tibble: 3 x 2
#     Id Subject_Count
#  <int>         <int>
#1   123             3
#2   456             1
#3   789             2

Or, with data.table: convert the data frame to a data.table by reference (`setDT(df1)`), group by 'Id', and get the distinct counts with `uniqueN`:

library(data.table)
setDT(df1)[,.(Subject_Count = uniqueN(Subject)), by = Id]

data

df1 <- structure(list(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc")),
  class = "data.frame", row.names = c(NA, -8L))

Thank you, @akrun! One last question, if you're still around: how do I add this new variable to my data? I tried `df1$subcount<-setDT(df1)[,.(subcount = uniqueN(subject)), by = ID]`, but that only replaced my data with the 2 variables — Zoe Mandel, Dec 10 '19 at 23:49
@ZoeMandel If you want to create it as a new variable, use `mutate` instead of `summarise` — akrun, Dec 10 '19 at 23:49
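
To make the suggestion in the comments concrete, here is a minimal sketch of adding the distinct count as a new column while keeping every original row. It assumes the same `df1` as in the answer's data section. The names `df2` and `dt1` are illustrative only and do not come from the original post. The data.table variant with `:=` is my own addition, not something the answerer posted.

```r
library(dplyr)
library(data.table)

# Same sample data as in the answer
df1 <- data.frame(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc"),
  stringsAsFactors = FALSE
)

# dplyr: mutate() keeps all rows and repeats the per-Id distinct count on each
df2 <- df1 %>%
  group_by(Id) %>%
  mutate(Subject_Count = n_distinct(Subject)) %>%
  ungroup()

# data.table: := adds the column by reference instead of building a summary.
# as.data.table() makes a copy, so df1 itself is left untouched here.
dt1 <- as.data.table(df1)
dt1[, Subject_Count := uniqueN(Subject), by = Id]
```

The attempt in the comment assigned a whole summary table to a single column, and `setDT(df1)` also converted `df1` in place, which is probably why the data appeared to be replaced. Note as well that column names are case-sensitive, so `subject` and `ID` would need to match the actual names (`Subject`, `Id` in the sample data).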
