I am trying to aggregate my data to account for the different number of courses a teacher has in their schedule.

Basically my data looks like this:

Id | Subject
123| algebra
123| geometry
123| algebra II
456| calc
456| calc
789| geometry
789| geometry
789| calc

and I need it to look like this:

Id | Subject count
123| 3
456| 1
789| 2

I have no idea where to start. I don't want to simply count the number of courses each teacher teaches; I want the number of DIFFERENT courses. Please help!

Zoe Mandel

1 Answer

We can group by 'Id' and get the distinct count of 'Subject' with n_distinct inside summarise:

library(dplyr)
df1 %>%
  group_by(Id) %>%
  summarise(Subject_Count = n_distinct(Subject))
# A tibble: 3 x 2
#     Id Subject_Count
#  <int>         <int>
#1   123             3
#2   456             1
#3   789             2

Or, using data.table: convert the data frame to a data.table with setDT(df1), then, grouped by 'Id', get the distinct count with uniqueN:

library(data.table)
setDT(df1)[,.(Subject_Count = uniqueN(Subject)), by = Id]

data

df1 <- structure(list(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc")),
  class = "data.frame", row.names = c(NA, -8L))
akrun
  • Thank you, @akrun! One last question, if you're still around: how do I add this new variable to my data? I tried `df1$subcount <- setDT(df1)[, .(subcount = uniqueN(subject)), by = ID]`, but that only replaced my data with the 2 variables. – Zoe Mandel Dec 10 '19 at 23:49
  • @ZoeMandel If you want to create it as a new variable, use `mutate` instead of `summarise`. – akrun Dec 10 '19 at 23:49
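
For anyone following the last comment, here is a minimal sketch of the `mutate` variant. It assumes the same df1 as in the answer's data section (rebuilt here with data.frame() so the block runs on its own); the result name df1_counted is only illustrative. Unlike summarise, mutate keeps all 8 rows and repeats each Id's distinct count on every row for that Id.

```r
library(dplyr)

# Same example data as in the answer, rebuilt so this block is self-contained
df1 <- data.frame(
  Id = c(123L, 123L, 123L, 456L, 456L, 789L, 789L, 789L),
  Subject = c("algebra", "geometry", "algebra II", "calc",
              "calc", "geometry", "geometry", "calc")
)

# mutate() adds a column instead of collapsing each group to one row
df1_counted <- df1 %>%
  group_by(Id) %>%
  mutate(Subject_Count = n_distinct(Subject)) %>%
  ungroup()  # drop the grouping so later steps are not grouped by Id

df1_counted
```

With data.table, the equivalent should be assignment by reference inside the brackets, `setDT(df1)[, Subject_Count := uniqueN(Subject), by = Id]`, rather than assigning the summarised result to `df1$subcount`.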