I'm looking to generate a new column with a count of each 'class' occurrence per year and per term.

The dput for the df is as follows:

structure(list(class = c("ART", "ART", "ART", "ART", "ART", "PE", 
"PE", "PE", "PE", "PE", "Science", "Science", "Science", "Science", 
"Science", "Music", "Music", "Music", "Music", "Music", "Music", 
"Music", "Music", "Music", "Music", "Music", "Music", "Music", 
"PE", "PE", "PE", "PE", "PE", "PE", "PE", "PE", "PE", "ART", 
"ART", "ART"), year = c("2019", "2020", "2019", "2020", "2019", 
"2020", "2019", "2020", "2019", "2020", "2019", "2020", "2019", 
"2020", "2019", "2020", "2019", "2020", "2019", "2020", "2019", 
"2020", "2019", "2020", "2019", "2020", "2019", "2020", "2019", 
"2020", "2019", "2020", "2019", "2020", "2019", "2020", "2019", 
"2020", "2019", "2020"), term = c("Semester 1", "Semester 1", 
"Semester 1", "Semester 1", "Semester 1", "Semester 1", "Semester 1", 
"Semester 1", "Semester 1", "Semester 1", "Semester 1", "Semester 1", 
"Semester 1", "Semester 1", "Semester 1", "Semester 1", "Semester 1", 
"Semester 1", "Semester 1", "Semester 1", "Semester 2", "Semester 2", 
"Semester 2", "Semester 2", "Semester 2", "Semester 2", "Semester 2", 
"Semester 2", "Semester 2", "Semester 2", "Semester 2", "Semester 2", 
"Semester 2", "Semester 2", "Semester 2", "Semester 2", "Semester 2", 
"Semester 2", "Semester 2", "Semester 2")), class = "data.frame", row.names = c(NA, 
-40L))

I'm happy to hard-code each year/term as shown below, but I can't hard-code each class because my real dataset has 1000+ classes. My attempt was as follows:

df$enrols[df$year==2020 & df$term=="Semester 1"] = length(unique(df$class))

I asked this question yesterday and received some helpful answers, but they didn't produce the required output. The question was subsequently closed, hence the new question.

What I'm after is a count of how many times each class appears for a given term and year. For example, in the df provided, observations with 'Art' in 'Semester 1' and '2019' should have a column ('enrols') with '4', as there are 4 occurrences of 'Art' in that year and term.

nickot
  • 'Art' has 3 rows for Sem 1 2019, not 4. Can you check your example again? – Pierre L Oct 25 '20 at 03:36
  • Hi Pierre, yep, you are correct, art with 3. Thanks for the code below. Simple solution and prompted me to a dplyr manual which has greatly helped. Thank you – nickot Oct 26 '20 at 10:29

2 Answers

With dplyr, you can group by year, term, and class, then use mutate() with n() to add each group's row count as a new column, enroll.

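library(dplyr)
# n() returns the number of rows in the current year/term/class group;
# mutate() keeps every original row and attaches that count to it
df %>% group_by(year, term, class) %>% mutate(enroll=n())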
#   class year  term       enroll
#   <chr> <chr> <chr>       <int>
# 1 ART   2019  Semester 1      3
# 2 ART   2020  Semester 1      2
# 3 ART   2019  Semester 1      3
# 4 ART   2020  Semester 1      2
# 5 ART   2019  Semester 1      3
# 6 PE    2020  Semester 1      3
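The same per-group count can also be sketched in base R, for readers who don't have dplyr available. This is only a minimal sketch of an alternative technique, ave() with FUN = length, and not the dplyr approach shown above. It rebuilds the question's 40-row sample compactly with rep(), which is an assumption that should match the dput, and names the column enrols as in the question.

```r
# Rebuild the question's sample data compactly (assumed to match the dput)
df <- data.frame(
  class = rep(c("ART", "PE", "Science", "Music", "PE", "ART"),
              times = c(5, 5, 5, 13, 9, 3)),
  year  = rep(c("2019", "2020"), times = 20),
  term  = rep(c("Semester 1", "Semester 2"), each = 20),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each class/year/term combination and returns
# a vector aligned with the original rows, so no hard-coding is needed
df$enrols <- ave(seq_len(nrow(df)), df$class, df$year, df$term, FUN = length)

# The first six rows carry the counts 3, 2, 3, 2, 3, 3
head(df)
```

Because ave() returns one value per input row, the result can be assigned straight into a new column without a join or merge.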
Pierre L

You can use the count function to count by group directly, as below:

library(dplyr)
df %>% count(year, term, class, name="enroll") 
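One thing to be aware of: count() returns one summary row per year/term/class combination rather than a new column on the original rows. If the count is wanted on every row, as the question describes, dplyr's add_count() may be a closer fit. The following is a minimal sketch comparing the two; it assumes the sample data rebuilt with rep() matches the question's dput.

```r
library(dplyr)

# Rebuild the question's sample data compactly (assumed to match the dput)
df <- data.frame(
  class = rep(c("ART", "PE", "Science", "Music", "PE", "ART"),
              times = c(5, 5, 5, 13, 9, 3)),
  year  = rep(c("2019", "2020"), times = 20),
  term  = rep(c("Semester 1", "Semester 2"), each = 20),
  stringsAsFactors = FALSE
)

# count() collapses the data to one row per year/term/class combination
per_group <- df %>% count(year, term, class, name = "enroll")

# add_count() keeps all original rows and appends the group size to each
with_col <- df %>% add_count(year, term, class, name = "enroll")

nrow(per_group)  # one row per distinct combination
nrow(with_col)   # same number of rows as df
```

Which form is preferable depends on whether a summary table or a per-row column is needed downstream.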
Vaibhav Singh