0

I am trying to complete time information on a large dataset that is similar to the one down below. In the dataset I am using, there is information on changes in grades, but what I want is the grade at any given quarter. The grades are valid until a change occurs, so I want to add the grades for the quarters in between the two grade changes. Additionally, the grades are grouped by teacher. I think the best way to illustrate what I mean is by showing it in a table:

This is a simplified version of the dataset I am using:

df1 <- data.frame(Student = c("Ben","Anna","Ben","Anna","Ben"), Teacher = c("Mr. Appleby", "Mrs. Smith","Mrs. Smith","Mrs. Smith","Mr. Appleby"), Grades = c("A","B","C","A","B"), Quarter = c("2019-04-01", "2019-07-01", "2020-01-01", "2020-04-01", "2020-01-01"))

      Student     Teacher       Grades   Quarter
1     Ben         Mr. Appleby   A        2019-04-01
2     Anna        Mrs. Smith    B        2019-07-01
3     Ben         Mrs. Smith    C        2020-01-01
4     Anna        Mrs. Smith    A        2020-04-01
5     Ben         Mr. Appleby   B        2020-01-01

As a result, I want to have the student name, the teacher name, and the grade for every quarter until Q4 2020 (2020-10-01). The grades should be filled in between two quarters where the data doesn't change. This should look like this:

      Student     Teacher       Grades   Quarter
5     Ben         Mrs. Smith    C        2020-01-01
5     Ben         Mrs. Smith    C        2020-04-01
5     Ben         Mrs. Smith    C        2020-07-01
5     Ben         Mrs. Smith    C        2020-10-01
1     Ben         Mr. Appleby   A        2019-04-01
1     Ben         Mr. Appleby   A        2019-07-01
1     Ben         Mr. Appleby   A        2019-10-01
5     Ben         Mr. Appleby   B        2020-01-01
5     Ben         Mr. Appleby   B        2020-04-01
5     Ben         Mr. Appleby   B        2020-07-01
5     Ben         Mr. Appleby   B        2020-10-01
2     Anna        Mrs. Smith    B        2019-07-01
2     Anna        Mrs. Smith    B        2019-10-01
2     Anna        Mrs. Smith    B        2020-01-01
4     Anna        Mrs. Smith    A        2020-04-01
4     Anna        Mrs. Smith    A        2020-07-01
4     Anna        Mrs. Smith    A        2020-10-01

I already tried coding a for loop that parses through the data but it did not work for the two groups and it was incredibly slow and inefficient.

Do you know a way to calculate the table above?

Thank you very much for your help!

Emil
  • 3
  • 1

1 Answers1

0

You can use complete and fill :

library(tidyr)
library(dplyr)

df1 %>%
  mutate(Quarter = as.Date(Quarter)) %>%
  group_by(Teacher, Student) %>%
  complete(Quarter = seq(min(Quarter), as.Date('2020-10-01'), by = 'quarter')) %>%
  fill(Student, Grades)

#  Teacher     Student Quarter    Grades
#   <chr>       <chr>   <date>     <chr> 
# 1 Mr. Appleby Ben     2019-04-01 A     
# 2 Mr. Appleby Ben     2019-07-01 A     
# 3 Mr. Appleby Ben     2019-10-01 A     
# 4 Mr. Appleby Ben     2020-01-01 B     
# 5 Mr. Appleby Ben     2020-04-01 B     
# 6 Mr. Appleby Ben     2020-07-01 B     
# 7 Mr. Appleby Ben     2020-10-01 B     
# 8 Mrs. Smith  Anna    2019-07-01 B     
# 9 Mrs. Smith  Anna    2019-10-01 B     
#10 Mrs. Smith  Anna    2020-01-01 B     
#11 Mrs. Smith  Anna    2020-04-01 A     
#12 Mrs. Smith  Anna    2020-07-01 A     
#13 Mrs. Smith  Anna    2020-10-01 A     
#14 Mrs. Smith  Ben     2020-01-01 C     
#15 Mrs. Smith  Ben     2020-04-01 C     
#16 Mrs. Smith  Ben     2020-07-01 C     
#17 Mrs. Smith  Ben     2020-10-01 C     
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213