I have a dataset of students (id) and the grade they were in each year:
library(data.table)
set.seed(1)
students <- data.table(id    = rep(1:10, each = 10),
                       year  = rep(2000:2009, 10),
                       grade = sample(c(9:11, rep(NA, 5)), 100, replace = TRUE))
Here is a sample for student 1:
id year grade
1: 1 2000 9
2: 1 2001 NA
3: 1 2002 NA
4: 1 2003 9
5: 1 2004 10
6: 1 2005 NA
7: 1 2006 NA
8: 1 2007 11
9: 1 2008 NA
I would like a way to access each student's prior and future grades in order to perform different operations. For example, summing the student's last three non-missing grades up to and including the current year would result in a dataset like this one:
id year grade sum_lag_3
1: 1 2000 9 9 # 1st window, size 1: 9
2: 1 2001 NA 9
3: 1 2002 NA 9
4: 1 2003 9 18 # 2nd, size 2: 9 + 9 = 18
5: 1 2004 10 28 # 3rd, size 3: 9 + 9 + 10 = 28
6: 1 2005 NA 28
7: 1 2006 NA 28
8: 1 2007 11 30 # 4th, size 3: 9 + 10 + 11 = 30
9: 1 2008 NA 30
10: 1 2009 10 31 # 5th, size 3: 10 + 11 + 10 = 31
11: 2 2000 11 11 # 1st window, size 1: 11
(All results would look like this).
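To make the target column concrete, here is a minimal sketch of that one example calculation. The helper name `sum_last_n` is hypothetical (not part of data.table), and the toy data simply copies the student 1 rows shown above rather than the output of the `sample()` call.

```r
library(data.table)

# Toy data: the student 1 rows from the table above (assumed values).
dt <- data.table(id = 1L, year = 2000:2009,
                 grade = c(9, NA, NA, 9, 10, NA, NA, 11, NA, 10))

# Hypothetical helper: for each position i, sum the last n non-missing
# values seen up to and including i.
sum_last_n <- function(x, n = 3L) {
  vapply(seq_along(x), function(i) {
    seen <- x[seq_len(i)]          # values up to and including row i
    seen <- seen[!is.na(seen)]     # drop missing grades
    sum(tail(seen, n))             # sum of at most the last n of them
  }, numeric(1))
}

# := is called in the outer j, grouped by student, not inside .SD
dt[, sum_lag_3 := sum_last_n(grade), by = id]
```

Note that under this sketch a student whose first grades are all missing gets 0 for those rows, because `sum()` of an empty vector is 0.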
- This, however, is NOT a post about performing a rolling sum.
- I want to be able to perform operations within each group more generally. To do this, I need a way to reference all of a student's past and future grades.
So for the first row, since there are no previous observations, the 'past' vector is empty and the 'future' vector would be:
NA NA 9 10 NA NA 11 NA 10
Similarly, for the second row the 'past' vector would be 9
and the 'future' vector would be:
NA 9 10 NA NA 11 NA 10
And for the third row the 'past' vector would be 9 NA
and the 'future' vector would be:
9 10 NA NA 11 NA 10
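The 'past' and 'future' vectors described above could be built per row along these lines. This is only a sketch under assumptions: `with_context` is a hypothetical helper, `f` is any function of the two vectors that returns a single number, and the toy data copies the student 1 rows from the sample.

```r
library(data.table)

# Toy data: the student 1 rows from the sample (assumed values).
dt <- data.table(id = 1L, year = 2000:2009,
                 grade = c(9, NA, NA, 9, 10, NA, NA, 11, NA, 10))

# Hypothetical helper: for each position i in x, build the 'past' and
# 'future' vectors and pass them to f, which must return one number.
with_context <- function(x, f) {
  vapply(seq_along(x), function(i) {
    past   <- x[seq_len(i - 1L)]  # values strictly before row i (empty for row 1)
    future <- x[-seq_len(i)]      # values strictly after row i (empty for the last row)
    as.numeric(f(past, future))
  }, numeric(1))
}

# Example operations: count the non-missing grades before and after each row.
dt[, n_past   := with_context(grade, function(past, future) sum(!is.na(past))),   by = id]
dt[, n_future := with_context(grade, function(past, future) sum(!is.na(future))), by = id]
```

Because the helper only sees one group's `grade` vector at a time, the data stays in long format and any other within-group calculation can be swapped in through `f`.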
This is the information I want to reference to make different calculations: calculations that stay within each group and vary depending on the context. Preferably, I would like to do this using data.table and without reshaping my data into a wide format.
I've tried the following, where ... is a placeholder for any operation:
students[, .SD[, sum_last_3 := ...], by = id]
but I get an error message saying this feature is not yet available in data.table.
Thank you all.