I have count data arranged by quarters that looks something like this. Let's say each count is the number of car crashes in that quarter.
df <- structure(list(quarter = c("Q4", "Q5", "Q8", "Q6", "Q3", "Q7"), endcolumn = c(6, 7, 10, 8, 5, 9), Q1 = c(8L, 7L, 4L, 10L, 9L, 6L), Q2 = c(9L, 6L, 7L, 8L, 5L, 4L), Q3 = c(15L, 17L, 14L, 20L, 19L, 16L), Q4 = c(25L, 21L, 24L, 23L, 20L, 22L), Q5 = c(20L, 22L, 16L, 18L, 17L, 21L), Q6 = c(15L, 13L, 12L, 16L, 10L, 11L), Q7 = c(9L, 7L, 11L, 6L, 5L, 8L), Q8 = c(23L, 24L, 26L, 22L, 27L, 25L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L), class = "data.frame", .Names = c("quarter", "endcolumn", "Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8"))
  quarter endcolumn Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8
1      Q4         6  8  9 15 25 20 15  9 23
2      Q5         7  7  6 17 21 22 13  7 24
3      Q8        10  4  7 14 24 16 12 11 26
4      Q6         8 10  8 20 23 18 16  6 22
5      Q3         5  9  5 19 20 17 10  5 27
6      Q7         9  6  4 16 22 21 11  8 25
For each row, I want the total number of car crashes from Q1 up to a certain point in time (t0), as indicated by the quarter variable. For example, for row 1 I want the total number of car crashes from Q1 to Q4 inclusive. The resulting df should look something like this.
  quarter endcolumn Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8  t0
1      Q4         6  8  9 15 25 20 15  9 23  57
2      Q5         7  7  6 17 21 22 13  7 24  73
3      Q8        10  4  7 14 24 16 12 11 26 114
4      Q6         8 10  8 20 23 18 16  6 22  95
5      Q3         5  9  5 19 20 17 10  5 27  33
6      Q7         9  6  4 16 22 21 11  8 25  88
Note that the range for the sums varies over the rows. To make the above table, I translated the 'quarter' values into the variable 'endcolumn' (e.g. Q4 = column 6), and then used these values as below:
df$t0 <- NA
df[1, 11] <- rowSums(df[1, 3:6])
df[2, 11] <- rowSums(df[2, 3:7])
df[3, 11] <- rowSums(df[3, 3:10])
df[4, 11] <- rowSums(df[4, 3:8])
df[5, 11] <- rowSums(df[5, 3:5])
df[6, 11] <- rowSums(df[6, 3:9])
As my actual dataset is much larger than the one here, ideally I'd write a function that takes the values from endcolumn and uses them as the endpoint for rowSums.
There's a lot of help on passing variable names to functions (e.g. here and here), but something like this
rowSums(df[ , 3:which(names(df) == "endcolumn")])
just adds the endcolumn values to the values in Q1.
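To show what I think is going wrong in that attempt, here is a minimal sketch (the names end_pos, cols and attempt are only illustrative). As far as I can tell, which() returns the position of the endcolumn column itself, not the values stored in it, so the column range collapses to 3:2 and the same two columns are summed for every row.

```r
df <- data.frame(
  quarter   = c("Q4", "Q5", "Q8", "Q6", "Q3", "Q7"),
  endcolumn = c(6, 7, 10, 8, 5, 9),
  Q1 = c(8L, 7L, 4L, 10L, 9L, 6L),
  Q2 = c(9L, 6L, 7L, 8L, 5L, 4L),
  Q3 = c(15L, 17L, 14L, 20L, 19L, 16L),
  Q4 = c(25L, 21L, 24L, 23L, 20L, 22L),
  Q5 = c(20L, 22L, 16L, 18L, 17L, 21L),
  Q6 = c(15L, 13L, 12L, 16L, 10L, 11L),
  Q7 = c(9L, 7L, 11L, 6L, 5L, 8L),
  Q8 = c(23L, 24L, 26L, 22L, 27L, 25L)
)

# Position of the column named "endcolumn" (a single number), not its contents
end_pos <- which(names(df) == "endcolumn")

# 3:end_pos counts downwards, so it selects columns 3 (Q1) and 2 (endcolumn)
cols <- 3:end_pos

# Every row gets Q1 + endcolumn, regardless of the quarter it should stop at
attempt <- rowSums(df[, cols])
```

So the range never varies by row: it is one fixed pair of columns applied to the whole data frame.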
This is the closest I can find to my question. However, I've figured out that if I try to do this, it still doesn't allow me to vary the range of columns that I want to sum by row (i.e. 3:6 for the first row, 3:7 for the second).
Is there a way to pass the values from endcolumn to a function using rowSums?
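For concreteness, this is roughly the kind of function I have in mind. It is only a sketch: sum_col_range is a made-up name, it uses mapply() with sum() on one row at a time rather than rowSums() itself, and I have only checked it against the small example above, not my real data.

```r
df <- data.frame(
  quarter   = c("Q4", "Q5", "Q8", "Q6", "Q3", "Q7"),
  endcolumn = c(6, 7, 10, 8, 5, 9),
  Q1 = c(8L, 7L, 4L, 10L, 9L, 6L),
  Q2 = c(9L, 6L, 7L, 8L, 5L, 4L),
  Q3 = c(15L, 17L, 14L, 20L, 19L, 16L),
  Q4 = c(25L, 21L, 24L, 23L, 20L, 22L),
  Q5 = c(20L, 22L, 16L, 18L, 17L, 21L),
  Q6 = c(15L, 13L, 12L, 16L, 10L, 11L),
  Q7 = c(9L, 7L, 11L, 6L, 5L, 8L),
  Q8 = c(23L, 24L, 26L, 22L, 27L, 25L)
)

# Hypothetical helper: for row i, sum columns from[i] through to[i] of `data`.
# `from` and `to` are column positions; a single value is recycled over all rows.
sum_col_range <- function(data, from, to) {
  mapply(
    function(i, a, b) sum(unlist(data[i, a:b])),
    seq_len(nrow(data)), from, to
  )
}

# Start at column 3 (Q1) for every row; stop at each row's own endcolumn
df$t0 <- sum_col_range(df, 3, df$endcolumn)
```

On the example data this reproduces the t0 column in the table above, but it loops over rows, so I don't know how well it would scale to a much larger dataset.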
NB: I realise a second option might be to use the values in the quarter variable (e.g. Q4, Q5) and use them to identify the endpoint column. The reason I haven't outlined my question in this way is because once I write this function for t0, I also want to be able to find sums for a year before and year after t0, and I think it will be easier to tweak a function by using some variation of "3: endcolumn +/- 4" rather than trying to change Q9 to Q5 or Q13. But if there's an easy way to do the latter, that would also be helpful.
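A rough sketch of that second option, in case it helps frame an answer. It assumes the quarter columns are named Q1 to Q8, that "a year before" means the four quarters immediately preceding t0's quarter, and that "a year after" means the four quarters immediately following it; window_sum is a made-up helper name. Here match() turns "Q4" into position 4 among the quarter columns (so endcolumn is that position plus 2), and a logical mask lets rowSums() do the summing.

```r
df <- data.frame(
  quarter   = c("Q4", "Q5", "Q8", "Q6", "Q3", "Q7"),
  endcolumn = c(6, 7, 10, 8, 5, 9),
  Q1 = c(8L, 7L, 4L, 10L, 9L, 6L),
  Q2 = c(9L, 6L, 7L, 8L, 5L, 4L),
  Q3 = c(15L, 17L, 14L, 20L, 19L, 16L),
  Q4 = c(25L, 21L, 24L, 23L, 20L, 22L),
  Q5 = c(20L, 22L, 16L, 18L, 17L, 21L),
  Q6 = c(15L, 13L, 12L, 16L, 10L, 11L),
  Q7 = c(9L, 7L, 11L, 6L, 5L, 8L),
  Q8 = c(23L, 24L, 26L, 22L, 27L, 25L)
)

# Counts only, one column per quarter (assumes columns named Q1..Q8)
q <- as.matrix(df[, paste0("Q", 1:8)])

# Position of each row's t0 quarter within q, e.g. "Q4" -> 4
t0_pos <- match(df$quarter, colnames(q))

# Hypothetical helper: row i is summed over quarter positions from[i]..to[i].
# Positions outside 1..ncol(counts) do not exist, so a window that runs past
# the first or last quarter is silently truncated rather than raising an error.
window_sum <- function(counts, from, to) {
  pos <- col(counts)  # matrix holding each cell's column index
  rowSums(counts * (pos >= from & pos <= to))
}

df$t0          <- window_sum(q, 1, t0_pos)
df$year_before <- window_sum(q, t0_pos - 4, t0_pos - 1)
df$year_after  <- window_sum(q, t0_pos + 1, t0_pos + 4)
```

With only eight quarters of data, several of these windows are truncated (for row 3, where t0 is Q8, the year after is empty and sums to 0), so whether truncated sums should instead be NA is a separate decision.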