Let's start with a functional interface, which is a list of functions, along with their inputs and outputs, that will solve the problem.
The function reset_cum_sums takes two arguments: a vector and a vector of reset positions within it. The output is a vector of cumulative sums, with the sum restarting at each reset position. An example should make this clearer:
If the input is 1:10 and the position vector is c(3, 5, 7), the sum restarts at the 3rd, 5th, and 7th elements, so the output would be
input: [1 2 3 4 5 6 7 8 9 10]
output: [1 3 3 7 5 11 7 15 24 34]
If no positions are given, this produces the same result as cumsum.
is_feb_first returns TRUE for each date that is February 1st and FALSE otherwise. I will leave this as an exercise for you.
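If you want something to check your own attempt against, here is a minimal sketch of one possible is_feb_first. It assumes the input is a Date vector, or something as.Date() can parse such as "2021-02-01"; comparing the formatted month and day is only one of several ways to do this.

```r
# Hypothetical helper: TRUE where a date falls on February 1st, FALSE otherwise.
# Assumes `date` is a Date vector or text that as.Date() can parse.
is_feb_first <- function(date) {
  format(as.Date(date), "%m-%d") == "02-01"
}

is_feb_first(as.Date(c("2020-02-01", "2020-02-02", "2021-02-01")))
# TRUE FALSE TRUE
```

Because the result is a logical vector of the same length as the input, it can be passed straight to which() to get positions.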
The functional interface uses the primitive functions which, split, and lapply, whose documentation is left as an exercise to read.
Now, the outline of the solution can be written as:
restart_feb_first <- function(df) {
  # cumulative sum of df$value, restarting at every row dated February 1st
  reset_cum_sums(df$value, which(is_feb_first(df$date)))
}
If the February firsts occur in your data at positions 32, 32+365, 32+730, ..., those make up your position vector. The nice thing is that leap years are easy to accommodate, because the positions are found from the dates themselves rather than computed by hand.
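To see the leap-year point concretely, here is a small sketch that assumes one row per day from 2019-01-01 to 2021-12-31 (a made-up date range chosen because 2020 is a leap year). The month-day comparison is written inline so the snippet runs on its own.

```r
# Three years of daily dates; 2020 is a leap year.
dates <- seq(as.Date("2019-01-01"), as.Date("2021-12-31"), by = "day")

# Positions of every February 1st in the series.
positions <- which(format(dates, "%m-%d") == "02-01")
positions
# 32 397 763
```

The third position is 763 rather than 32+730 = 762 because 2020 contains a February 29th, and the lookup picks that up without any special handling.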
The only challenging part is to write reset_cum_sums. Here is one way to do it, called reset_sum in the code that follows. It splits the vector into chunks, each one starting at a reset position (in your case, the February firsts), takes the cumulative sum of each chunk, and glues the results back together. The pipe operator is not required for this example; you could use traditional function-call notation instead.
I wrote the function this way to illustrate some R concepts, not to get the best performance. If you want something faster, you only need to rewrite this one function.
#
# purpose: define a function that creates cumulative sums
# of a vector, but which reset at each position given by
# the vector `positions`, which can be NULL.
#
# dependencies
library(magrittr) # or tidyverse, for the pipe operator %>%

# parameters for hypothetical example
set.seed(18)
values <- runif(50)
# cumulative sums reset at these positions (no duplicates allowed)
positions <- c(3, 13, 23, 33, 43)

reset_sum <- function(vector, positions) {
  k <- length(vector)
  # cut the index range into pieces; each piece starts at a reset position
  splitter <- cut(seq_len(k), breaks = c(-Inf, positions, Inf), right = FALSE)
  pieces <- split(vector, splitter)
  # do the cumsum of each piece, and then glue them back together
  pieces %>% lapply(cumsum) %>% unlist(use.names = FALSE)
}
Here's how the function would be called:
# examples
reset_sum(values,positions)
reset_sum(rep(1,50),positions)
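For readers who prefer not to load magrittr, the same idea can be sketched with nested calls in base R only. The name reset_sum_base and the NULL default for positions are my own choices for illustration; the checks reuse the 1:10 example from the top of this answer.

```r
# Pipe-free sketch of the same approach using only base R.
# Assumes `positions` has no duplicates; it may be NULL.
reset_sum_base <- function(x, positions = NULL) {
  # label each index with its chunk; a new chunk starts at each position
  splitter <- cut(seq_along(x), breaks = c(-Inf, positions, Inf), right = FALSE)
  # cumsum within each chunk, then glue the chunks back together
  unlist(lapply(split(x, splitter), cumsum), use.names = FALSE)
}

reset_sum_base(1:10, c(3, 5, 7))
# 1 3 3 7 5 11 7 15 24 34

# with no positions the result matches cumsum
identical(reset_sum_base(1:10), cumsum(1:10))
# TRUE
```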
I hope this guides you to a solution that fits your needs. The key concept is to break the problem down until you reach a function that is easy to write in terms of R primitives. If you need reset_cum_sums to be very fast, it should be fairly easy to rewrite in C or with data.table, but let's leave that for another day.
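Short of going to C, one alternative worth trying is a grouped cumsum with ave(). This is a different technique from the split-based function in this answer, not a rewrite of it, and the name reset_sum_ave is hypothetical. It assumes positions holds valid indices into the vector (or is NULL). Whether it is actually faster on your data is something to measure rather than assume.

```r
# Alternative sketch: build group ids, then take cumsum within each group.
# Assumes `positions` holds valid indices into x, or is NULL.
reset_sum_ave <- function(x, positions = NULL) {
  # the group id increases by one at every reset position
  group <- cumsum(seq_along(x) %in% positions)
  # ave() applies cumsum within each group and keeps the original order
  ave(x, group, FUN = cumsum)
}

reset_sum_ave(1:10, c(3, 5, 7))
# 1 3 3 7 5 11 7 15 24 34
```

The group id vector for this example is 0 0 1 1 2 2 3 3 3 3, which is the same chunking that cut and split produce.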
Update
This function returns a vector, so to use it with the data.table package, just assign the result to a new column with :=, as in
DT[, new_column := reset_sum(value, which(is_feb_first(date)))]