1

This question is similar to dplyr/ R cumulative sum with reset, which asked for a way to reset a cumulative summation based on a threshold value. The accepted answer to that question was a function that applies a fixed threshold for resetting the accumulation.

library(tidyverse)

sum_reset_at <- function(thresh) {
    function(x) {
        accumulate(x, ~if_else(.x >= thresh, .y, .x + .y))
    }
}

df <- tibble(a = c(2, 3, 1, 2, 2, 3))

df %>% mutate(c = sum_reset_at(5)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     5
## 3     1     1
## 4     2     3
## 5     2     5
## 6     3     3

When the accumulation reaches (or exceeds) the threshold value it starts again from the value of a in the next record.

Rather than using a fixed threshold, I would like to provide a vector of thresholds that would be accessed sequentially, incrementing with each reset:

thresholds <- c(5, 3, 2)

df %>% mutate(c = sum_reset_at(thresholds)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     5
## 3     1     1
## 4     2     3
## 5     2     2
## 6     3     3

The vector would be recycled as required.

I have something working using sample in the function:

set.seed(0)

sum_reset_at <- function(thresh) {
    function(x) {
        accumulate(x, ~if_else(.x >= sample(thresh, size = 1), .y, .x + .y))
    }
}

thresholds <- c(5, 3, 2)

df %>% mutate(c = sum_reset_at(thresholds)(a))

## # A tibble: 6 x 2
##       a     c
##   <dbl> <dbl>
## 1     2     2
## 2     3     3
## 3     1     4
## 4     2     2
## 5     2     4
## 6     3     3

But I don't want to randomly sample the thresholds, I want to sequentially sample them.

Alex Trueman
  • 199
  • 4
  • 12

1 Answers1

1

You can modify sum_reset_at to accept an vector for thres:

sum_reset_at <- function(thresh)
  {
    function(x) {
      i <- 1
      accumulate(x, function(.x, .y) {
        if(.x >= thresh[i])
        {
          #Increment i and return .y
          i <<- i+1
          if (i > length(thresh)) i <<- 1
          .y
        }
        else
        {
          .x + .y
        }
      })
    }
 }

df <- tibble(a = c(2, 3, 1, 2, 2, 3))

df %>% mutate(c = sum_reset_at(c(5,3,1))(a))
## A tibble: 6 x 2
#      a     c
#  <dbl> <dbl>
#1     2     2
#2     3     5
#3     1     1
#4     2     3
#5     2     5
#6     3     3
Marcelo
  • 4,234
  • 1
  • 18
  • 18
  • Thanks @Marcelo. This worked perfectly for me. I have never seen the <<- super assignment used before, very interesting. – Alex Trueman Sep 03 '18 at 17:18