I want to create a variable such as desired_output, based on a cumulative sum over cumsumover in which the running sum resets every time it reaches the next number in thresh.

cumsumover <- c(1, 2, 7, 4, 2, 5)
thresh <- c(3, 7, 11)
desired_output <- c(3, 3, 7, 11, 11, 11) # same length as cumsumover

This question is similar, but I can't wrap my head around the code: "dplyr / R cumulative sum with reset".

Unlike in similar questions, my condition is specified in a vector of a different length than cumsumover.

Any help would be greatly appreciated. Bonus if both a base R and a tidyverse approach are provided.

dzgreen

3 Answers

In base R, we can use cut with breaks built from thresh and, as labels, as many letters as there are elements in thresh.

cut(cumsum(cumsumover), breaks = c(0, thresh[-1], max(cumsum(cumsumover))),
    labels = letters[seq_along(thresh)])

#[1] a a b c c c

Here thresh[-1] drops the first element of thresh, and max(cumsum(cumsumover)) is appended as the final break, so that any running total beyond the last element of thresh is assigned the last label.
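To make the binning concrete, here is a small sketch that prints the intermediate values for the example data. The variable names running, breaks and bins are only illustrative. Note that, as far as I can tell, this bins the un-reset running total, so it reproduces desired_output for this input but should not be assumed to behave like a true reset for arbitrary data.

```r
cumsumover <- c(1, 2, 7, 4, 2, 5)
thresh <- c(3, 7, 11)

running <- cumsum(cumsumover)               # 1 3 10 14 16 21
breaks  <- c(0, thresh[-1], max(running))   # 0 7 11 21

# cut() uses right-closed intervals by default: (0,7], (7,11], (11,21]
bins <- cut(running, breaks = breaks, labels = letters[seq_along(thresh)])
as.character(bins)
# "a" "a" "b" "c" "c" "c"
```

If the largest running total happened to equal one of the other breaks, the breaks would no longer be unique and cut would stop with an error, so the input is assumed to run past the last threshold.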


If we want labels as thresh instead of letters:

cut(cumsum(cumsumover), breaks = c(0, thresh[-1], max(cumsum(cumsumover))), labels = thresh)
#[1] 3  3  7  11 11 11
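The result of cut is a factor, not a numeric vector. If a numeric vector like desired_output is needed, one possible sketch is to index thresh by the factor's integer codes (the names f and out are just placeholders):

```r
cumsumover <- c(1, 2, 7, 4, 2, 5)
thresh <- c(3, 7, 11)
running <- cumsum(cumsumover)

f <- cut(running, breaks = c(0, thresh[-1], max(running)), labels = thresh)
is.factor(f)                    # TRUE

# as.integer() on a factor gives the bin index, which selects the threshold
out <- thresh[as.integer(f)]
out
# 3 3 7 11 11 11
```

Indexing by the codes avoids the round trip through as.character and keeps the values in whatever type thresh already has.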
Ronak Shah

Here is another solution:

data:

cumsumover <- c(1, 2, 7, 4, 2, 5)
thresh     <- c(3, 7, 11)

code:

outp <- letters[seq_along(thresh)]  # one label per threshold, to make the solution more general
cumsumover_copy <- cumsumover       # <<- is used inside lapply below, so work on a copy to stay safe

unlist(
  lapply(seq_along(thresh), function(x) {
    cs_over <- cumsum(cumsumover_copy)
    ntimes <- sum(cs_over <= thresh[x])  # leading elements that fit under this threshold
    # drop those elements; the logical index also works when ntimes is 0
    cumsumover_copy <<- cumsumover_copy[seq_along(cumsumover_copy) > ntimes]
    rep(outp[x], ntimes)
  })
)

result:

#[1] "a" "a" "b" "c" "c" "c"
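For comparison, the same reset idea can be sketched as an explicit loop without `<<-`. The helper name label_with_reset is hypothetical, and the sketch rests on two assumptions that the question does not settle: "reaches" is read as >=, and the element that reaches a threshold still belongs to that threshold's group. Once the last threshold is in use, all remaining elements keep it.

```r
# Hypothetical helper: label each element of x with the threshold
# that the running sum is currently counting towards.
label_with_reset <- function(x, thresh) {
  out <- rep(thresh[length(thresh)], length(x))
  k <- 1         # index of the current threshold
  running <- 0   # sum accumulated since the last reset
  for (i in seq_along(x)) {
    out[i] <- thresh[k]
    running <- running + x[i]
    # assumption: reset once the running sum is >= the current threshold,
    # and stay on the last threshold when there is no next one
    if (running >= thresh[k] && k < length(thresh)) {
      k <- k + 1
      running <- 0
    }
  }
  out
}

res <- label_with_reset(c(1, 2, 7, 4, 2, 5), c(3, 7, 11))
res
# 3 3 7 11 11 11
```

A loop is slower than a vectorised call on long inputs, but it makes the reset rule and its edge cases explicit.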
Andre Elrico

Using .bincode you can do this:

thresh[.bincode(cumsum(cumsumover), c(-Inf, thresh[-1], Inf))]
#[1]  3  3  7 11 11 11

cut calls .bincode internally and mostly adds labels and argument checks on top of it, so calling .bincode directly is more efficient:

x <- rep(cumsum(cumsumover), 10000)
microbenchmark::microbenchmark(
  bincode   = thresh[.bincode(x, c(-Inf, thresh[-1], Inf))],
  cut       = cut(x, breaks = c(-Inf, thresh[-1], Inf), labels = thresh))
# Unit: microseconds
#     expr    min      lq     mean  median      uq     max neval
#  bincode  450.2  459.75  654.794  482.10  642.20  5028.4   100
#      cut 1739.3 1864.90 2622.593 2215.15 2713.25 12194.8   100
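As a quick check of what .bincode returns, the sketch below compares its integer codes with the codes of the factor produced by cut on the same breaks. The names running, breaks, codes and same are only illustrative.

```r
cumsumover <- c(1, 2, 7, 4, 2, 5)
thresh <- c(3, 7, 11)
running <- cumsum(cumsumover)
breaks <- c(-Inf, thresh[-1], Inf)

# .bincode returns one integer bin index per element (right-closed by default)
codes <- .bincode(running, breaks)
codes
# 1 1 2 3 3 3

# the factor returned by cut carries the same codes underneath its labels
same <- identical(codes, as.integer(cut(running, breaks = breaks)))
same
# TRUE
```

Because the codes are plain integers, they can index thresh directly, which is what the one-liner at the top of this answer does.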
moodymudskipper