I want to group the consecutive elements of a vector so that the sum of the elements in each group is less than or equal to n. Assume the following,
set.seed(1)
x <- sample(10, 20, replace = TRUE)
#[1] 3 4 6 10 3 9 10 7 7 1 3 2 7 4 8 5 8 10 4 8
#Where,
n = 15
The expected output keeps adding consecutive values to the current group while their sum is <= 15, and starts a new group otherwise, i.e.
y <- c(1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10)
As you can see, the sum of each group is never greater than 15,
sapply(split(x, y), sum)
# 1 2 3 4 5 6 7 8 9 10
#13 13 9 10 15 12 12 13 14 8
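To pin down the rule, here is a minimal sketch of it as a plain loop. The function name group_by_capped_sum is made up for illustration, and the sketch assumes non-negative values that are each <= n (a single value larger than n would simply end up in a group of its own). It only defines the grouping; a plain R loop like this is probably not the efficient solution needed for very large data.

```r
# Hypothetical helper (name invented for this sketch): walk through x once,
# starting a new group whenever adding the next value would push the
# running sum of the current group above n.
group_by_capped_sum <- function(x, n) {
  groups <- integer(length(x))
  running <- 0   # sum of the current group so far
  g <- 1L        # current group id
  for (i in seq_along(x)) {
    if (i > 1 && running + x[i] > n) {
      g <- g + 1L
      running <- 0
    }
    running <- running + x[i]
    groups[i] <- g
  }
  groups
}

# Values hard-coded so the result does not depend on the R version's sampler
x <- c(3, 4, 6, 10, 3, 9, 10, 7, 7, 1, 3, 2, 7, 4, 8, 5, 8, 10, 4, 8)
y <- group_by_capped_sum(x, 15)
y
#[1] 1 1 1 2 2 3 4 5 5 5 6 6 6 7 7 8 8 9 9 10
```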
NOTE: I will be running this on huge datasets (usually > 150 - 200GB) so efficiency is a must.
A method that I tried, which comes close but fails, is,
as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))
#[1] 1 1 1 2 2 3 3 4 4 4 5 5 5 6 6 6 7 8 8 8
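To show where this attempt breaks, the per-group sums can be checked the same way as above. The variable names attempt and group_sums are just for this sketch. The cut points are fixed multiples of 15 on the overall cumulative sum, so the running total is never reset when a group closes early, and some groups end up above 15.

```r
# Values hard-coded so the result does not depend on the R version's sampler
x <- c(3, 4, 6, 10, 3, 9, 10, 7, 7, 1, 3, 2, 7, 4, 8, 5, 8, 10, 4, 8)

# The attempted grouping: bin the overall cumulative sum into widths of 15
attempt <- as.integer(cut(cumsum(x), breaks = seq(0, max(cumsum(x)) + 15, 15)))

# Sum of each resulting group; groups 3, 6 and 8 exceed 15
group_sums <- sapply(split(x, attempt), sum)
group_sums
# 1  2  3  4  5  6  7  8
#13 13 19 15 12 17  8 22
```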