
I am trying to extract a series of values in a vector that meet a certain condition. To illustrate this imagine I have the following vector:

a <- c(1,2,1,3,12,3,2,15,1,1,1,1,4,5,20)

I would like to isolate runs of consecutive values whose sum is less than 10, so that the output looks like this:

[1] 1 2 1 3
[1] 3 2
[1] 1 1 1 1 4
[1] 5

I can solve this very inefficiently by using zoo::rollsum() and a logical test:

which(zoo::rollsum(a, 2) < 10)

but to do so I have to run it several times, increasing the rolling window each time. I could do this in a loop, but it's clearly not the best way.

Can anyone think of a solution for this? Any help would be much appreciated!

DHenry
  • @akrun, that is not the correct output – talat Oct 11 '17 at 08:32
  • hmm... I have asked a similar question [here](https://stackoverflow.com/questions/45549992/group-vector-on-conditional-sum). If we use the accepted answer and load the `sotosGroup` function (based on `Rcpp` package), then `split(a, sotosGroup(a, 10))` gets you very very close to what you want. More complete, `Filter(length, lapply(split(a, sotosGroup(a, 10)), function(i) i[i <= 10]))` – Sotos Oct 11 '17 at 08:37
  • Why is the 4 repeated? Shouldn't 1+1+1+1+4 = 8 < 10 be one series, with the next series consisting of 5 < 10 by itself? – Maurits Evers Oct 11 '17 at 08:55

2 Answers


I would use my own loop. The result is the same as Maurits':

a <- c(1,2,1,3,12,3,2,15,1,1,1,1,4,5,20)

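my.roll <- function(x, limit) {
  res <- vector("list", length(x))  # at most one group per element
  ctr <- 1                          # index of the group currently being filled
  for (i in seq_along(x)) {
    res[[ctr]] <- c(res[[ctr]], x[i])
    # Start a new group if adding the next value would exceed the limit.
    # Past the end of x, x[i + 1] is NA and is ignored via na.rm = TRUE.
    if (sum(res[[ctr]], x[i + 1], na.rm = TRUE) > limit) ctr <- ctr + 1
  }
  # Drop unused slots and groups whose sum exceeds the limit.
  keep <- !sapply(res, is.null) & sapply(res, function(g) sum(g) <= limit)
  res[keep]
}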
my.roll(a, 10)
r.user.05apr

What about the following using %/% on the cumulative sum:

idx <- as.numeric(factor(cumsum(a) %/% 10))       # group index for each element
ret <- split(a, idx)
ret <- ret[sapply(ret, function(x) all(x < 10))]  # drop groups containing a value >= 10

ret
#$`1`
#[1] 1 2 1 3
#
#$`3`
#[1] 3 2
#
#$`5`
#[1] 1 1 1 1 4
#
#$`6`
#[1] 5

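Explanation: cumsum(a) %/% 10 gives, for each element, the number of the block of 10 that the running total falls into; as.numeric(factor(...)) converts these block numbers into consecutive indices for split. In the last step I remove the groups that contain a value >= 10.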
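To make the indexing step concrete, the intermediate vectors for the example `a` can be printed one at a time. This is only an illustrative sketch; the names `cs` and `bucket` are introduced here and are not part of the code above.

```r
a <- c(1,2,1,3,12,3,2,15,1,1,1,1,4,5,20)

cs     <- cumsum(a)                  # running total of all values so far
bucket <- cs %/% 10                  # block of 10 that the running total falls into
idx    <- as.numeric(factor(bucket)) # relabel the blocks as consecutive integers

cs
# [1]  1  3  4  7 19 22 24 39 40 41 42 43 47 52 72
bucket
# [1] 0 0 0 0 1 2 2 3 4 4 4 4 4 5 7
idx
# [1] 1 1 1 1 2 3 3 4 5 5 5 5 5 6 7
```

Note that the blocks are fixed multiples of 10 in the overall running total; the sum is not restarted at the beginning of each group.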

Note that this assumes that there is a mistake in the OP's example, where the number 4 seems to be repeated. If OP's example is actually correct then I don't understand the problem.

Maurits Evers
  • Then I definitely don't understand OP's problem/example. Why is only the number 4 repeated and part of two series? According to OP: "I would like to isolate consecutive values that whose sum is less than 10". That sounds like a cumulative sum to me. Either way, I will remove this answer if OP clarifies. A pity though, it was exciting finding a use for integer division... – Maurits Evers Oct 11 '17 at 09:24
  • The reason only 4 is repeated is that if any other value repeats, the sum will go over 10. So for group 1 the next value is 12, for group 2 it is 15... However, this is just a guess. Not sure if I'm correct – Sotos Oct 11 '17 at 09:30
  • Thanks for the clarification @Sotos. Not sure I understand though. Hopefully OP can clarify. – Maurits Evers Oct 11 '17 at 09:36
  • Apologies, that was a mistake. The 4 should not be repeated (edits made in question). In that case @MauritsEvers solution does work for this example. However I applied this code to another vector `a <- c(120,26,281,42,57,31,163,232)` where I'm looking for series of numbers < 365. It does not give the desired output (the second group it generates is: `281, 42, 57, 31, 163`)... – DHenry Oct 11 '17 at 10:11
  • @DHenry. Yes you are right. I've not got a fix at the moment. But will keep thinking about it. Perhaps somebody will come up with a solution. It's an interesting challenge... – Maurits Evers Oct 11 '17 at 10:56
  • By the way. The solution of @r.user.05apr gives the correct result for your second example. – Maurits Evers Oct 11 '17 at 11:22
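For vectors like the second example in the comments, the running sum has to restart at each group boundary, which a single `cumsum` over the whole vector does not do. Below is a minimal sketch of that idea, assuming the same greedy rule as the loop answer (start a new group when adding the next value would push the sum past the limit, then drop groups whose sum exceeds it). `group_by_limit` is a hypothetical helper name, not taken from either answer.

```r
group_by_limit <- function(x, limit) {
  grp   <- integer(length(x))  # group id assigned to each element
  total <- 0                   # running sum of the current group
  id    <- 1L
  for (i in seq_along(x)) {
    # Open a new group when this value would push the current sum past the limit.
    if (i > 1 && total + x[i] > limit) {
      id    <- id + 1L
      total <- 0
    }
    total  <- total + x[i]
    grp[i] <- id
  }
  groups <- unname(split(x, grp))
  # Single values larger than the limit end up alone in a group; drop those.
  Filter(function(g) sum(g) <= limit, groups)
}

group_by_limit(c(1,2,1,3,12,3,2,15,1,1,1,1,4,5,20), 10)
# groups: (1, 2, 1, 3), (3, 2), (1, 1, 1, 1, 4), (5)

group_by_limit(c(120,26,281,42,57,31,163,232), 365)
# groups: (120, 26), (281, 42), (57, 31, 163), (232)
```

Like the loop answer, this treats the limit as inclusive (`<=`); if the sum must be strictly less than the limit, both comparisons would need to change accordingly.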