2

I am looking to create a new group based on two conditions. I want all of the cases until the cumulative sum of Value reaches 10 to be grouped together and I want this done within each person. I have managed to get it to work for each of the conditions separately, but not together using for loops and dplyr. However, I need both of these conditions to be applied. Below is what I would like the data to look like (I don't need an RunningSum_Value column, but I kept it in for clarification). Ideally I would like a dplyr solution, but I m not picky. Thank you in advance!

ID     Value RunningSum_Value   Group
PersonA    1    1                 1
PersonA    3    4                 1
PersonA    10   14                1
PersonA    3    3                 2
PersonB    11   11                3
PersonB    12   12                4
PersonC    3    3                 5
PersonD    4    4                 6
PersonD    9    13                6
PersonD    5    5                 7
PersonD    11   16                7
PersonD    6    6                 8
PersonD    1    7                 8

Here is my data:

df <- read.table(text="ID   Value
                 PersonA    1
                 PersonA    3
                 PersonA    10
                 PersonA    3
                 PersonB    11
                 PersonB    12
                 PersonC    3
                 PersonD    4
                 PersonD    9
                 PersonD    5
                 PersonD    11
                 PersonD    6
                 PersonD    1", header=TRUE,stringsAsFactors=FALSE)
Fargonian
  • 51
  • 5
  • Your question is very similar to one I [asked a couple of days ago](https://stackoverflow.com/questions/45549992/group-vector-on-conditional-sum). Have a look, maybe It will help – Sotos Aug 10 '17 at 19:30
  • Its similar, but your question only meets one of my conditions, I also need it to restart afresh with each person. – Fargonian Aug 10 '17 at 20:36
  • And what did you try? – S Rivero Aug 10 '17 at 21:58

1 Answers1

2

Define function sum0 which does a sum on its argument except that each time it gets to 10 or more it outputs 0. Define function is_start that returns TRUE for the start position of a group and FALSE otherwise. Finally apply is_start to each ID group using ave and then perform a cumsum on that to get the group numbers.

sum0 <- function(x, y) { if (x + y >= 10) 0 else x + y }
is_start <- function(x) head(c(TRUE, Reduce(sum0, init=0, x, acc = TRUE)[-1] == 0), -1)
cumsum(ave(DF$Value, DF$ID, FUN = is_start))
## [1] 1 1 1 2 3 4 5 6 6 7 7 8 8

UPDATE: fix

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341