Repeated sequential index based on events in grouped data

Question

I have a R dataframe like this:

ID      Event Out  
A         0    0
A         1    1
A         1    1
A         0    0
A         1    2
B         1    3
B         0    0
C         1    4
C         1    4
C         1    4

I am trying to create the out field which is a sequential conditional (on event =1 or not) repeated index. The index needs to increment by 1 with every new group occurrence of the event but carrying on in the sequence from the previous group. Is there a plyr option for this. Thanks in advance.

Welcome to SO! Please add the output table as well. Also, check this out to learn correct way of posting questions https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — YOLO, Mar 15 '18 at 10:14
should be:df<-data.frame(ID=c('A','A','A','A','A','B','B','C','C','C'),Event=c(0,1,1,0,1,1,0,1,1,1),Out = c(0,1,1,0,2,3,0,4,4,4)) — Al Mac, Mar 15 '18 at 10:15
what do you mean with "group"? secondly, you want to create the `out` variable or a new one? can you explain more detailed your desired output? ty — Stephan, Mar 15 '18 at 10:15
@AlMac Thanks. Please add the output table. It is still unclear the role of 'group' you are referring to. I don't get how ID = A, Event = 1 get Out = 2. — YOLO, Mar 15 '18 at 10:22
sorry group = ID. I want to create the out variable which creates an index that is repeated within ID if event = 1 and then increments up by 1 if separated by 0. However, if there are 2 consecutive events (event = 1) that straddle 2 IDs (eg 'A' and 'B') then I need it to increment up again as with the example above. Hope this is clear and thank you for your help. — Al Mac, Mar 15 '18 at 10:22
@AlMac Nope, not clear. Can you please edit your question to include the expected output based on your sample data. Also clearly describe the "rules" in the main body of your question. I have no idea what you mean by "straddle 2 IDs". — Maurits Evers, Mar 15 '18 at 10:38
I should have been more clear at the onset - the df I have provided has the desired output field in it called 'Out'. To put this question in context: — Al Mac, Mar 15 '18 at 10:59
ID - is the individual ID of a penguin and event 1 is a dive. I am trying to index all consecutive dives for each individual into dive bouts, hence repeated numbers so that I can summarize qualities of all dive bouts across all individuals at a later stage. — Al Mac, Mar 15 '18 at 11:01
By straddle 2 IDs I mean that the last event for ID 'A' was a 1 (i.e. a dive) but the first event for ID 'B' was also a 1 but I dont want these to be indexed as the same dive bout. — Al Mac, Mar 15 '18 at 11:05

MKR · Accepted Answer · 2018-03-15T11:19:36.143

One solution could be achieved as below.

The approach: Logic seems to that out should be incremented whenever there is change in Event or change in ID. out will not increment if Event is 0. The increment to out is beyond boundary of group.

library(dplyr)
df %>% mutate(increment = 
     ifelse(Event != 0 & (ID != lag(ID) | Event != lag(Event)), 1, 0)) %>%
 mutate(out_calculated = ifelse(Event == 0, 0, cumsum(increment))) %>%
 select(-increment)

#     ID Event Out      out_calculated
# 1   A     0   0             0
# 2   A     1   1             1
# 3   A     1   1             1
# 4   A     0   0             0
# 5   A     1   2             2
# 6   B     1   3             3
# 7   B     0   0             0
# 8   C     1   4             4
# 9   C     1   4             4
# 10  C     1   4             4

Data

df <- read.table(text = "ID      Event Out  
A         0    0
A         1    1
A         1    1
A         0    0
A         1    2
B         1    3
B         0    0
C         1    4
C         1    4
C         1    4", header = TRUE, stringsAsFactor = FALSE)

That's perfect and just what I was looking for in plyr. Thanks a lot for this - has saved me many head-scratching hours! — Al Mac, Mar 15 '18 at 11:21

s_baldur · Answer 2 · 2018-03-15T11:38:23.373

A somewhat hacky solution using an alternative package data.table. This solution should be faster also.

library(data.table)
setDT(dt) # assuming your data.frame is called dt
dt[, out_dt := frank(rleid(paste(Event, ID)) * Event, ties.method = "dense") - 1]
dt
    ID Event Out out_dt
 1:  A     0   0      0
 2:  A     1   1      1
 3:  A     1   1      1
 4:  A     0   0      0
 5:  A     1   2      2
 6:  B     1   3      3
 7:  B     0   0      0
 8:  C     1   4      4
 9:  C     1   4      4
10:  C     1   4      4

Repeated sequential index based on events in grouped data

2 Answers2