0

I want to find a series of consecutive rows in a dataset where a condition is met the most often.

I have two columns that I can use for this; Either one with ones and zeros that alternate based on the presence or absence of a condition or a column which increments for the duration across which the desirable condition is present. I envision that I will need to use subset(),filter(), and/or rle() in order to make this happen but am at a loss as to how to get it to work.

In the example, I want to find 6 sequential rows that maximize the instances in which happens occurs.

Given the input:

library(data.frame)
df<-data.frame(time=c(1:10),happens=c(1,1,0,0,1,1,1,0,1,1),count=c(1,2,0,0,1,2,3,0,1,2))

I would like to see as the output the rows 5 through 10, inclusive, as the data subset output, using either the happens or count columns since this sequence of rows would yield the highest output of happens occurrences on 6 consecutive rows.

Bar-Tzur
  • 85
  • 1
  • 10
  • Is 6 part of your input or how should this be detected? – Ben373 Feb 13 '19 at 21:11
  • Although Wimpel answered this, as a courtesy I thought I should answer your question. 6 would be a fixed value that is known in advance and could be used as an input. – Bar-Tzur Feb 13 '19 at 22:32

1 Answers1

1
library(zoo)
which.max( rollapply( df$happens, 6, sum) )
#[1] 5

The fifth window of 6 rows apparently holds the maximum sum of df$happens
So the answer is row 5:10

Wimpel
  • 26,031
  • 1
  • 20
  • 37