My data looks like this:
Datetime column        Binary column
2020-01-02 08:30:00    True
2020-01-02 08:31:00    False
2020-01-02 08:32:00    False
2020-01-02 08:33:00    False
2020-01-02 08:34:00    True
...
2020-01-02 08:58:00    True
As you can see, the rows always come at 1-minute intervals, and each row has a binary True/False value.
I have a variable gap that specifies the maximum number of consecutive False values allowed between two True values. If a run of False values between two Trues is no longer than gap, I do nothing; if the run is longer than gap, I want to drop all rows in that run. In the example (the first 5 rows) there are 3 consecutive False values, so with gap=3 or more I wouldn't drop any rows. With a smaller gap (1 or 2), I would drop rows 2, 3 and 4.
My current solution uses the between_dates() method: I iterate through the zipped list of all dates where the value is True and check whether the number of rows in between is smaller than or equal to gap.
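For reference, the loop-based idea can be sketched roughly as follows. This is only an illustration under assumptions: it uses pandas with a DatetimeIndex, a hypothetical boolean column named flag, and a hypothetical helper drop_long_false_runs; it selects the rows in between with plain index comparisons rather than between_dates(), so it is not my exact code.

```python
import pandas as pd


def drop_long_false_runs(df, gap):
    """Drop the False rows between two consecutive Trues when there are more than `gap` of them.

    Assumes `df` has a sorted DatetimeIndex and a boolean column "flag" (hypothetical name).
    """
    # Timestamps of all rows where the flag is True.
    true_times = df.index[df["flag"].to_numpy()]
    to_drop = []
    # Walk over each pair of neighbouring True timestamps.
    for start, end in zip(true_times[:-1], true_times[1:]):
        # Rows strictly between two consecutive Trues are all False by construction.
        between = df.index[(df.index > start) & (df.index < end)]
        if len(between) > gap:
            to_drop.extend(between)
    return df.drop(index=to_drop)


# The first five rows of the example data.
idx = pd.date_range("2020-01-02 08:30:00", periods=5, freq="min")
df = pd.DataFrame({"flag": [True, False, False, False, True]}, index=idx)

kept_all = drop_long_false_runs(df, gap=3)   # run of 3 Falses is allowed, nothing dropped
kept_true = drop_long_false_runs(df, gap=2)  # run of 3 Falses exceeds gap, rows 2-4 dropped
```

The explicit loop over pairs of True timestamps is the part I would like to replace with a vectorized operation.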
Are you aware of any other approach (preferably vectorized) that could solve this problem without using a for loop?