I have a Pandas Dataframe of indices and values between 0 and 1, something like this:
6 0.047033
7 0.047650
8 0.054067
9 0.064767
10 0.073183
11 0.077950
I would like to retrieve tuples of the start and end points of regions of more than 5 consecutive values that are all over a certain threshold (e.g. 0.5). So that I would have something like this:
[(150, 185), (632, 680), (1500,1870)]
Where the first tuple is of a region that starts at index 150, has 35 values that are all above 0.5 in row, and ends on index 185 non-inclusive.
I started by filtering for only values above 0.5 like so
df = df[df['values'] >= 0.5]
And now I have values like this:
632 0.545700
633 0.574983
634 0.572083
635 0.595500
636 0.632033
637 0.657617
638 0.643300
639 0.646283
I can't show my actual dataset, but the following one should be a good representation
import numpy as np
from pandas import *
np.random.seed(seed=901212)
df = DataFrame(range(1,501), columns=['indices'])
df['values'] = np.random.rand(500)*.5 + .35
yielding:
1 0.491233
2 0.538596
3 0.516740
4 0.381134
5 0.670157
6 0.846366
7 0.495554
8 0.436044
9 0.695597
10 0.826591
...
Where the region (2,4) has two values above 0.5. However this would be too short. On the other hand, the region (25,44) with 19 values above 0.5 in a row would be added to list.