I realize my title is a bit confusing, but I think I can make it clearer if we proceed by example. What I want to do is a vectorized test to check if any of the values in a given series is contained in any of the intervals defined by a DataFrame object with a `start` and `stop` column.
Consider the series `valid`, which is a column of a DataFrame called `trials`. Here is what `trials` looks like:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 156 entries, 0 to 155
Data columns (total 3 columns):
start 156 non-null values
stop 156 non-null values
valid 156 non-null values
dtypes: bool(1), float64(2)
I have a separate DataFrame called `blink`. It has three columns:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 41 entries, 0 to 40
Data columns (total 3 columns):
tstart 41 non-null values
tstop 41 non-null values
dur 41 non-null values
dtypes: bool(1), float64(2)
The last column is not directly relevant: it's the duration of the eyeblink, i.e. the difference between `tstop` and `tstart`.
I would like to set each row of `trials['valid']` to `False` if the interval from its corresponding `trials['start']` to `trials['stop']` overlaps with any of the `blink['tstart']` to `blink['tstop']` intervals.
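To be concrete about what I mean by "overlaps", here is a minimal sketch for a single pair of intervals. The helper name `intervals_overlap` and the numbers are made up for illustration, and I am assuming closed intervals, so two intervals that merely touch at an endpoint count as overlapping.

```python
def intervals_overlap(a_start, a_stop, b_start, b_stop):
    """Return True if the closed intervals [a_start, a_stop] and
    [b_start, b_stop] share at least one point."""
    # Two intervals overlap exactly when each one starts
    # no later than the other one ends.
    return a_start <= b_stop and b_start <= a_stop


# Made-up values, for illustration only.
trial = (1.0, 2.0)    # (start, stop)
blink_a = (1.5, 1.7)  # falls inside the trial
blink_b = (2.5, 3.0)  # starts after the trial has ended

print(intervals_overlap(*trial, *blink_a))
print(intervals_overlap(*trial, *blink_b))
```

A trial should end up with `valid == False` if this test is true for at least one blink.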
I could iterate through the rows and use `np.arange` along with the `in` operator to do this in a nested loop, but it literally takes hours (my actual data set is much larger than this dummy example). Is there a vectorized approach I could use? If not, is there a faster iteration-based approach?
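For reference, the kind of vectorized direction I have in mind is NumPy broadcasting: compare every trial against every blink at once, then reduce with `any`. This is only a sketch on invented toy data. The helper name `flag_blink_overlaps` is hypothetical, and it assumes closed intervals and a pandas recent enough to have `.to_numpy()` (older versions would use `.values` instead).

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real frames; all values are invented.
trials = pd.DataFrame({
    "start": [0.0, 2.0, 5.0],
    "stop":  [1.0, 4.0, 6.0],
    "valid": [True, True, True],
})
blink = pd.DataFrame({
    "tstart": [0.5, 6.5],
    "tstop":  [0.8, 7.0],
})


def flag_blink_overlaps(trials, blink):
    """Return a boolean array, True where a trial overlaps at least one blink."""
    start = trials["start"].to_numpy()[:, None]   # shape (n_trials, 1)
    stop = trials["stop"].to_numpy()[:, None]     # shape (n_trials, 1)
    tstart = blink["tstart"].to_numpy()[None, :]  # shape (1, n_blinks)
    tstop = blink["tstop"].to_numpy()[None, :]    # shape (1, n_blinks)
    # Broadcasting yields an (n_trials, n_blinks) boolean matrix:
    # closed intervals overlap when each starts no later than the other ends.
    overlap = (start <= tstop) & (tstart <= stop)
    return overlap.any(axis=1)


hit = flag_blink_overlaps(trials, blink)
trials.loc[hit, "valid"] = False  # only the overlapping trials are changed
print(trials)
```

The intermediate boolean matrix has one entry per (trial, blink) pair, so memory grows with the product of the two lengths; on a much larger data set it might need to be processed in chunks.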
If anything is unclear, I'll of course be happy to provide additional details.