I have two dataframes of different lengths: dfSamples (63012375 rows) and dfFixations (200000 rows). A toy version:
import numpy as np
import pandas as pd
dfSamples = pd.DataFrame({'tSample': [4, 6, 8, 10, 12, 14]})
dfFixations = pd.DataFrame({'tStart': [4, 12], 'tEnd': [8, 14]})
For each value in dfSamples, I would like to check whether it falls within any of the ranges given in dfFixations and then assign a label to that value. I found this question: "Check if value in a dataframe is between two values in another dataframe", but the loop solution is terribly slow and I cannot get any of the other solutions to work.
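To make the goal concrete, here is a minimal sketch of the result I am after for the toy data, assuming the interval endpoints are inclusive (which is what Series.between does by default). It uses NumPy broadcasting, which compares every sample against every fixation, so it is only meant as an illustration for tiny inputs and would not fit in memory at full size. The names in_any and expected are just illustrative.

```python
import numpy as np
import pandas as pd

dfSamples = pd.DataFrame({'tSample': [4, 6, 8, 10, 12, 14]})
dfFixations = pd.DataFrame({'tStart': [4, 12], 'tEnd': [8, 14]})

# Reshape so that comparisons broadcast to (n_samples, n_fixations)
t = dfSamples['tSample'].to_numpy()[:, None]
starts = dfFixations['tStart'].to_numpy()[None, :]
ends = dfFixations['tEnd'].to_numpy()[None, :]

# True where a sample lies inside at least one [tStart, tEnd] interval
in_any = ((t >= starts) & (t <= ends)).any(axis=1)

expected = dfSamples.assign(
    labels=np.where(in_any, 'fixation', 'no_fixation')
)
print(expected)
```

For the toy data, 4, 6, 8, 12 and 14 should be labelled 'fixation' and 10 should be labelled 'no_fixation'.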
Working (but very slow) example:
labels = np.empty_like(dfSamples['tSample']).astype(np.chararray)
for i, fixation in dfFixations.iterrows():
    # between() is inclusive on both ends by default
    log_range = dfSamples['tSample'].between(fixation['tStart'], fixation['tEnd'])
    labels[log_range] = 'fixation'
labels[labels != 'fixation'] = 'no_fixation'
dfSamples['labels'] = labels
Following the example in "Performance of Pandas apply vs np.vectorize to create new column from existing columns", I tried to vectorize this, but without success:
def check_range(samples, tstart, tend):
    log_range = (samples > tstart) & (samples < tend)
    return log_range
fixations = list(map(check_range, dfSamples['tSample'], dfFixations['tStart'], dfFixations['tEnd']))
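As far as I can tell, part of the problem with the map attempt is that map with several iterables pairs them up element by element and stops at the shortest one. So sample i is only compared against fixation i, not against all fixations. A small sketch on the toy data (the name paired is just illustrative):

```python
import pandas as pd

dfSamples = pd.DataFrame({'tSample': [4, 6, 8, 10, 12, 14]})
dfFixations = pd.DataFrame({'tStart': [4, 12], 'tEnd': [8, 14]})

def check_range(sample, tstart, tend):
    # Strict comparison of ONE sample against ONE interval
    return (sample > tstart) & (sample < tend)

# map zips the three Series: (4, 4, 8), (6, 12, 14), then stops,
# because dfFixations has only two rows.
paired = list(map(check_range,
                  dfSamples['tSample'],
                  dfFixations['tStart'],
                  dfFixations['tEnd']))

print(len(paired))  # one result per fixation row, not per sample
```

The result has one entry per fixation rather than one per sample, so it cannot be used directly as a label column. The strict comparisons in check_range also exclude the interval endpoints, unlike between in the loop version.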
Would appreciate any help!