I want to optimize the following function. I'm using a broadcast inner join, which I suspect is not fast enough.
I have a DataFrame of intervals with columns `timestamp_start` and `timestamp_end`, and a time-series DataFrame with columns `timestamp` and `value`.
The function returns every row of the time series whose `timestamp` falls within one of the intervals (inclusive at both ends):
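```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import broadcast

def filter_intervals(intervals: DataFrame, df: DataFrame) -> DataFrame:
    # Range (non-equi) join: keep rows of df whose timestamp lies in
    # [timestamp_start, timestamp_end]; broadcast() ships intervals to every executor.
    df = df.join(broadcast(intervals),
                 [df.timestamp >= intervals.timestamp_start,
                  df.timestamp <= intervals.timestamp_end],
                 how='inner')
    return df
```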
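Before optimizing, it may help to pin down what this join actually returns. The plain-Python sketch below is only a model of the semantics, not Spark code. The function names are hypothetical, and intervals are assumed closed to match the `>=`/`<=` conditions. An inner join emits one output row per matching (row, interval) pair, so a timestamp covered by two overlapping intervals comes out twice, with the interval columns attached. Because the join conditions are inequalities rather than equalities, Spark cannot use a hash or sort-merge join here. With the broadcast hint it typically plans a broadcast nested loop join, which, like this model, compares every row against every interval.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical (timestamp_start, timestamp_end), both inclusive
Row = Tuple[int, float]     # hypothetical (timestamp, value)


def inner_join_model(intervals: List[Interval], rows: List[Row]):
    """Model of the inner range join: one output per matching (row, interval) pair."""
    return [(ts, value, start, end)
            for ts, value in rows
            for start, end in intervals  # every row is compared with every interval
            if start <= ts <= end]


def semi_join_model(intervals: List[Interval], rows: List[Row]) -> List[Row]:
    """Model of "rows that belong to at least one interval": each row kept at most once."""
    return [(ts, value)
            for ts, value in rows
            if any(start <= ts <= end for start, end in intervals)]


intervals = [(0, 5), (3, 8)]  # overlapping intervals
rows = [(1, 10.0), (4, 20.0), (10, 30.0)]
print(inner_join_model(intervals, rows))
# [(1, 10.0, 0, 5), (4, 20.0, 0, 5), (4, 20.0, 3, 8)]
print(semi_join_model(intervals, rows))
# [(1, 10.0), (4, 20.0)]
```

If each timestamp should appear only once, the second model matches the "belongs to at least one interval" description. In Spark terms that corresponds to a left semi join (`how='left_semi'`) rather than an inner join.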
How can I rewrite this function to make it more efficient?