I have a dataframe containing lower and upper bounds of index, with a corresponding value, and I am trying to match the value of te corresponding index. My dataframe consists of 400.000 rows. an example of my dataframe:
df1 = pd.DataFrame({'low':[4,7,8],'high':[6,7,21],'value':[10,15,20]})
df2 = pd.DataFrame({'index':[4,5,6,7,8,9]})
Output:
low high value
0 4 6 10
1 7 7 15
2 8 21 20
Now I want to add value of df1 to df2, whenever the index is between low and high, resulting in the following:
index value
0 4 10
1 5 10
2 6 10
3 7 15
4 8 20
5 9 20
I tried to make an intervalIndex using the following post: Searching a particular value in a range among two columns python dataframe
v = df1.loc[:, 'low':'high'].apply(tuple, 1).tolist()
idx = pd.IntervalIndex.from_tuples(v, 'both')
df2['value'] = df1.iloc[idx.get_indexer(df2['index'].values), 'value'].values
But my intervals keep overlapping, I removed duplicated versions, but I still need to remove some overlapping intervals. One way to find this was using a for loop with:
[idx.overlaps(x) for x in idx]
But this takes a lot of time and everytime my memory fails. Is there some fast way to find the overlapping interval?