1

As a minimal example consider the following two df (notice their sizes are not equal):

df
   min_val  max_val
0        0        4
1        5        9
2       10       14
3       15       19 
4       20       24
5       25       29

df1
   val
0    1
1    6
2    2
3   Nan
4    34

I am trying to check whether each value in df1 can be found within any pair in df. The output should be a new dataframe that will contain the val column of df1 plus the pair within which it was found plus an extra column with a name tag let's say 'within' and 'not within'. So the output should look like:

   val   min_val  max_val  nameTag
0   1      0        4       within
1   6      5        9       within
2   2      0        4       within
3   Nan    Nan      Nan     not within
4   34     Nan      Nan     not within

So far, any solutions I have found do the searches line-by-line missing the val 2 in df1 which is within the pair 0-4 in df (some posts that did not work for me HERE, and HERE).

Any pointers/advice/solutions will be much appreciated. Thanks

Yiannis
  • 25
  • 4

1 Answers1

3

I would use a merge_asof:

tmp = pd.merge_asof(df1.reset_index().sort_values(by='val').dropna(),
                    df.sort_values(by='min_val').astype(float),
                    left_on='val', right_on='min_val'
                   ).set_index('index').reindex(df1.index)

df1['nameTag'] = np.where(tmp['val'].le(tmp['max_val']), 'within', 'not within')

Or an IntervalIndex:

s = pd.Series('within', pd.IntervalIndex.from_arrays(df['min_val'], df['max_val']))

df1['nameTag'] =s.reindex(df1['val']).fillna('no within').to_numpy()

Output:

    val     nameTag
0   1.0      within
1   6.0      within
2   2.0      within
3   NaN  not within
4  34.0  not within
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Wow, such a fast response @mozway brilliant :-). ```merge_asof``` is not a pandas function I am familiar with. Is there a way to grab the pair from df where the value "belong to"? I will try to work it out myself but if you could save me time it would be appreciated. – Yiannis May 22 '23 at 12:15
  • 2
    Yes, if you look at `tmp` this is exactly doing that. You could then simply merge it with `df1`: `df1.merge(tmp, on='val')` – mozway May 22 '23 at 12:17
  • 2
    So many thanks @mozway, appreciated – Yiannis May 22 '23 at 12:18