Pandas lookup value in a range from another table

Question

In the example below I am trying to check if the "Value" in Table 1 is in the range of values in a row in Table 2 for columns "Start" and "Stop". If the value is in that range I want to return the type of "Fruit". The between method seems to be able to accomplish this but unsure how to apply it to a row from another table. Note I need to perform this task using a large dataset and am open to using methods outside of the pandas library.

Sample Code

df1 = pd.DataFrame({'Date': {0: '06-01', 1: '06-02', 2: '06-03', 3: '06-04'},
                   'Value': {0: 3, 1: 7, 2: 9, 3: 16}, })

df2 = pd.DataFrame({'Start': {0: 1, 1: 6, 2: 11, 3: 16},
                    'Stop': {0: 5, 1: 10, 2: 15, 3: 20},
                    'Fruit': {0: 'Apple', 1: 'Orange', 2: 'Pear', 3: 'Mango'},})

Table 1

    Date  Value
0  06-01      3
1  06-02      7
2  06-03      9
3  06-04     16

Table 2

    Fruit  Start  Stop
0   Apple      1     5
1  Orange      6    10
2    Pear     11    15
3   Mango     16    20

Table 1 Desired Output

    Date  Value  Fruit
0  06-01      3   Apple
1  06-02      7  Orange
2  06-03      9  Orange
3  06-04     16   Mango

I recommend IntervalIndex. https://stackoverflow.com/questions/56777623/how-to-aggregate-mean-of-rows-when-values-are-in-between-range-of-2-other-column/56777757#56777757 — cs95, Jun 26 '19 at 23:29
Also see my answer here: https://stackoverflow.com/a/46526392/4909087 — cs95, Jun 26 '19 at 23:30
Btw, why is fruit `None` at value `9`? Shouldn't that be `Orange`? — Erfan, Jun 26 '19 at 23:46
Then the answer of @cs95 linked works: First: `idx = pd.IntervalIndex.from_arrays(df2['Start'], df2['Stop'], closed='both')` Second: `df['fruit'] = df2.loc[idx.get_indexer(df1['Value']), 'Fruit']` — Erfan, Jun 26 '19 at 23:55
I am trying to use the code that @cs95 posted but I am getting error `ValueError: cannot reindex from a duplicate axis`. This is what I tried - `idx = pd.IntervalIndex.from_arrays(df2['Start'], df2['Stop'], closed='both') df1['fruit'] = df2.loc[idx.get_indexer(df1['Value']), 'Fruit']`. — MBasith, Jun 27 '19 at 00:01

score 4 · Accepted Answer · answered Jun 27 '19 at 00:48

4

Here is on of the way rather than using IntervalIndex , we check with numpy board-cast

s1=df2.Start.values
s2=df2.Stop.values
s=df1.Value.values[:,None]
np.dot((s>=s1)&(s<=s2),df2.Fruit)
array(['Apple', 'Orange', 'Orange', 'Mango'], dtype=object)

answered Jun 27 '19 at 00:48

BENY

317,841
20
164
234

@MBasith yw :-) happy coding – BENY Jun 27 '19 at 00:53
Hi, could you please let me know. How we can get final output as expected. I also have the same kind of requirement. Thanks in Advance. – Kullayappa M Oct 16 '19 at 14:00
1

@KPMk df1['Fruit']=np.dot((s>=s1)&(s<=s2),df2.Fruit) – BENY Oct 16 '19 at 14:10
how to avoid this returning nan, instead return a string said "not in fruit!" – greendino Jan 14 '21 at 00:46
1

@lone_coder after assign you can do `s = np.dot((s>=s1)&(s<=s2),df2.Fruit) ; s = np.where(s!=s,'not in fruit!',s) ` – BENY Jan 14 '21 at 01:47

Pandas lookup value in a range from another table

Sample Code

Table 1

Table 2

Table 1 Desired Output

1 Answers1