1

I have a pandas dataframe with 2 numeric columns such as:

col1 col2
2.5 4.1
5.1 2.2
3.0 7.8

And I also have a 2D dataframe such as:

0 3 5 10
0 A D E
5 B F G
10 C H I

I need to find value in second table for each row in first table based on value in col1 is between indices in first column of second table and value in col2 fits between values in columns of second table.

I know it may sounds complicated, but I have this logic implemented in MS Excel and I need to transform it into Python. What is the best method to approach this problem.

One way could be to construct if statements, but the second table is quite large and may change in the future, so this is not an option for me. What could be solution to this? I thought about transforming second table into 1D with double indices.

Edit: Desired output is to have new column based on second table:

col1 col2 col3
2.5 4.1 A
5.1 2.2 B
3.0 7.8 D

The logic is to lookup value with following conditions. We search for a row that the number in col1 is lower then value in a given index and larger than number in previous index at the same time. Same logic applies for columns dimension.

Lukas Tomek
  • 96
  • 12
  • "I thought about transforming second table into 1D with double indices." => that would be a great start – mcsoini Jul 07 '23 at 12:11
  • 3
    Your question needs a minimal reproducible example consisting of as a minimum sample input, and expected output. In addition we should see your attempt, in the form of relevant code necessary to reproduce the problem and the actual output generated by your code.. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Jul 07 '23 at 12:18
  • Why is the third value in col3 a `D`? 3.8 is between 3 and 5, shouldn't it be `A`? Can you break down the logic for all 3 matches? – mozway Jul 07 '23 at 13:54
  • Relating to your edit, both 3.8 and 4.1 are between 3 and 5, why would one map to A and the other to D? Also 2.2 is below 3, not above it. – mozway Jul 07 '23 at 14:04
  • Sorry, I did a mistake in axis, it should be correct now. – Lukas Tomek Jul 07 '23 at 14:08

1 Answers1

1

Assuming df1 and df2, I would use numpy.searchsorted:

import numpy as np

# set 0 as index, sort values on both axes (optional if already sorted)
tmp = df2.set_index(0).sort_index().sort_index(axis=1)

# find insertion index on both axes
idx = np.searchsorted(tmp.index, df1['col1'])
col = np.searchsorted(tmp.columns, df1['col2'])

# perform 2D indexing
df1['col3'] = tmp.to_numpy()[np.clip(idx-1, 0, tmp.shape[0]),
                             np.clip(col-1, 0, tmp.shape[1])]

NB. given the output, I forced the value to get the first row/column if there is no match. For example, 2.2 is below 3, so we'll match it to 3.

Output:

   col1  col2 col3
0   2.5   4.1    A
1   5.1   2.2    B
2   3.0   7.8    D

NaN if no match

Alternatively, if you want NaN upon no match:

import numpy as np

# set 0 as index, sort values on both axes (optional if already sorted)
tmp = df2.set_index(0).sort_index().sort_index(axis=1)

# find insertion index on both axes
idx = np.searchsorted(tmp.index, df1['col1'])-1
col = np.searchsorted(tmp.columns, df1['col2'])-1
m = (idx>=0) & (col>=0)

# perform 2D indexing
df1.loc[m, 'col3'] = tmp.to_numpy()[idx[m], col[m]]

Output:

   col1  col2 col3
0   2.5   4.1    A
1   5.1   2.2  NaN  # 2.2 is below 3, no match
2   3.0   7.8    D
mozway
  • 194,879
  • 13
  • 39
  • 75