1

I have two data frames one is a reference I am comparing the second data frame to the first column of the reference to find the closest matches and then returning the corresponding item from the second column of the reference data frame. I am trying to find a faster method to do this than what I’m currently doing which is a for loop the same as the one at the bottom which works but is there a better way to do it avoiding the iteration?

The expected results from looking up values for

a
1.1
2.1
2.9
3.1 
4.2
5.0 

against reference values of

A   B
1   10
2   20
3   30
4   40
5   50

Would be

a   B
1.1 10
2.1 20
2.9 30
3.1 30
4.2 40
5.0 50

The method i have is

import numpy as np
import pandas as pd

def reference_df():
    A = [1, 2, 3, 4, 5]
    B = [10, 20, 30, 40, 50]
    df1 = pd.DataFrame(A, columns=['A'])
    df1['B'] = pd.Series(B, index=df1.index)
    return(df1)


def working_df():
    a = [1.1, 2.1, 2.9, 3.1, 4.2, 5.0]
    df1 = pd.DataFrame(a, columns=['a'])
    return(df1)


def Look_up():
    df1 = reference_df()
    df2 = working_df()
    A = df1['A']
    B = df1['B']
    a = df2['a']

    def Look_up_b(a):
        idx = (np.abs(A - a)).argmin()
        b = B[idx]
        return(b)

    b = []
    for i in a:
        b.append(Look_up_b(i))

    df3 = pd.DataFrame(a, columns=['a'])
    df3['b'] = pd.Series(b, index=df3.index)
    return(df3)

print(Look_up())
AMC
  • 2,642
  • 7
  • 13
  • 35
W.Ellis
  • 37
  • 7
  • 1
    Why use a nested function here? As an aside, variable and function names should generally follow the `lower_case_with_underscores` style. – AMC Apr 14 '20 at 15:15
  • Does this answer your question? [How do I find the closest values in a Pandas series to an input number?](https://stackoverflow.com/questions/30112202/how-do-i-find-the-closest-values-in-a-pandas-series-to-an-input-number) – AMC Apr 14 '20 at 17:28

1 Answers1

2

This is merge_asof:

# convert reference values to float
references['A'] = references['A'].astype('float64')

pd.merge_asof(df, references, 
              left_on='a', right_on='A',
              direction='nearest'
             )

Output:

     a    A   B
0  1.1  1.0  10
1  2.1  2.0  20
2  2.9  3.0  30
3  3.1  3.0  30
4  4.2  4.0  40
5  5.0  5.0  50
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74