find_nearest function gives error when calling dataframe column

Question

I have data like:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(columns=list('XY'))
df1['X'] = np.arange(0,100,0.1)
df1['Y'] = np.cos(df1['X']) + 30

df2 = pd.DataFrame(columns=list('AB'))
df2['A'] = [22, 43, 64, 86]

And I am defining a function as:

def find_nearest(df1, df1['X'], df2['A'], df1['Y']):
        array = np.asarray(df1['X'])
        idx = (np.abs(array - df2['A'])).argmin()
        return df1.iloc[idx][df1['Y']]

But I get a syntax error when calling on the columns of the dataframes in the line:

def find_nearest(df1, df1['X'], df2['A'], df1['Y']):

It seems like the function definition doesn't like it when I refer to columns of the dataframes directly. If I assign the columns to their own variables, this works fine. But for memory's sake, I am trying to avoid that.

Does anyone know a workaround? If anything needs clarification, let me know.

This could help for an efficient one - https://stackoverflow.com/questions/45349561/. — Divakar, Jul 26 '19 at 13:45
Calling a dataframe column in that case still produces a syntax error. Although, seeing as it is more efficient, once I get the syntax error resolved, I may use that instead of my original. Thanks @Divakar — Cody Smith, Jul 26 '19 at 14:04
Linked one expects arrays and it finds the closest argmin indexes for one array with respect to another. So, you would need to feed in inputs accordingly - `df1['X'].values`, etc. — Divakar, Jul 26 '19 at 14:06
you are missing a parenthesis while defining your function , that might be it — Ayoub ZAROU, Jul 26 '19 at 14:08
Just checked the original code, and that wasn't it. I just forgot to put it here. Thanks for the catch though @AyoubZAROU — Cody Smith, Jul 26 '19 at 14:14
you should just pass df1 and df2 as arguments and not `df1['X'] ...`, it's not a valid variable name — Ayoub ZAROU, Jul 26 '19 at 14:17

score 1 · Accepted Answer · answered Jul 26 '19 at 14:20

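`df1['X']` is not a valid variable name in Python, so it cannot be used as a parameter in a `def` line. Parameters have to be plain names, and the columns are passed in when the function is called. You could do this instead: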


def find_nearest(df1, df1_X, df2_A, df1_Y):
    # df1_Y is used as a column label in the last line (for example 'Y')
    array = np.asarray(df1_X)
    idx = (np.abs(array - df2_A)).argmin()
    return df1.iloc[idx][df1_Y]
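
A minimal sketch of how that version might be called. The question does not show the call, so the single target value `22` and the label `'Y'` are assumptions. The columns are supplied as arguments at call time, and passing `df1['X']` hands the function the same object rather than a copy, which the small `received` helper (hypothetical, only for illustration) checks:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'X': np.arange(0, 100, 0.1)})
df1['Y'] = np.cos(df1['X']) + 30

def find_nearest(df1, df1_X, df2_A, df1_Y):
    # df1_X: column to search, df2_A: one target value, df1_Y: column label to return
    array = np.asarray(df1_X)
    idx = np.abs(array - df2_A).argmin()
    return df1.iloc[idx][df1_Y]

# The columns go in the call, not in the def line.
nearest_y = find_nearest(df1, df1['X'], 22, 'Y')

def received(column):
    # Hypothetical helper: returns exactly what it was given.
    return column

x = df1['X']
same_object = received(x) is x  # arguments are passed by reference, not copied
```

So keeping a column in its own variable, or passing it as an argument, should not duplicate the data by itself.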

Or just:


def find_nearest(df1, df2):
    array = np.asarray(df1['X'])
    idx = (np.abs(array - df2['A'])).argmin()
    # index the selected row by the column label, not by the column itself
    return df1.iloc[idx]['Y']
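
One thing to watch, as far as I can tell: the subtraction `array - df2['A']` needs operands whose shapes line up, so it will not work as-is with 1000 values in `df1['X']` and four targets in `df2['A']`. Below is a rough sketch of one way to look up every target in a single pass using NumPy broadcasting. The name `find_nearest_all` is made up for this example, and the column names `'X'`, `'Y'` and `'A'` are taken from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'X': np.arange(0, 100, 0.1)})
df1['Y'] = np.cos(df1['X']) + 30
df2 = pd.DataFrame({'A': [22, 43, 64, 86]})

def find_nearest_all(df1, df2):
    x = df1['X'].to_numpy()
    targets = df2['A'].to_numpy()
    # Distance table with one row per target and one column per X value.
    distances = np.abs(x[None, :] - targets[:, None])
    # Position of the closest X for each target.
    idx = distances.argmin(axis=1)
    # Y values at those positions, one per target.
    return df1['Y'].to_numpy()[idx]

nearest_y = find_nearest_all(df1, df2)
```

This builds a temporary array of size `len(df2) * len(df1)`, so for large inputs the approach linked in the comments on the question may be the better choice.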

Ayoub ZAROU

Figured it was something simple like this. This worked, thanks – Cody Smith Jul 26 '19 at 14:32
glad I helped , happy coding – Ayoub ZAROU Jul 26 '19 at 14:35
