2

Connected to: Pandas: add column with index of matching row from other dataframe

Matching multiple columns with corresponding columns from 2nd dataframe, and returning index of the matching row from the 2nd dataframe.

df1['new_column'] =  df1.apply(lambda x: df2[(df2.col1 == x.col1)
                                           & (df2.col2 == x.col2)
                                           & (df2.col3 == x.col3)
                                           & (df2.col4 == x.col4)
                                           & (df2.col5 == x.col5)].index[0], axis=1)

Code above works like a charm... unless one of the columns can contain nan values, since nan != nan. In other words, even if col1:col4 in df1 matches df2 and col5 in both df1 and df2 is nan it fails to match it returning empty index object.
I need it to return True if col1:col5 match no matter if they contain values or nan.

Anyone knows solution for that?

MrPilipo
  • 139
  • 1
  • 1
  • 8

1 Answers1

2

One workaround here is to simply use fillna to replace all na values with something like a 'NaN' string.

Simply use:

df1 = df1.fillna('NaN')
df2 = df2.fillna('NaN')

Then use your existing code.

Jim Eisenberg
  • 1,490
  • 1
  • 9
  • 17
  • It crossed my mind. Replace nan, process new column, restore nan, will go with it if there is no more elegant solution. – MrPilipo Dec 03 '19 at 15:43
  • Elegance is a false god! If an elegant solution requires you learning two new functions, a practical workaround in 4 lines is better imo :P – Jim Eisenberg Dec 04 '19 at 10:25