A similar question was asked here, Pandas merge on multiple columns ignoring NaN, but it has no answer, so I'm asking again in case someone can help.
I need to merge values from df2 into df1, but the key used in the merge differs between rows in df2: the rows in df2 have NaNs in different columns, and in that case I want to ignore those columns and, for each row, use only the columns that have values.
import pandas as pd
df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]], columns=['level1', 'level2', 'level3'])
df1
   level1  level2  level3
0       0       1       3
1       0       2       4
2       1       2       5
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]], columns=['level1', 'level2', 'level3', 'value'])
df2
   level1  level2  level3  value
0     0.0     NaN     NaN     10
1     0.0     1.0     NaN     12
2     NaN     2.0     5.0     13
When I do df1.merge(df2, how='left'), I get df1 with NaN in the value column, since no row matches on all the level columns: pandas is trying to match the NaN values as well.
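A minimal, self-contained reproduction of that behaviour (the frames are redefined so the snippet runs on its own). With no on= argument, merge joins on every column the two frames share, so each df2 row has to agree with a df1 row on level1, level2 and level3 at once, and a NaN key never equals an integer key. None of the three df2 rows qualifies, so every value comes back as NaN.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]],
                   columns=['level1', 'level2', 'level3'])
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])

# No `on=`: the shared columns level1..level3 are all used as the join key,
# so a df2 row only matches if all three of its level values are equal.
result = df1.merge(df2, how='left')
print(result)  # df1's three rows, with NaN in the value column
```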
What I want instead is a match against any row in df2, without trying to match its NaNs:
   level1  level2  level3  value
0       0       1       3     10
1       0       1       3     12
2       0       2       4     10
3       1       2       5     13
Explanation:
Row 0 in df1 matches the non-NaN columns of rows 0 and 1 in df2, so it gets values 10 and 12 from there. Row 1 in df1 matches the non-NaN columns of row 0 in df2, so it gets value 10 from there. Row 2 in df1 matches the non-NaN columns of row 2 in df2, so it gets value 13 from there.
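The matching rule in this explanation can be written down directly as a brute-force sketch (an illustration of the rule, not an efficient solution): pair every df1 row with every df2 row using a cross merge (how='cross', which assumes pandas 1.2 or later), then keep a pair only when each level column of the df2 row is either NaN or equal to the df1 value. The levels list and the '_r' suffix are names chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]],
                   columns=['level1', 'level2', 'level3'])
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])
levels = ['level1', 'level2', 'level3']  # illustrative: the key columns

# Every (df1 row, df2 row) pair; df2's level columns get the '_r' suffix.
pairs = df1.merge(df2, how='cross', suffixes=('', '_r'))

# A pair matches when, for every level, df2's value is NaN or equal to df1's.
match = pd.Series(True, index=pairs.index)
for c in levels:
    match &= pairs[f'{c}_r'].isna() | (pairs[c] == pairs[f'{c}_r'])

result = pairs.loc[match, levels + ['value']].reset_index(drop=True)
print(result)
```

On the example this yields the four rows of the desired output, but the intermediate frame has len(df1) * len(df2) rows, so it only scales to small inputs.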
In the real data I actually have 6 level columns, and the non-NaN columns of each row in df2 can be any combination of them, including a single column.
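Since a df2 row's effective key depends only on which level columns are non-NaN, the number of distinct keys is bounded: with k level columns there are at most 2**k - 1 non-empty combinations, which is 63 for the six columns in the real data, however many rows df2 has. A small check of the patterns that actually occur, assuming the example df2 (the levels list is an illustrative name):

```python
import pandas as pd

df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])
levels = ['level1', 'level2', 'level3']  # illustrative: the key columns

# One True/False row per df2 row: which level columns carry a key value.
patterns = df2[levels].notna().drop_duplicates()
n_patterns = len(patterns)          # distinct key patterns present in df2
upper_bound = 2 ** len(levels) - 1  # non-empty combinations of the levels
print(n_patterns, upper_bound)
```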
What I do now is iterate over the rows of df2 with iterrows, create for each one a mini-DataFrame of only its non-NaN columns, and merge df1 with it. But as we know, that's not very efficient, and I wonder if there is something better that can be done.
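One possible alternative, sketched here under stated assumptions rather than as a known solution: instead of one merge per df2 row, group the df2 rows by their NaN pattern and do one merge per pattern, so the number of merges is bounded by the number of distinct patterns rather than by the number of rows. merge_ignoring_nan is a hypothetical helper name; the sketch assumes at least one df2 row has a non-NaN level, and it drops df1 rows that match nothing (an inner-style result, like the desired output above).

```python
import pandas as pd


def merge_ignoring_nan(df1, df2, levels, value_cols=('value',)):
    """Hypothetical helper: match each df2 row to df1 on its non-NaN levels only."""
    present = df2[levels].notna()  # True where a df2 row has a key value
    left = df1.rename_axis('_row1').reset_index()  # remember df1 row order
    pieces = []
    # .groups maps each distinct True/False pattern to the df2 index labels
    # sharing it, giving one merge per pattern instead of one per row.
    for pattern, idx in present.groupby(levels).groups.items():
        if not isinstance(pattern, tuple):  # a single level may give a scalar key
            pattern = (pattern,)
        keys = [c for c, has_value in zip(levels, pattern) if has_value]
        if not keys:
            continue  # rows whose levels are all NaN are skipped in this sketch
        right = df2.loc[idx, keys + list(value_cols)]
        # NaN made df2's key columns float; cast them back to df1's dtypes.
        right = right.astype({c: df1[c].dtype for c in keys})
        right = right.rename_axis('_row2').reset_index()  # remember df2 row order
        pieces.append(left.merge(right, on=keys, how='inner'))
    out = pd.concat(pieces, ignore_index=True)
    # Order by df1 row, then by df2 row, then drop the helper columns.
    out = out.sort_values(['_row1', '_row2'])
    return out.drop(columns=['_row1', '_row2']).reset_index(drop=True)


df1 = pd.DataFrame([[0, 1, 3], [0, 2, 4], [1, 2, 5]],
                   columns=['level1', 'level2', 'level3'])
df2 = pd.DataFrame([[0, None, None, 10], [0, 1, None, 12], [None, 2, 5, 13]],
                   columns=['level1', 'level2', 'level3', 'value'])
result = merge_ignoring_nan(df1, df2, ['level1', 'level2', 'level3'])
print(result)
```

With six level columns this performs at most 63 merges regardless of the size of df2; if unmatched df1 rows must be kept, the result would need an extra step, such as a left merge of df1 against the distinct matched rows.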