2

I'm currently having a problem with filling the missing values of my dataframe using a different dataframe.

Data samples:

df1

A   B       C
b   1.0     1.0
d   NaN     NaN
c   2.0     2.0
a   NaN     NaN
f   NaN     NaN

df2

A   B   C
c   1   5
b   2   6
a   3   7
d   4   8

I've tried to follow the solution in this question but it would appear that it is only possible if the values you're looking up is present in both dataframes you're joining.

My attempt

mask = df1["B"].isnull()
df1.loc[mask, "B"] = df2[df1.loc[mask, "A"]].values

Error:

"None of [Index(['d', 'a', 'f'], dtype='object')] are in the [columns]"

Expected result:

A   B       C
b   1.0     1.0
d   4.0     8.0
c   2.0     2.0
a   3.0     7.0
f   NaN     NaN

Also, can it be used it fill two columns?

yatu
  • 86,083
  • 12
  • 84
  • 139
Maku
  • 1,476
  • 10
  • 21

1 Answers1

4

You can use combine_first here, which is exactly aimed at filling NaNs by matching with another dataframe's columns:

df1.set_index('A').combine_first(df2.set_index('A')).reset_index()

   A    B    C
0  a  3.0  7.0
1  b  1.0  1.0
2  c  2.0  2.0
3  d  4.0  8.0
4  f  NaN  NaN
yatu
  • 86,083
  • 12
  • 84
  • 139
  • 1
    I didn't know this method existed and that's even less complicated that it looks. Thanks man! Stay at home! – Maku Apr 28 '20 at 20:28