0

I'm trying to fill missing values in one dataframe based on the data in other dataframes. My first dataframe looks like this

[df1]:

Key Value
1   15
2   30
23  NaN
1   15
4   60
2   30
5   NaN
2   NaN
4   NaN
25  NaN
62  NaN
32  NaN
21  315
21  NaN

And the dataframe I want to extract the information from looks like this one:

[df2]:
Key   Value
23        "fine"
5         "five"
2         30
4         60
25        "two"
62        "bar"
32        "foo"
21        315
So my output would look something like this:

   Key     Value
    2   30.0
   23  "fine"
    1   15.0
    4   60.0
    2   30.0
    5   "five"
    2   30.0
    4   60.0
   25  "two"
   62  "bar"
   32  "foo"
   21  315.0
   21  315.0

I've managed to do this iterating over the rows of both dataframes, using

for i, value in df1.iterrows():
    for j, value2 in df2.iterrows():
        if df1.loc[i,"Key"]==df2.loc[j,"Key"]:
            df1.loc[i,"Value"]=df2.loc[j,"Value"]
        else:
            pass

but using this with my actual information (Dataframes of around 100000 rows and 20+ columns each) takes too long and I believe there's a better way to do it with a pandas built-in function. Thanks in advance!

2 Answers2

1

You can achieve that with only one line of code:

df1 = df1.fillna(df2)
peru_45
  • 330
  • 3
  • 16
0

try this:

f = df1.query('Value.isnull()').ffill(axis=1).Value.map(dict(df2.values))
df1.loc[f.index, 'Value'] = f
print(df1)
ziying35
  • 1,190
  • 3
  • 6