2

I have two dataframes, something like this:

df1 = pd.DataFrame({
    'Identity': [3, 4, 5, 6, 7, 8, 9],
    'Value': [1, 2, 3, 4, 5, 6, 7],
    'Notes': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    })
df2 = pd.DataFrame({
    'Identity': [4, 8],
    'Value': [0, 128],
    })

In[3]: df1
Out[3]: 
   Identity  Value Notes
0         3      1     a
1         4      2     b
2         5      3     c
3         6      4     d
4         7      5     e
5         8      6     f
6         9      7     g

In[4]: df2
Out[4]: 
   Identity  Value
0         4      0
1         8    128

I'd like to use df2 to overwrite df1 but only where values exist in df2, so I end up with:

   Identity  Value Notes
0         3      1     a
1         4      0     b
2         5      3     c
3         6      4     d
4         7      5     e
5         8    128     f
6         9      7     g

I've been searching through all the various merge, combine, join functions etc, but I can't seem to find one that does what I want. Is there a simple way of doing this?

DrAl
  • 70,428
  • 10
  • 106
  • 108
  • Use `replace` instead `map` – jezrael Sep 24 '21 at 11:31
  • 1
    another way `df1['Value'] = df1['Identity'].map(df2.set_index('Identity')['Value']).fillna(df1['Value'])` – Umar.H Sep 24 '21 at 11:32
  • When there's just one column I'd go with: `df1.loc[df1.Identity.isin(df2.Identity), 'Value'] = df2.Value.values` which is slightliy better `690 µs ± 10.3 µs per loop` than all answers in this and other topics. – Mariusz Olszowski Sep 24 '21 at 11:39

1 Answers1

2

Use:

df1['Value'] = df1['Identity'].map(df2.set_index('Identity')['Value']).fillna(df1['Value'])

Or try reset_index with reindex and set_index with fillna:

df1['Value'] = df2.set_index('Identity').reindex(df1['Identity'])
                  .reset_index(drop=True)['Value'].fillna(df1['Value'])

>>> df1
   Identity  Value Notes
0         3    1.0     a
1         4    0.0     b
2         5    3.0     c
3         6    4.0     d
4         7    5.0     e
5         8  128.0     f
6         9    7.0     g
>>> 

This fills missing rows in df2 with NaN and fills the NaNs with df1 values.

U13-Forward
  • 69,221
  • 14
  • 89
  • 114