2

Using pandas=1.1.5. I want to update the values from df2 to df1. But df2 has new indices, and these are not appended to df1 when I use update. See below for details. Thank you

df1

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  1    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |

df2

       | Sales |
0       | 10   |
6       |  12    |

desired df

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  10    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |
6      |  Nan    |  Nan      |  12   |

df from using update

df1.update(df2)

      | Revenue |  Profit   | Sales |
0      |  100    |  300      |  10    |
1      |  500    |  900      |  3    |
2      |  200    |  100      |  4    |

sophros
  • 14,672
  • 11
  • 46
  • 75
Peter
  • 353
  • 3
  • 10
  • 2
    This is a classic use case for [`combine_first`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.combine_first.html) : `df2.combine_first(df1).loc[:,[*df1]]` – anky Oct 09 '21 at 05:53

3 Answers3

3

Unfortunately, update method is somewhat restricted (not implemented) in terms of the approaches for joining (join parameter can only be "left").

Therefore, you have to use both update and concat:

import pandas as pd

df1 = pd.DataFrame({'Revenue': [100,5000,200], 'Profit': [300,900,100], 'Sales': [1,3,4]})
df2 = pd.DataFrame({'Sales': [10,12]}, index=[0,6])

df1.update(df2, overwrite=True)
to_be_added = df2.loc[df2.index.difference(df1.index)]
dd = pd.concat([df1, to_be_added])

Results in:

   Revenue  Profit  Sales
0    100.0   300.0   10.0
1   5000.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0
sophros
  • 14,672
  • 11
  • 46
  • 75
  • I noticed using update mess up some format. The original value in df2= "01-10-2019 12:00:00 am " becomes "1569888000000000000" in df1 after it is updated into df1. Anyway to resolve this? Thanks – Peter Oct 09 '21 at 10:01
  • Apparently, you had a typed column. You can reinstate the type of the column by using `astype`: https://stackoverflow.com/questions/15891038/change-column-type-in-pandas – sophros Oct 09 '21 at 17:58
2

Using Join:

df1 = pd.DataFrame(data={'Revenue':[100,500,200], 'Profit':[300,900,100], 'Sales':[1,3,4]})
df2 = pd.DataFrame(data={'Sales':[10,12]}, index=[0,6])
df1 = df1.join(df2, how='outer', lsuffix='_df1')
df1['Sales'].fillna(df1['Sales_df1'], inplace=True)
df1.drop(columns=['Sales_df1'], inplace=True)
print(df1)

Using merge

df1 = df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=('_df1', ''))
df1['Sales'].fillna(df1['Sales_df1'], inplace=True)
df1.drop(columns=['Sales_df1'], inplace=True)
print(df1)

OutPut:

   Revenue  Profit  Sales
0    100.0   300.0   10.0
1    500.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0
Muhammad Hassan
  • 4,079
  • 1
  • 13
  • 27
1

You can reindex your dataframe before update:

out = df1.reindex(df1.index.union(df2.index))
out.update(df2)
print(out)

# Output:
   Revenue  Profit  Sales
0    100.0   300.0   10.0
1    500.0   900.0    3.0
2    200.0   100.0    4.0
6      NaN     NaN   12.0
Corralien
  • 109,409
  • 8
  • 28
  • 52