0

I would like to override non null values from a dataframe to another dataframe with combination of first row and column (both being unique).

Basically, i am trying to join df2 on df1 only for non null values in df2, keeping df1 rows/column intact.

eg:

df1 = enter image description here

df2 = enter image description here

output = enter image description here

Ajit Singh
  • 11
  • 2
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – BeRT2me Jul 01 '22 at 06:04

1 Answers1

0

This should work:

output = df1.merge(df2, on='ID')
cols = [c for c in df1.columns if c!='ID']
for col in cols:
   output[col] = output[f'{col}_x'].fillna(output[f'{col}_y'])
   output.drop(columns=[f'{col}_x', f'{col}_y'], inplace=True)

Explanation:

  1. At first, we merge two dataframes using ID as a key. The merge joins two dataframes and if there are columns with the same name it adds suffixes _x and _y.
  2. Then we iterate over all the columns in df1 and fill the NA values in the column col_x using on the values in col_y and put the value into a new column col.
  3. We drop the auxiliary columns col_x and col_y

Edit: Still, even with the updated requirements the approach is similar. However, in this case, you need to perform a left outer join and fillna values of the second dataframe. Here is the code:

output = df1.merge(df2, on='ID', how='left')
cols = [c for c in df1.columns if c!='ID']
for col in cols:
   output[col] = output[f'{col}_y'].fillna(output[f'{col}_x'])
   output.drop(columns=[f'{col}_x', f'{col}_y'], inplace=True)
Yury
  • 20,618
  • 7
  • 58
  • 86
  • Thanks for above. But i need to update anything NON NULL in df2 to df1 on ID. For eg: In df1 if you see when ID =1 Address = A (Non Null) and in output for ID =1 Address being updated to Ind. – Ajit Singh Jul 04 '22 at 05:09
  • The answer is updated according to your requirements – Yury Jul 06 '22 at 14:01