I have two dataframes, and I want to update column "var1" in dataframe1 with the values of column "var1" from dataframe2, matching rows on the unique column "respid".

This is just a simplified example: the real df1 has more columns, while df2 has the same layout as in the example. I used the code below, and it updates var1 correctly, but the "respid" column (which I set as the index) is missing from the result after it runs.

df1.set_index(['respid'], inplace=True)
df1.update(df2.set_index(['respid']))
df1.reset_index()
with pd.ExcelWriter("path"+ ".xlsx") as writer:
    df1.to_excel(writer, sheet_name='sheet2', index=False)

Why is the "respid" column missing from df1, and how can I correct it?
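A likely cause, based on documented pandas behavior: reset_index() returns a new DataFrame unless inplace=True is passed, so the bare df1.reset_index() call above discards its result. respid therefore stays in the index, and to_excel(..., index=False) leaves it out. The sketch below is a minimal reproduction with hypothetical sample data (the original frames were only shown as images); the column "other" is a made-up stand-in for df1's extra columns. The only change from the question's code is assigning the result of reset_index() back to df1.

```python
import pandas as pd

# Hypothetical sample data; "other" stands in for df1's additional columns
df1 = pd.DataFrame({
    "respid": [27217, 27211, 25402],
    "var1": ["", "", ""],
    "other": [1, 2, 3],
})
df2 = pd.DataFrame({"respid": [27217, 27211], "var1": ["screened", "screened"]})

df1 = df1.set_index("respid")
# Overwrite df1's var1 with df2's non-missing values, aligned on respid
df1.update(df2.set_index("respid"))
# reset_index() returns a new DataFrame, so keep the result;
# otherwise respid stays in the index and to_excel(index=False) drops it
df1 = df1.reset_index()

print(df1)
```

With the result assigned back (or with df1.reset_index(inplace=True)), the question's to_excel(..., index=False) call should write respid as an ordinary column.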

Rabinzel

1 Answer

Try this:

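import pandas as pd

# keep respids present in both frames (df1's var1 -> var1_x, df2's -> var1_y)
df = pd.merge(df1, df2, on=['respid'], how='inner')
# outer-merge with df1 again so respids missing from df2 are kept (var1_y is NaN there)
dfs = pd.merge(df, df1, on=['respid'], how='outer')

# drop df1's original var1 columns, keeping df2's values (var1_y)
dfs = dfs.drop(columns=['var1_x', 'var1'])
dfs = dfs.fillna('')  # respids not in df2 get an empty var1
dfs.columns = ['respid', 'var1']  # assumes df1 has only respid and var1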

which gives

   respid      var1
0   27217  screened
1   27211  screened
2   27214  screened
3   25402          
4    1111
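The merge-based version ends by renaming the result to exactly two columns, which assumes df1 contains only respid and var1. Since the question says df1 has more columns, a different technique (not the one in this answer) is sketched below: map each respid to df2's var1 and fall back to df1's existing value where there is no match, which mirrors what DataFrame.update does. The data and the "other" column are hypothetical, and the sketch assumes respid is unique in df2, as the question states.

```python
import pandas as pd

# Hypothetical sample data; "other" stands in for df1's additional columns
df1 = pd.DataFrame({
    "respid": [27217, 27211, 27214, 25402, 1111],
    "var1": ["", "", "", "", ""],
    "other": ["a", "b", "c", "d", "e"],
})
df2 = pd.DataFrame({
    "respid": [27217, 27211, 27214],
    "var1": ["screened", "screened", "screened"],
})

# Series mapping respid -> var1 from df2 (assumes respid is unique in df2)
lookup = df2.set_index("respid")["var1"]

# Use df2's value where respid matches; otherwise keep df1's existing var1
df1["var1"] = df1["respid"].map(lookup).fillna(df1["var1"])

print(df1)
```

Because df1 is never re-indexed, respid and any extra columns stay in place and in their original row order.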