Iteratively update the values of a dataframe with another one

Question

I have a main df:

print(df)

        item       dt_op
0  product_1  2019-01-08
1  product_2  2019-02-08
2  product_1  2019-01-08
...

and a subset of the first one, that contains only one product and two extra columns:

print(df_1)

        item       dt_op  DQN_Pred  DQN_Inv
0  product_1  2019-01-08         6      7.0
2  product_1  2019-01-08         2      2.0
...

I create df_1 iteratively in a for loop, i.e. df_1 = df.loc[df.item == i] for each i in items.

I would like to merge df_1 into df at every step of the iteration, so that df is updated with the two extra columns. After the first step, the result should look like this:

print(final_df)

        item       dt_op  DQN_Pred  DQN_Inv
0  product_1  2019-01-08         6      7.0
1  product_2  2019-02-08       nan      nan
2  product_1  2019-01-08         2      2.0
...

The nan values should then be filled in at the second step of the for loop, in which df_1 contains only product_2.

How can I do it?

Please see [pandas-merging-101](https://stackoverflow.com/questions/53645882/pandas-merging-101), which may be helpful. – Karn Kumar, Aug 24 '19 at 19:09

anky · Accepted Answer · 2019-08-24T18:57:13.303


IIUC, you can use combine_first with reindex:

final_df=df_1.combine_first(df).reindex(columns=df_1.columns)

        item      dt_op  DQN_Pred  DQN_Inv
0  product_1 2019-01-08       6.0      7.0
1  product_2 2019-02-08       NaN      NaN
2  product_1 2019-01-08       2.0      2.0
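Since the question is about repeating this inside a loop, here is a minimal sketch of how combine_first could be applied at every step. The sample data, the placeholder_results dictionary and its product_2 values are hypothetical stand-ins for the real DQN outputs. The sketch assumes df_1 keeps the index it inherits from df, because combine_first aligns rows by index label.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": pd.to_datetime(["2019-01-08", "2019-02-08", "2019-01-08"]),
})

# Hypothetical per-item results standing in for the real DQN outputs.
placeholder_results = {
    "product_1": {"DQN_Pred": [6, 2], "DQN_Inv": [7.0, 2.0]},
    "product_2": {"DQN_Pred": [4], "DQN_Inv": [5.0]},
}

final_df = df.copy()
for item in df["item"].unique():
    # Keep the original index here: no reset_index() on the subset.
    df_1 = df.loc[df["item"] == item].copy()
    df_1["DQN_Pred"] = placeholder_results[item]["DQN_Pred"]
    df_1["DQN_Inv"] = placeholder_results[item]["DQN_Inv"]
    # Values from df_1 take priority; everything else comes from final_df.
    final_df = df_1.combine_first(final_df)

# combine_first may reorder the columns, so restore the intended order.
final_df = final_df.reindex(columns=["item", "dt_op", "DQN_Pred", "DQN_Inv"])
print(final_df)
```

If the subset's index is reset inside the loop, its rows no longer line up with the rows of df, and values end up in the wrong places or are lost.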

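Alternatively, using merge, you can join on the index (left_index=True and right_index=True) after dropping the columns that both frames share from df_1:

# Columns present in both frames; they are dropped from df_1 to avoid _x/_y duplicates
common_keys=df.columns.intersection(df_1.columns).tolist()
# Recent pandas versions raise a MergeError if on= is combined with left_index/right_index
final_df=df.merge(df_1.drop(columns=common_keys),left_index=True,right_index=True,how='left')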

        item      dt_op  DQN_Pred  DQN_Inv
0  product_1 2019-01-08       6.0      7.0
1  product_2 2019-02-08       NaN      NaN
2  product_1 2019-01-08       2.0      2.0
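For the iterative case, a different technique from the two above may also be worth considering: DataFrame.update, which aligns on the index and overwrites values in place. The sketch below is only an illustration under assumptions. The result columns are created once up front, and the placeholder_results dictionary with its product_2 values is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": pd.to_datetime(["2019-01-08", "2019-02-08", "2019-01-08"]),
})

new_cols = ["DQN_Pred", "DQN_Inv"]

# Create the result columns once, filled with NaN.
final_df = df.assign(DQN_Pred=np.nan, DQN_Inv=np.nan)

# Hypothetical per-item results standing in for the real DQN outputs.
placeholder_results = {
    "product_1": {"DQN_Pred": [6, 2], "DQN_Inv": [7.0, 2.0]},
    "product_2": {"DQN_Pred": [4], "DQN_Inv": [5.0]},
}

for item in df["item"].unique():
    # The subset keeps df's index labels (no reset_index()).
    df_1 = df.loc[df["item"] == item].copy()
    df_1["DQN_Pred"] = placeholder_results[item]["DQN_Pred"]
    df_1["DQN_Inv"] = placeholder_results[item]["DQN_Inv"]
    # update() aligns on index and column labels and modifies final_df in place,
    # so no suffixed duplicate columns are created.
    final_df.update(df_1[new_cols])

print(final_df)
```

Because update only writes non-NA values into existing columns, rows for items that have not been processed yet simply stay NaN until their step of the loop.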

edited Aug 24 '19 at 18:57

answered Aug 24 '19 at 18:37


The first method does not work iteratively: as soon as the loop reaches the second step, the nan values are replaced but the remaining values become nan. The second one duplicates columns (DQN_Pred_x, DQN_Pred_y, etc.). – Alessandro Ceccarelli Aug 24 '19 at 19:03
@AlessandroCeccarelli I don't know what you mean. If you can create an example that best describes your issue, that'd be helpful. Iterations are generally not preferred in pandas. – anky Aug 24 '19 at 19:09

combine_first works; I was resetting df_1's index at every iteration (maybe specify that the index of the subset df must not be reset). However, the second one does not seem to work. Thank you :) – Alessandro Ceccarelli Aug 25 '19 at 08:26
