I have a main df:

print(df)

   item            dt_op     
0  product_1     2019-01-08   
1  product_2     2019-02-08    
2  product_1     2019-01-08        
...

and a subset of it that contains only one product and two extra columns:

print(df_1)

   item            dt_op        DQN_Pred  DQN_Inv
0  product_1     2019-01-08         6      7.0
2  product_1     2019-01-08         2      2.0
...

I am creating df_1 iteratively in a for loop (that is, df_1 = df.loc[df.item == i] for i in items).

I would like to merge df_1 into df at every step of the iteration, so that df is updated with the two extra columns:

print(final_df)

   item            dt_op      DQN_Pred  DQN_Inv
0  product_1     2019-01-08       6      7.0  
1  product_2     2019-02-08      nan      nan
2  product_1     2019-01-08       2      2.0     
...

and then fill in the remaining nan values at the second step of the for loop, in which df_1 contains only product_2.

How can I do it?

WolfgangK
Alessandro Ceccarelli
  • Please see [pandas-merging-101](https://stackoverflow.com/questions/53645882/pandas-merging-101), that may be helpful. – Karn Kumar Aug 24 '19 at 19:09

1 Answer

IIUC, you can use combine_first with reindex:

final_df=df_1.combine_first(df).reindex(columns=df_1.columns)

        item      dt_op  DQN_Pred  DQN_Inv
0  product_1 2019-01-08       6.0      7.0
1  product_2 2019-02-08       NaN      NaN
2  product_1 2019-01-08       2.0      2.0

Alternatively, using merge, you can join on the common keys with left_index=True and right_index=True:

common_keys=df.columns.intersection(df_1.columns).tolist()
final_df=df.merge(df_1,on=common_keys,left_index=True,right_index=True,how='left')

        item      dt_op  DQN_Pred  DQN_Inv
0  product_1 2019-01-08       6.0      7.0
1  product_2 2019-02-08       NaN      NaN
2  product_1 2019-01-08       2.0      2.0
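
When this is repeated inside a loop, another option is to skip the merge and write each subset's new columns back by index label, which avoids the suffixed duplicates (DQN_Pred_x, DQN_Pred_y) that repeated merges produce. This is only a sketch under assumptions: the placeholder values stand in for however DQN_Pred and DQN_Inv are really computed, dt_op is kept as plain strings for brevity, and each df_1 must keep the original index of df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": ["2019-01-08", "2019-02-08", "2019-01-08"],  # strings for brevity
})

new_cols = ["DQN_Pred", "DQN_Inv"]

# Create the target columns once, filled with NaN.
final_df = df.copy()
for col in new_cols:
    final_df[col] = np.nan

for i in df["item"].unique():
    # Do NOT reset the index: the labels are what align df_1 with final_df.
    df_1 = df.loc[df["item"] == i].copy()

    # Placeholder values (assumption): replace with the real computation.
    df_1["DQN_Pred"] = np.arange(len(df_1), dtype=float)
    df_1["DQN_Inv"] = np.arange(len(df_1), dtype=float) + 1.0

    # Write the subset's values into the matching rows of final_df.
    final_df.loc[df_1.index, new_cols] = df_1[new_cols]

print(final_df)
```

Because the assignment targets only the rows in df_1.index, values written in earlier iterations are left untouched.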
anky
  • The first method does not work iteratively: as soon as the loop reaches the second step, the nan values are replaced but the remaining values become nan. The second one duplicates columns (DQN_Pred_x, DQN_Pred_y, etc.). – Alessandro Ceccarelli Aug 24 '19 at 19:03
  • @AlessandroCeccarelli I don't know what you mean. If you can create an example that best describes your issue, that would be helpful. Iterations are generally not preferred in pandas. – anky Aug 24 '19 at 19:09
  • combine_first works; I was resetting df_1's index at every iteration (maybe specify that the index of the subset df must not be reset). However, the second one does not seem to work. Thank you :) – Alessandro Ceccarelli Aug 25 '19 at 08:26
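
For reference, the iterative use of combine_first described in the last comment could look roughly like the sketch below. The helper add_predictions is hypothetical and only stands in for whatever produces DQN_Pred and DQN_Inv; dt_op is kept as plain strings for brevity. The point it illustrates is that each df_1 keeps the index of df, and that the running result (not the original df) is passed to combine_first.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["product_1", "product_2", "product_1"],
    "dt_op": ["2019-01-08", "2019-02-08", "2019-01-08"],  # strings for brevity
})


def add_predictions(subset):
    """Hypothetical stand-in for the real computation of the extra columns."""
    out = subset.copy()
    out["DQN_Pred"] = [float(n) for n in range(len(out))]
    out["DQN_Inv"] = [float(n) + 1.0 for n in range(len(out))]
    return out


final_df = df.copy()
for i in df["item"].unique():
    # The subset keeps df's index labels (no reset_index here).
    df_1 = add_predictions(df.loc[df["item"] == i])

    # Values from df_1 win; everything else comes from the running result,
    # so rows filled in earlier iterations are preserved.
    final_df = df_1.combine_first(final_df)

# combine_first may reorder columns, so restore the desired order.
final_df = final_df.reindex(columns=["item", "dt_op", "DQN_Pred", "DQN_Inv"])
print(final_df)
```

If df_1's index were reset, its rows would be relabelled 0, 1, ... and would no longer line up with the rows of df they came from.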