1

I have two Dataframes with some sales data as below:

df1:

prod_id,sale_date,new
101,2019-01-01,101_2019-01-01
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
101,2019-01-04,101_2019-01-04

df2:

prod_id,sale_date
101,2019-01-01,101_2019-01-01
101,2019-01-04,101_2019-01-04

I am trying to compare the above two Dataframe to find dates which are missing in df2 as compared to df1

I have tried to do the below:

final_1 = df1.merge(df2, on='new', how='outer')

This returns back the below Dataframe:

prod_id_x,sale_date_x,new,prod_id_y,sale_date_y
101,2019-01-01,101_2019-01-01,,
101,2019-01-02,101_2019-01-01,,
101,2019-01-03,101_2019-01-01,,
101,2019-01-04,101_2019-01-01,,
,,101_2019-01-01,101,2019-01-01
,,101_2019-01-04,101,2019-01-04

This is not letting me compare these 2 Dataframe.

Expected Output:

prod_id_x,sale_date_x,new
101,2019-01-02,101_2019-01-02
101,2019-01-03,101_2019-01-03
hello kee
  • 289
  • 2
  • 6
  • 17

1 Answers1

0

You can use drop_duplicates

pd.concat([df1,df2]).drop_duplicates(keep=False)
frankegoesdown
  • 1,898
  • 1
  • 20
  • 38