0

I would like to merge two dataframes based on the condition of a item sold falls between a date range, matches the price, and matches the item id. Non-matches need to be kept as well, and marked as a non-event.

Sales data:

Date        ID      Price
1/1/2020    3       9.99
1/1/2020    4       5.99
1/2/2020    10      69.99
1/10/2020   4       5.99
1/11/2020   3       10.99  

Promotional data:

Start_Date   End_Date   ID   Price  Promo
1/1/2020     1/8/2020   3    9.99   Event1
1/1/2020     1/8/2020   4    5.99   Event1
1/1/2020     1/8/2020   10   49.99  Event1

Desired result:

Date        ID     Price   Promo
1/1/2020    3      9.99    Event1
1/1/2020    4      5.99    Event1
1/2/2020    10     69.99   Non-Event
1/10/2020   4      5.99    Non-Event
1/11/2020   3      10.99   Non-Event

I have been stuck on this for quite some time. I've tried a few different methods including merge_asof, numpy.piecewise, and pandasql, but this seems to be a bit above my skill ceiling currently. Any help is greatly appreciated.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
stupideye
  • 27
  • 1
  • 5

2 Answers2

3

We can do the merge , the filter it

out = Sales.merge(Promotional, on = ['ID','Price'],how='left')
#out.Date = pd.to_datetime(out.Date)
#out.Start_Date = pd.to_datetime(out.Start_Date)
#out.End_Date = pd.to_datetime(out.End_Date)
out.loc[~(out.Date.ge(out.Start_Date)&out.Date.le(out.End_Date))|out.Promo.isnull(),'Promo'] = 'no-Event'
out = out.drop(['Start_Date','End_Date'],axis=1)
out
Out[144]: 
        Date  ID  Price     Promo
0 2020-01-01   3   9.99    Event1
1 2020-01-01   4   5.99    Event1
2 2020-01-02  10  69.99  no-Event
3 2020-01-10   4   5.99  no-Event
4 2020-01-11   3  10.99  no-Event
BENY
  • 317,841
  • 20
  • 164
  • 234
2

If your dataframes are too large, the you can use cartesian join, using merge and how='left' and filter the dataframe afterward for date between start and end date, using query.

df_sales['Promo'] = df_sales.merge(df_promo, on='ID', how='left')\
                            .query('Start_Date <= Date <= End_Date')['Promo']
df_sales['Promo'] = df_sales['Promo'].fillna('Non-event')
print(df_sales)

Output:

        Date  ID  Price      Promo
0 2020-01-01   3   9.99     Event1
1 2020-01-01   4   5.99     Event1
2 2020-01-02  10  69.99     Event1
3 2020-01-10   4   5.99  Non-event
4 2020-01-11   3  10.99  Non-event
Scott Boston
  • 147,308
  • 15
  • 139
  • 187