Add values to a column based on conditions with range

Question

I have two datasets:

Dict

Number_ID	Price	StartDate	EndDate
1	3.49	20200101	20991231
2	0.25	20200101	20200502
2	0.49	20200503	20200509
3	0.13	20200401	20200403
3	0.15	20200404	20991231

and

Sales_List

Number_ID	Date of Sale
1	20200103
2	20200431
2	20200505
3	20200402
3	20200408

My goal is to use data from Dict table and put right value to Sales_List['Price'] from Dict['Price'] on following conditions:

Number_ID is the same
Sales['Date of Sales'] is between Dict['StartDate'] and Dict['EndDate']

I've tried following code, but i have some error:

import pandas as pd
import numpy as pd

Sales_List['Price'] = np.where((Sales_List['Number_ID'] == Dict['Number_ID']) & 
                               (Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),
                               Dict['Price'],'')

also I have tried following code:

Sales_List = Sales.merge(Dict, how='left', on='Number_ID').reset_index()
Sales_List['Value'] = np.where((Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),1,0)
Sales_List = Sales_List[Sales_List['Value'] = 1]

But in the second example this merge allocates too much memory and crashed it. (Sales_List is more than 30M rows, Dict has 12k rows)

MemoryError: Unable to allocate 7.15 GiB for an array with shape (5, 191907116) and data type int64

Any of you have any idea how I can do something like that in python in more efficient way (than second example)?

Output should looks like:

Number_ID	Date of Sale	Price
1	20200103	3.49
2	20200431	0.25
2	20200505	0.49
3	20200402	0.13
3	20200408	0.15
3	20200401	0.13
3	20200409	0.15
2	20200509	0.49
2	20200508	0.49

Pandas isn't the best choice for data of this size. Maybe Dask is more helpful to you. — Quang Hoang, May 31 '21 at 14:44
I don't use Dask. You can read [Dask's join](https://docs.dask.org/en/latest/dataframe-joins.html). — Quang Hoang, May 31 '21 at 14:46
Does this answer your question? [Merge pandas dataframes where one value is between two others](https://stackoverflow.com/questions/30627968/merge-pandas-dataframes-where-one-value-is-between-two-others) — David Kaftan, May 31 '21 at 15:03

Add values to a column based on conditions with range

0 Answers0