I have two datasets:
Dict
Number_ID | Price | StartDate | EndDate |
---|---|---|---|
1 | 3.49 | 20200101 | 20991231 |
2 | 0.25 | 20200101 | 20200502 |
2 | 0.49 | 20200503 | 20200509 |
3 | 0.13 | 20200401 | 20200403 |
3 | 0.15 | 20200404 | 20991231 |
and
Sales_List
Number_ID | Date of Sale |
---|---|
1 | 20200103 |
2 | 20200431 |
2 | 20200505 |
3 | 20200402 |
3 | 20200408 |
My goal is to use the data from the Dict table and put the right value into Sales_List['Price'] from Dict['Price'] on the following conditions:
Number_ID is the same
Sales_List['Date of Sale'] is between Dict['StartDate'] and Dict['EndDate']
I've tried the following code, but I get an error:
```python
import pandas as pd
import numpy as np

Sales_List['Price'] = np.where((Sales_List['Number_ID'] == Dict['Number_ID']) &
                               (Sales_List['Date of Sale'] >= Dict['StartDate']) &
                               (Sales_List['Date of Sale'] <= Dict['EndDate']),
                               Dict['Price'], '')
```
I have also tried the following code:
```python
Sales_List = Sales_List.merge(Dict, how='left', on='Number_ID').reset_index()
Sales_List['Value'] = np.where((Sales_List['Date of Sale'] >= Sales_List['StartDate']) &
                               (Sales_List['Date of Sale'] <= Sales_List['EndDate']), 1, 0)
Sales_List = Sales_List[Sales_List['Value'] == 1]
```
But in the second example the merge allocates too much memory and crashes (Sales_List has more than 30M rows, Dict has 12k rows):

```
MemoryError: Unable to allocate 7.15 GiB for an array with shape (5, 191907116) and data type int64
```
Does anyone have an idea how I can do something like that in Python in a more efficient way (than the second example)?
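For completeness, here is the sample data as copy-pasteable DataFrames, together with a `pd.merge_asof` sketch I came across that seems to avoid materialising the full cross product (I'm not sure it's the idiomatic way, hence the question; note the impossible date 20200431 from my table is replaced with 20200430 here so it parses):

```python
import pandas as pd

# Sample frames from the question
Dict = pd.DataFrame({
    'Number_ID': [1, 2, 2, 3, 3],
    'Price': [3.49, 0.25, 0.49, 0.13, 0.15],
    'StartDate': [20200101, 20200101, 20200503, 20200401, 20200404],
    'EndDate': [20991231, 20200502, 20200509, 20200403, 20991231],
})
Sales_List = pd.DataFrame({
    'Number_ID': [1, 2, 2, 3, 3],
    'Date of Sale': [20200103, 20200430, 20200505, 20200402, 20200408],
})

# Parse the yyyymmdd integers as real datetimes so comparisons are safe
for col in ('StartDate', 'EndDate'):
    Dict[col] = pd.to_datetime(Dict[col], format='%Y%m%d')
Sales_List['Date of Sale'] = pd.to_datetime(Sales_List['Date of Sale'],
                                            format='%Y%m%d')

# merge_asof requires both frames to be sorted by the merge keys
Dict = Dict.sort_values('StartDate')
Sales_List = Sales_List.sort_values('Date of Sale')

# For each sale, pick the Dict row with the latest StartDate <= sale date
# for the same Number_ID -- no 30M x 12k cross product is built
out = pd.merge_asof(Sales_List, Dict,
                    left_on='Date of Sale', right_on='StartDate',
                    by='Number_ID', direction='backward')

# Discard matches whose price interval had already ended
out.loc[out['Date of Sale'] > out['EndDate'], 'Price'] = float('nan')
print(out[['Number_ID', 'Date of Sale', 'Price']])
```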
The output should look like:
Number_ID | Date of Sale | Price |
---|---|---|
1 | 20200103 | 3.49 |
2 | 20200431 | 0.25 |
2 | 20200505 | 0.49 |
3 | 20200402 | 0.13 |
3 | 20200408 | 0.15 |
3 | 20200401 | 0.13 |
3 | 20200409 | 0.15 |
2 | 20200509 | 0.49 |
2 | 20200508 | 0.49 |