0

I have two datasets:

Dict

Number_ID Price StartDate EndDate
1 3.49 20200101 20991231
2 0.25 20200101 20200502
2 0.49 20200503 20200509
3 0.13 20200401 20200403
3 0.15 20200404 20991231

and

Sales_List

Number_ID Date of Sale
1 20200103
2 20200431
2 20200505
3 20200402
3 20200408

My goal is to use data from Dict table and put right value to Sales_List['Price'] from Dict['Price'] on following conditions:

Number_ID is the same
Sales['Date of Sales'] is between Dict['StartDate'] and Dict['EndDate']

I've tried following code, but i have some error:

import pandas as pd
import numpy as pd

Sales_List['Price'] = np.where((Sales_List['Number_ID'] == Dict['Number_ID']) & 
                               (Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),
                               Dict['Price'],'')

also I have tried following code:

Sales_List = Sales.merge(Dict, how='left', on='Number_ID').reset_index()
Sales_List['Value'] = np.where((Sales_List['Date of Sale'] >= Dict['StartDate']) & 
                               (Sales_List['Date of Sale'] < Dict['EndDate']),1,0)
Sales_List = Sales_List[Sales_List['Value'] = 1]

But in the second example this merge allocates too much memory and crashed it. (Sales_List is more than 30M rows, Dict has 12k rows)

MemoryError: Unable to allocate 7.15 GiB for an array with shape (5, 191907116) and data type int64

Any of you have any idea how I can do something like that in python in more efficient way (than second example)?

Output should looks like:

Number_ID Date of Sale Price
1 20200103 3.49
2 20200431 0.25
2 20200505 0.49
3 20200402 0.13
3 20200408 0.15
3 20200401 0.13
3 20200409 0.15
2 20200509 0.49
2 20200508 0.49
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
arves
  • 1

0 Answers0