I recently started using python at work instead of SQL. I don't have access to any servers, nor will IT allow me to install a local instance of SQL SERVER EXPRESS. The datasets are too large to use Excel or Access.
I'm using my limited knowledge of python as a workaround. I've successfully analyzed other datasets, but this one presents a new challenge. I have a very large dataset (25 mill+ rows) and I would like to perform the equivalent to this: IN T-SQL:
SELECT
A.TransactionDate,
B.StartDate,
B.EndDate,
B.RefValue
FROM table1 AS A
LEFT JOIN table2 AS B
ON A.TransactionDate BETWEEN B.StartDate AND B.EndDate
OR
ON A.Date1 >= B.StartDate AND A.Date <= B.EndDate
Sample Data:
import pandas as pd
df1 = pd.DataFrame(
{
'TranDate': ['2018/01/05', '2018/02/07', '2018/03/24']
,'Loc': [4000, 5000, 4200]
}
)
df2 = pd.DataFrame(
{
'StartDate': ['2018/01/01', '2018/02/01', '2018/03/01']
,'EndDate': ['2018/01/31','2018/02/28', '2018/03/30']
,'PP': ['01', '02', '03']
}
)
Expected Dataframe:
dfR = pd.DataFrame(
{
'TranDate': ['2018/01/05', '2018/02/07', '2018/03/24']
,'Loc': [4000, 5000, 4200]
,'PP': ['01', '02', '03']
}
)
I tried using the pandas.merge()
statement, but there is no option for an inequality.
How can I recreate the previous statement in python?