Python equivalent to inequality join in SQL

Question

I recently started using python at work instead of SQL. I don't have access to any servers, nor will IT allow me to install a local instance of SQL SERVER EXPRESS. The datasets are too large to use Excel or Access.

I'm using my limited knowledge of python as a workaround. I've successfully analyzed other datasets, but this one presents a new challenge. I have a very large dataset (25 mill+ rows) and I would like to perform the equivalent to this: IN T-SQL:

    SELECT
     A.TransactionDate,
     B.StartDate,
     B.EndDate,
     B.RefValue
    FROM table1 AS A
    LEFT JOIN table2 AS B 
     ON A.TransactionDate BETWEEN B.StartDate AND B.EndDate

OR

ON A.Date1 >= B.StartDate AND A.Date <= B.EndDate

Sample Data:

import pandas as pd
df1 = pd.DataFrame(
    {
        'TranDate': ['2018/01/05', '2018/02/07', '2018/03/24']
        ,'Loc': [4000, 5000, 4200]
    }
)

df2 = pd.DataFrame(
    {
        'StartDate': ['2018/01/01', '2018/02/01', '2018/03/01']
        ,'EndDate': ['2018/01/31','2018/02/28', '2018/03/30']
        ,'PP': ['01', '02', '03']
    }
)

Expected Dataframe:

dfR = pd.DataFrame(
    {
        'TranDate': ['2018/01/05', '2018/02/07', '2018/03/24']
        ,'Loc': [4000, 5000, 4200]
        ,'PP': ['01', '02', '03']
    }
)

I tried using the pandas.merge() statement, but there is no option for an inequality.

How can I recreate the previous statement in python?

For this question it is crucial to include some example data and it would be most optimal if you would also include expected output example. — Erfan, Mar 11 '19 at 20:29
Note: with example I mean a small dataframe created which we can copy and run ourselves. See more here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Erfan, Mar 11 '19 at 20:31
Thanks to Parfait, who pointed me to a very similar question. There, I found how to use SQL inside python. Might not be the best approach, but I am much more comfortable with SQL, so I used that to solve my problem :) — Erick 3E, Mar 12 '19 at 18:23

Victor Valente · Answer 1 · 2019-03-11T21:16:54.750

0

Well, you could start by trying this:

dfR = df1.copy()
dfR.loc[dfR['TranDate'].between(df2['StartDate'], df2['EndDate']), 'PP'] = df2['PP']

Maybe this will work for you.

edited Mar 11 '19 at 21:16

answered Mar 11 '19 at 21:09

Victor Valente

761
9
24

I tried this and it did not work for me, but thank you for sharing – Erick 3E Mar 12 '19 at 18:21

Python equivalent to inequality join in SQL

1 Answers1