Merge 2 dataframes between start date and end date range pandas

Question

I am actually having 2 dataframes as below:

DataFrame 1
colA   colB   startDate   EndDate
1      L1     1/8/2022    4/8/2022
2      L2     2/8/2022    10/8/2022

DataFrame 2
colA   colB   startDate
3      L1     31/7/2022
4      L1     2/8/2022
5      L1     3/8/2022
6      L1     4/8/2022
7      L2     12/8/2022

I wanted to merge these 2 dataframes with start date and end date as a range of dates and check the count of number of rows between these 2 dates. For the above case, for L1 there should be 3 counts because 31/7/2022 do not come under the start date and end date range. For L2, the count is 0 as there is no value between these conditions. How can I approach in pandas ?

I wanted to merge these 2 dataframes with `start date and end date as a range of dates` what you mean by this , give us the dataframes so we can try out the solution — Mohamed Fathallah, Aug 29 '22 at 01:45

score 0 · Answer 1 · answered Aug 29 '22 at 01:55

0

First, make sure all your dates columns are converted to Timestamp (use pd.to_datetime). It looks like you are storing them as strings.

If you want to count the number of rows in df2 that match your requirements:

tmp = df1.merge(
    # Rename the columns in df2 to make things clearer    df2.rename(columns={"colA": "colA_df2", "startDate": "date"}), on="colB", how="left"
)
tmp["matched"] = tmp["date"].between(tmp["startDate"], tmp["endDate"])

result = tmp.groupby(["colA", "colB", "startDate", "endDate"])["matched"].sum().reset_index()

Result:

   colA colB  startDate    endDate  matched
0     1   L1 2022-08-01 2022-08-04        3
1     2   L2 2022-08-02 2022-08-10        0

answered Aug 29 '22 at 01:55

Code Different

90,614
16
144
163

how to copy this example (the output of each data frame) to CSV file , to test it instead of writing it line by line – Mohamed Fathallah Aug 29 '22 at 02:08
1

Use `result.to_csv(...)` – Code Different Aug 29 '22 at 02:10
DataFrame 1 `colA colB startDate EndDate 1 L1 1/8/2022 4/8/2022 2 L2 2/8/2022 10/8/2022` i mean this is how to make it csv without writing it by hand , cause to test his code i had to write the dataframe myself – Mohamed Fathallah Aug 29 '22 at 02:12
@MohamedFathallah You're looking for `pd.read_clipboard()`? – BeRT2me Aug 29 '22 at 02:46

score 0 · Answer 2 · answered Aug 29 '22 at 02:12

assert df1.colB.nunique() == len(df1)

limit = df1[['colB','startDate','EndDate']].set_index('colB')

df2.groupby('colB').apply(lambda x: 
    sum((x.startDate >= limit.loc[x.name, 'startDate']) 
        & (x.startDate <= limit.loc[x.name, 'EndDate']))
)

Output:

L1    3
L2    0

Merge 2 dataframes between start date and end date range pandas

2 Answers2