0

I am actually having 2 dataframes as below:

DataFrame 1
colA   colB   startDate   EndDate
1      L1     1/8/2022    4/8/2022
2      L2     2/8/2022    10/8/2022

DataFrame 2
colA   colB   startDate
3      L1     31/7/2022
4      L1     2/8/2022
5      L1     3/8/2022
6      L1     4/8/2022
7      L2     12/8/2022

I wanted to merge these 2 dataframes with start date and end date as a range of dates and check the count of number of rows between these 2 dates. For the above case, for L1 there should be 3 counts because 31/7/2022 do not come under the start date and end date range. For L2, the count is 0 as there is no value between these conditions. How can I approach in pandas ?

python_interest
  • 874
  • 1
  • 9
  • 27
  • I wanted to merge these 2 dataframes with `start date and end date as a range of dates` what you mean by this , give us the dataframes so we can try out the solution – Mohamed Fathallah Aug 29 '22 at 01:45

2 Answers2

0

First, make sure all your dates columns are converted to Timestamp (use pd.to_datetime). It looks like you are storing them as strings.

If you want to count the number of rows in df2 that match your requirements:

tmp = df1.merge(
    # Rename the columns in df2 to make things clearer    df2.rename(columns={"colA": "colA_df2", "startDate": "date"}), on="colB", how="left"
)
tmp["matched"] = tmp["date"].between(tmp["startDate"], tmp["endDate"])

result = tmp.groupby(["colA", "colB", "startDate", "endDate"])["matched"].sum().reset_index()

Result:

   colA colB  startDate    endDate  matched
0     1   L1 2022-08-01 2022-08-04        3
1     2   L2 2022-08-02 2022-08-10        0
Code Different
  • 90,614
  • 16
  • 144
  • 163
0
assert df1.colB.nunique() == len(df1)

limit = df1[['colB','startDate','EndDate']].set_index('colB')

df2.groupby('colB').apply(lambda x: 
    sum((x.startDate >= limit.loc[x.name, 'startDate']) 
        & (x.startDate <= limit.loc[x.name, 'EndDate']))
)

Output:

L1    3
L2    0
Vitalizzare
  • 4,496
  • 7
  • 13
  • 32