I am working on a dataset which have parent_id, id, start_date and end_date as columns.
I want to check that in a calendar year, how many days a parent_id was not having an id running. The problem is with the “Overlap”. I want to exclude the overlapping days. How can I solve this problem?
Sample Input:
parent_id id start_date end_date
ABCD 1 2019-10-26 2020-06-30
ABCD 2 2020-01-02 2020-04-01
ABCD 3 2020-07-09 2020-12-31
ABCD 4 2020-07-14 2020-09-26
EFGH 22 2020-01-02 2020-03-31
EFGH 23 2020-01-02 2020-04-01
EFGH 24 2020-01-15 2020-04-07
Sample Output:
parent_id Year (2020)
ABCD 8
EFGH 268
So in year 2020 parent_id ABCD was inactive for 8 days. (July 1st to July 8th).
I did refer to similar questions on the portal but none of it worked for my problem.
Efficient date range overlap calculation in python?