The problem is that he may have worked for two companies, overlapping some time (case of companies D and C). This overlapping time shall not be included.
I read this like: No time is considered while working for 2 or more companies.
The example below follows this requirement. However, later in this answer, there is an example which considers overlapping time at least once.
You could do something like this:
import io
import pandas as pd
from datetime import datetime
# mock data / simulating CSV file
data = io.StringIO("""
Company,Start,End
A,2021-01-01,2021-03-01
B,2021-03-03,2021-06-07
C,2021-06-10,2021-08-28
D,2021-04-10,2021-10-02
""")
df = pd.read_csv(data)
def to_datetime(date):
return datetime.strptime(date, "%Y-%m-%d")
df["Start"] = df["Start"].apply(to_datetime)
df["End"] = df["End"].apply(to_datetime)
def get_full_months(start, end):
""" Returns the number of months including
non-full months. """
if end.year == start.year:
return end.month - start.month + 1
months = (end.year - start.year - 1) * 12
return (12 - start.month + 1) + end.month + months
def evaluate_duration(start, end):
""" Evaluates the duration taking other employment periods
into account. Overlapping periods are not considered
at all. Single days within a month are rounded up to
a full month. """
left = df[(df["End"] > start) & (df["Start"] < start)]
right = df[(df["Start"] > start) & (df["End"] > end)]
overlaps = df[(df["Start"] < start) & (df["End"] > end)]
within = df[(df["Start"] > start) & (df["End"] < end)]
if not left.empty:
start = left["End"].max()
if not right.empty:
end = right["Start"].min()
if not overlaps.empty:
# if there are any records which overlaps the current
# period full, the current period will not be considered
return 0
if not within.empty:
# if there are other records within the current period,
# these months need to be removed since overlapping
duration = df.apply(
lambda row: get_full_months(row["Start"], row["End"]),
axis=1
)
return get_full_months(start, end) - duration.sum()
return get_full_months(start, end)
df["Duration"] = df.apply(
lambda row: evaluate_duration(row["Start"], row["End"]),
axis=1
)
Output for print(df)
would be:
Company Start End Duration
0 A 2021-01-01 2021-03-01 3
1 B 2021-03-03 2021-06-07 2
2 C 2021-06-10 2021-08-28 0
3 D 2021-04-10 2021-10-02 2
Explanation:
Considered Not Considered Why
A: Jan to Mrz
B: Mrz, Apr May, Jun Overlapped on the right with D
NB: Mrz and Apr are not (fully) overlapped
C: Jun to Aug Fully overlapped by D
D: Sep, Oct Apr to Aug Overlapped on the left with B, overlaps C
You need to consider 4 different cases of overlapping:
Left Right
|-------| |------| df["Start"] and df["End"]
|***ooooo| |oooooo***| start and end
^ ^
left["End"].max() right["Start"].min()
Overlaps Within
|----------| |------| df["Start"] and df["End"]
|*****| |oo******oooooo| start and end
Legend
|----| employment period
oooo considered duration
*** ignored date range within employment period
In one of your comments you wrote:
[...] excluding overlappings, meaning duration should be counted only once. [...]
If you need overlapping time at least once being considered, change the function evaluate_duration
to:
def evaluate_duration(start, end):
left = df[(df["End"] > start) & (df["Start"] < start)]
overlaps = df[(df["Start"] < start) & (df["End"] > end)]
if not left.empty:
start = left["End"].max()
if not overlaps.empty:
return 0
return get_full_months(start, end)
Output for print(df)
would be:
Company Start End Duration
0 A 2021-01-01 2021-03-01 3
1 B 2021-03-03 2021-06-07 4
2 C 2021-06-10 2021-08-28 0
3 D 2021-04-10 2021-10-02 5
Explanation:
Considered Not Considered Why
A: Jan to Mrz
B: Mrz to Jun Note: Mrz does not overlap
C: Jun to Aug Fully overlapped by D, therefore ignored
D: Jun to Oct Apr, May Jun only partially overlaps with B
Another thing is that in the each month he worked, it counts as one month, no matter how long he worked [...]
You can achieve this with:
def get_full_months(start, end):
""" Returns the number of months including
non-full months. """
if end.year == start.year:
return end.month - start.month + 1
months = (end.year - start.year - 1) * 12
return (12 - start.month + 1) + end.month + months
Please note: The code above may not be fully tested, especially not with edge cases and other data than the provided example data.