1

I have a list of working business dates that look like this:

list_of_dates = ["15/02/2021", "16/02/2021", "17/02/2021",
                 "19/02/2021", "22/02/2021"," 23/02/2021"]

I would like to transform it into a list of continued periods:

list_of_periods = [[start_date="15/02/2021", end_date="17/02/2021"], 
                   [start_date="19/02/2021", end_date="23/02/2021"]

So whenever there's one or more business days between two dates, the prior period ends and a new period starts.

I thought about playing around calculating the differences between dates. What's the best way to go with this?

lostAtom
  • 48
  • 8
  • How do you know that the end of the first period is `"17/02/2021"` and not `"19/02/2021"` or even the last date of your list ? – jusstol Mar 02 '21 at 14:59
  • @jusstol, he explains in the post: "So whenever there's a day between or more between two dates, the prior period ends and a new period starts." – Loic RW Mar 02 '21 at 15:01
  • @jusstol There is a day between `"17/02/2021"` and `"19/02/2021"` and it is not in the list, hence the period ends `"17/02/2021"` – Kraigolas Mar 02 '21 at 15:06
  • @LoicRW My bad, I read that wrong. – jusstol Mar 02 '21 at 15:06

3 Answers3

1

To determine spans of continuous business days, you'll need to add in the weekends (and possibly holidays with the holidays= argument) that way you can check if there's 1 day between all of your dates to form your groups.

First turn your list into a Series of datetime64 values and concatenate all the weekends that are within your data. Sort and drop duplicates. Then find where the difference is !=1 to create your groups and you want the min and max to get the start and end.

import pandas as pd
list_of_dates = ["15/02/2021", "16/02/2021", "17/02/2021",
                 "19/02/2021", "22/02/2021", "23/02/2021"]

s = pd.Series(pd.to_datetime(list_of_dates, format='%d/%m/%Y'))
sweekend = pd.Series(pd.bdate_range(s.min(), end=s.max(), freq='C', 
                                    weekmask='Sat Sun', holidays=None))

s = pd.concat([s, sweekend]).drop_duplicates().sort_values()

gps = s.diff().dt.days.ne(1).cumsum()
s.to_frame().groupby(gps).agg(start_date=(0, min), end_date=(0, max))

  start_date   end_date
1 2021-02-15 2021-02-17
2 2021-02-19 2021-02-23
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • Hi ALollz, Thanks for the perfect response that also handles the non business days issue! – lostAtom Mar 02 '21 at 16:01
  • Hey ALollz, for a weird reason i couldn't figure out, your code doesn't work with this list: list_of_dates = ["20/11/2020","23/11/2020","01/12/2020","02/12/2020","03/12/2020"] It's exactly the same list format, i don't understand where's the issue. – lostAtom Mar 02 '21 at 17:51
  • @lostAtom that's because it gets confused with the format as days/months can get amibguous. Wasn't an issue in your first case since all days were > 12. You can change first line to `s = pd.Series(pd.to_datetime(list_of_dates, format='%d/%m/%Y'))` and it will know which value to infer as the day and which to use as the month. I updated the solution to reflect that. – ALollz Mar 02 '21 at 17:53
0

Assuming your dates are datetime.date you can substract them and create a timedelta object which represents the number of days between 2 dates:

from datetime import date

d0 = date(2008, 8, 18)
d1 = date(2008, 9, 26)
delta = d1 - d0
if delta.days >= 1:
  # make time period

See https://docs.python.org/library/datetime.html.

This does not take business days into a count.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
0

This seems to be what you are looking for: Find group of consecutive dates in Pandas DataFrame

If single dates can be their own period (e.g. '19/02/2021' in your example), then use:

import numpy as np
import pandas as pd

list_of_dates = ["15/02/2021","16/02/2021","17/02/2021","19/02/2021","22/02/2021","23/02/2021"]
df = pd.DataFrame({'dates': list_of_dates})
df.dates = pd.to_datetime(df.dates)
day = pd.Timedelta('1d')

breaks = df.dates.diff() != day
groups = breaks.cumsum()
[{'start_date': df.dates[groups==i].iloc[0], 'end_date': df.dates[groups==i].iloc[-1]} 
for i in groups.unique()]
Loic RW
  • 444
  • 3
  • 7
  • Thanks Loic! But 19/02/2021 is not considered a single date, since it is followed by 22/02/2021 (business days). Allolz solution handles weekends – lostAtom Mar 02 '21 at 16:10