1

My goal is to calculate how many full business days there are between two timestamps.

A "business day" in this context is a full day (24hrs) and days can be split (Ex: Friday at noon -> Monday at noon = 1 full business day)

This cannot be solved with numpy because business days must exclude Saturday, Sunday, and Holidays. The real world problem uses a custom holiday calendar, but the snippet below is using the built-in USFederalHolidayCalendar for simplicity. This is why it needs to be solved with Pandas as far as I know.

Through examples online, experimentation, and reading the docs, I have this snippet:

import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
from datetime import datetime

cal = CustomBusinessDay(calendar=USFederalHolidayCalendar())

print(len(pd.date_range(
     start=datetime.fromisoformat('2022-10-04T00:00:00'),
     end=datetime.fromisoformat('2022-10-04T00:00:01'),
     freq=cal)))

The result shows: 1

That's inaccurate in terms of the problem I need to solve. The range is actually just 1 second, which is far from a business day.

Here are values that don't return how I would expect:

  • '2022-10-04T00:00:00' -> '2022-10-04T00:00:01' - Expected: 0 (Actual: 1)
  • '2022-10-04T00:00:00' -> '2022-10-04T23:59:59' - Expected: 0 (Actual: 1)
  • '2022-10-03T23:59:59' -> '2022-10-05T00:00:00' - Expected: 1 (Actual: 2)
  • '2022-09-30T12:00:00' -> '2022-10-03T12:00:00' (Friday->Monday) - Expected: 1 (Actual: 2)

How can I get the expected results using pandas?

Side Note: I was trying to make it work by first calculating with freq='1H' and then eliminating "business days" from that result and I couldn't get anywhere with it.

Rino Bino
  • 366
  • 3
  • 15
  • Wait, does the specification actually care about the time of day? For example, if the first timestamp is 1PM on a non-holiday weekday, and the second timestamp is on the next weekday (also not a holiday), *do we care whether* the second timestamp is before or after 1PM? – Karl Knechtel Oct 11 '22 at 00:05
  • @KarlKnechtel Yes, it must be a full 24hr period. The time of day absolutely matters. Thanks – Rino Bino Oct 11 '22 at 00:06

2 Answers2

0
import pandas as pd
from datetime import datetime
from pandas.tseries.offsets import BDay    

start=datetime.fromisoformat('2022-10-03T23:59:59') + BDay(1)
end=datetime.fromisoformat('2022-10-05T00:00:00') + BDay(1)
days = (end-start).days
print(days.days)
mike01010
  • 5,226
  • 6
  • 44
  • 77
0

Phew, lots of trial and error, but I figured it out!

Quick Answer + Snippet:

The trick is that you can't use business "day" logic, you must use business "hours".

The Pandas bday logic has no concept of partial days. For this, we use CustomBusinessHour and calculate the days based on how many hours passed.

Note that this still isn't 100% accurate! See the explanation and full snippet in the next section of this answer to get full accuracy.

# Create a custom business hour schedule
us_bh = CustomBusinessHour(start='00:00', end='23:59', calendar=USFederalHolidayCalendar())

# Count the hours calculated between the two times
hours = len(pd.date_range(
    start=datetime.fromisoformat('2022-10-04T00:00:00'),
    end=datetime.fromisoformat('2022-10-04T00:00:01'),
    freq=us_bh))

# Calculate number of days
bd = floor(hours / 24)

A Caveat to This Solution:

When defined with an end of 23:59, CustomBusinessHour will count anything within the 23rd hour as a full hour (I don't know why, but it does).

To get around this, I checked if the end timestamp is within 23:00-23:59 and subtracted an hour if the start time also isn't within the 23rd hour:

us_bh = CustomBusinessHour(start='00:00', end='23:59', calendar=USFederalHolidayCalendar())

start = datetime.fromisoformat('2022-10-04T00:00:00')
end = datetime.fromisoformat('2022-10-04T23:59:59')

hours = len(pd.date_range(
    start=start,
    end=end,
    freq=us_bh))

# Fix for caveat here:
if end.hour == 23 and start.hour != 23:
    hours = hours - 1

Full Working Snippet With Test Cases:

#!/usr/bin/env python3

from datetime import datetime
from math import floor
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessHour

test_cases = [
    {
      "start": '2022-10-04T00:00:00',
      "end": '2022-10-04T00:00:01',
      "expect": 0
    },
    {
      "start": '2022-10-04T00:00:00',
      "end": '2022-10-04T22:59:59',
      "expect": 0
    },
    {
      "start": '2022-10-04T00:00:00',
      "end": '2022-10-04T23:00:00',
      "expect": 0
    },
    {
      "start": '2022-10-04T00:00:00',
      "end": '2022-10-05T23:00:00',
      "expect": 1
    },
    {
      "start": '2022-10-03T23:59:59',
      "end": '2022-10-05T00:00:00',
      "expect": 1
    },
    {
      # Regular weekend
      "start": '2022-09-30T12:00:00',
      "end": '2022-10-03T12:00:00',
      "expect": 1
    },
    {
      # Regular weekend + a few minutes
      "start": '2022-09-30T12:00:00',
      "end": '2022-10-03T14:14:00',
      "expect": 1
    },
    {
      # Regular weekend
      "start": '2022-09-30T12:00:00',
      "end": '2022-10-04T12:00:00',
      "expect": 2
    },
    {
      # Christmas weekend
      "start": '2022-12-23T12:00:00',
      "end": '2022-12-26T13:00:00',
      "expect": 0
    },
    {
      # Christmas weekend + 1
      "start": '2022-12-23T12:00:00',
      "end": '2022-12-27T12:00:00',
      "expect": 1
    }
]

# Custom hour schedules
us_bh = CustomBusinessHour(start='00:00', end='23:59', calendar=USFederalHolidayCalendar())

for test_case in test_cases:
    start = datetime.fromisoformat(test_case['start'])
    end = datetime.fromisoformat(test_case['end'])
    expect = test_case['expect']

    # Use Pandas to calculate the number of hours using the us_bh frequency
    hours = len(pd.date_range(
        start=start,
        end=end,
        freq=us_bh))

    # CustomBusinessHour does not calculate properly if the end time is between 23:00 - 23:59
    if end.hour == 23 and start.hour != 23:
        hours = hours - 1

    # Calculate days
    bd = floor(hours / 24)

    # Debug output
    print("\nPASS" if bd == expect else "\n\n\nFAIL")
    print(f'Between {start} -> {end} it\'s been {bd} full business day(s)')

Output

PASS
Between 2022-10-04 00:00:00 -> 2022-10-04 00:00:01 it's been 0 full business day(s)

PASS
Between 2022-10-04 00:00:00 -> 2022-10-04 22:59:59 it's been 0 full business day(s)

PASS
Between 2022-10-04 00:00:00 -> 2022-10-04 23:00:00 it's been 0 full business day(s)

PASS
Between 2022-10-04 00:00:00 -> 2022-10-05 23:00:00 it's been 1 full business day(s)

PASS
Between 2022-10-03 23:59:59 -> 2022-10-05 00:00:00 it's been 1 full business day(s)

PASS
Between 2022-09-30 12:00:00 -> 2022-10-03 12:00:00 it's been 1 full business day(s)

PASS
Between 2022-09-30 12:00:00 -> 2022-10-03 14:14:00 it's been 1 full business day(s)

PASS
Between 2022-09-30 12:00:00 -> 2022-10-04 12:00:00 it's been 2 full business day(s)

PASS
Between 2022-11-23 12:00:00 -> 2022-11-28 12:00:00 it's been 1 full business day(s)


Rino Bino
  • 366
  • 3
  • 15