21

Is there a better way than bdate_range() to measure business days between two columns of dates via pandas?

df = pd.DataFrame({ 'A' : ['1/1/2013', '2/2/2013', '3/3/2013'],
 'B': ['1/12/2013', '4/4/2013', '3/3/2013']})
print df
df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])
f = lambda x: len(pd.bdate_range(x['A'], x['B']))
df['DIFF'] = df.apply(f, axis=1)
print df

With output of:

          A          B
0  1/1/2013  1/12/2013
1  2/2/2013   4/4/2013
2  3/3/2013   3/3/2013
                    A                   B  DIFF
0 2013-01-01 00:00:00 2013-01-12 00:00:00     9
1 2013-02-02 00:00:00 2013-04-04 00:00:00    44
2 2013-03-03 00:00:00 2013-03-03 00:00:00     0

Thanks!

JJJ
  • 1,009
  • 6
  • 19
  • 31
brian_the_bungler
  • 991
  • 2
  • 7
  • 12
  • Possible duplicate: http://stackoverflow.com/questions/13019719/get-business-days-between-start-and-end-date-using-pandas (See the bottom for counting the days, as opposed to listing them.) – Dan Allan Jul 17 '13 at 15:37
  • @DanAllan it feels like that's doing the same though, in the sense that you are creating an intermediary date range... however it may not be possible without doing that. :s – Andy Hayden Jul 17 '13 at 16:51
  • Yes, definitely. If you are studying long ranges, and you only need to exclude weekends (but not holidays), I expect it would be faster to construct a custom function that uses the days of the week for A and B and their total difference in time to compute the number of business days. – Dan Allan Jul 17 '13 at 17:11
  • 2
    When working on the above, I looked into numpy busday_count but there was issues with casting from metadata [us] to [D]. `f = lambda x: np.busday_count(x['A'], x['B'])`. – brian_the_bungler Jul 17 '13 at 17:25

2 Answers2

21

brian_the_bungler was onto the most efficient way of doing this using numpy's busday_count:

import numpy as np
A = [d.date() for d in df['A']]
B = [d.date() for d in df['B']]
df['DIFF'] = np.busday_count(A, B)
print df

On my machine this is 300x faster on your test case, and 1000s of times faster on much larger arrays of dates

Antonbass
  • 319
  • 2
  • 5
  • 3
    An alternative and more concise syntax to [Antonbass][1] is this: ```df['DIFF'] = np.busday_count(df['A'].tolist(), df['B'].tolist()) ``` or ```df['DIFF'] = np.busday_count(df.index.date.tolist(), df['B'].tolist()) ``` if using the index as a date column. [1]: https://stackoverflow.com/users/2834466/antonbass – tsando Feb 02 '18 at 17:19
  • 1
    Note, that this solution does not give identical result to the one in the question. The result here will be [9, 43, 0] instead of [9, 44, 0] – Vladimir Nov 25 '19 at 16:14
0

You can use pandas' Bday offset to step through business days between two dates like this:

new_column = some_date - pd.tseries.offsets.Bday(15)

Read more in this conversation: https://stackoverflow.com/a/44288696

It also works if some_date is a single date value, not a series.

Tambe Tabitha
  • 119
  • 2
  • 4