0

I have three columns with dates, a Begin column and two End columns. I want to use len of bdate_range to find the number of business days between Begin and End. However, using NaT as an argument in bdate_range causes it to throw an error.

Setup

test = pd.DataFrame({'Begin': ['2014-06-11', '2014-08-05', '2014-09-21', '2014-09-21', '2014-09-21', '-'],
                   'End1': ['2014-06-12', '2014-08-31', 'NaT', '2014-09-30', '2014-09-28', '-'],
                    'End2': ['2014-06-14', '-', '2014-09-30', '-', '2014-09-28', '2014-12-15']})

test['Begin'] = pd.to_datetime(test['Begin'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)
test['End1'] = pd.to_datetime(test['End1'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)
test['End2'] = pd.to_datetime(test['End2'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)

Creates this table:

    Begin       End1        End2
0   2014-06-11  2014-06-12  2014-06-14
1   2014-08-05  2014-08-31  NaT
2   2014-09-21  NaT         2014-09-30
3   2014-09-21  2014-09-30  NaT
4   2014-09-21  2014-09-28  2014-09-28
5   NaT         NaT         2014-12-15

My Attempt

So, in these dataframe I tryed the bdate_range with len

test['bdate_range'] = [len(pd.bdate_range(x,y))for x,y in zip(test['Begin'],test['End1'])]
.
.
.
ValueError: Neither `start` nor `end` can be NaT

fillna not so good...

test = test.fillna(pd.Timedelta(0))
test['bdate_range'] = [len(pd.bdate_range(x,y))for x,y in zip(test['Begin'],test['End1'])]
.
.
.
TypeError: Cannot convert input [0 days 00:00:00] of type <class 'pandas._libs.tslibs.timedeltas.Timedelta'> to Timestamp

Desired Outcome

The ideal outcome is two more columns, in every rows two more fields:

  1. Begin - End1 differences in workdays, If any value NaT result of the columns will be NaN/NaT
  2. Begin - End2 differences in workdays, if any value is NaT result of these is NaT/NaN –
    Begin       End1        End2       Begin-End1_bdate Begin-End2_bdate
0   2014-06-11  2014-06-12  2014-06-14 1                2
1   2014-08-05  2014-08-31  NaT        18               NaN
2   2014-09-21  NaT         2014-09-30 NaN              7
3   2014-09-21  2014-09-30  NaT        7                NaN
4   2014-09-21  2014-09-28  2014-09-28 5                5
5   NaT         NaT         2014-12-15 NaN              NaN

Any, idea?

Community
  • 1
  • 1
  • Is it fair to assume you should use `End2` when `End1` is missing but `End2` is not null? Also what should the answer be when either Begin is missing or both Ends are missing? – ALollz May 02 '19 at 17:01
  • Can you please show your desired outcome for `bdate_range`? – Cilantro Ditrek May 02 '19 at 17:12
  • Two more columns, in every rows two more fields: 1. Begin - End1 differences in workdays, If any value NaT result of the colums will be NaN/NaT 2. Begin - End2 differences in workdays, if any value is NaT resul of these is NaT/NaN – Attila Balázs May 02 '19 at 19:53
  • I update my question. A table is there on the end. – Attila Balázs May 03 '19 at 07:04

1 Answers1

2

You need to apply the function (in this case, the len of bdate_range) to only the rows where neither the Begin nor End fields are null. You can set the default range field to null and then slice to just those specific rows using .loc. Please read this longer post on that function.

### YOUR SETUP CODE
test = pd.DataFrame({'Begin': ['2014-06-11', '2014-08-05', '2014-09-21', '2014-09-21', '2014-09-21', '-'],
           'End1': ['2014-06-12', '2014-08-31', 'NaT', '2014-09-30', '2014-09-28', '-'],
            'End2': ['2014-06-14', '-', '2014-09-30', '-', '2014-09-28', '2014-12-15']})

test['Begin'] = pd.to_datetime(test['Begin'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)
test['End1'] = pd.to_datetime(test['End1'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)
test['End2'] = pd.to_datetime(test['End2'], dayfirst=False, yearfirst=False, errors='coerce', exact=False)

## DEFAULT RANGE FIELDS TO NULL
test['Begin-End1_bdate'] = np.nan
test['Begin-End2_bdate'] = np.nan

### USE LOC TO FIND NON-NULL ROWS
test.loc[(test['Begin'].notnull()) & (test['End1'].notnull()), 'Begin-End1_bdate'] = test[(test['Begin'].notnull()) & (test['End1'].notnull())].apply(lambda row: len(pd.bdate_range(row['Begin'], row['End1'])), axis = 1)
test.loc[(test['Begin'].notnull()) & (test['End2'].notnull()), 'Begin-End2_bdate'] = test[(test['Begin'].notnull()) & (test['End2'].notnull())].apply(lambda row: len(pd.bdate_range(row['Begin'], row['End2'])), axis = 1)

test
Cilantro Ditrek
  • 1,047
  • 1
  • 14
  • 26
  • Almost. :-) I want `NaN` instead of `today` date. – Attila Balázs May 02 '19 at 20:44
  • Not really. This is the normal days between the two dates. I want only the workdays. See my table on the end of my question. – Attila Balázs May 03 '19 at 07:07
  • I've updated my answer. It does not replicate your desired outcome. However, I think it addresses your fundamental question of using `bdate_range` in the case of having `NaT`. The question of using `bdate_range` to get a different duration to match your desired outcome is likely a separate question. – Cilantro Ditrek May 03 '19 at 14:34