Pandas Date Conditional Calculation

Question

I am trying to create a column in Pandas, based on a conditional statement, that holds the time between two events. I was able to work out the day calculation, but it fails when I plug it into my conditional statement:

def defect_age(df):
    if df['Status'] == 'R':
        return (pd.to_datetime(df['resolved_on'], errors='coerce') 
            - pd.to_datetime(df['submitted_on'])) / np.timedelta64(1, 'D')
    else:
        return 'null'

I then call it to fill the column:

group_df['Age'] = group_df.apply(defect_age(group_df), axis=0)

I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried to base my approach on the question asked HERE, but I am not having much success. Any help is appreciated!

score 2 · Accepted Answer · answered Aug 03 '16 at 20:54

Try using this definition of defect_age:

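import numpy as np
import pandas as pd

def defect_age(df):
    resolved = pd.to_datetime(df.resolved_on, errors='coerce')  # bad dates become NaT
    submitted = pd.to_datetime(df.submitted_on)
    r = (resolved - submitted) / np.timedelta64(1, 'D')  # age in days for every row
    # Vectorized conditional: keep r where Status is 'R', NaN everywhere else
    return np.where(df.Status == 'R', r, np.nan)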
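A minimal usage sketch, assuming a small hypothetical group_df with the column names from the question (the status 'O' and all values are made up for illustration). The function is called once on the whole DataFrame, so no apply() is needed:

```python
import numpy as np
import pandas as pd

def defect_age(df):
    resolved = pd.to_datetime(df.resolved_on, errors='coerce')  # bad dates become NaT
    submitted = pd.to_datetime(df.submitted_on)
    r = (resolved - submitted) / np.timedelta64(1, 'D')  # age in days for every row
    return np.where(df.Status == 'R', r, np.nan)          # NaN where Status != 'R'

# Hypothetical sample data; column names follow the question.
group_df = pd.DataFrame({
    'Status': ['R', 'O', 'R'],
    'submitted_on': ['2016-07-01', '2016-07-05', '2016-07-10'],
    'resolved_on': ['2016-07-04', None, 'not a date'],
})

# One vectorized call over all rows, assigned straight to the new column.
group_df['Age'] = defect_age(group_df)
print(group_df)
```

The first row gets an Age of 3.0. The second row is not 'R', so it gets NaN. The third row is 'R', but its resolved_on cannot be parsed, so errors='coerce' turns it into NaT and its Age is NaN as well.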

The error was coming from if df['Status'] == 'R'.

That comparison produces a Series of boolean values, one per row, not the single boolean value that if needs. You still want to run the calculation over the whole Series at once, and np.where does that by picking r or NaN element by element. I hope this does the trick.
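To make the error concrete, here is a small sketch (the Series values are hypothetical) showing that the comparison returns a boolean Series, that using it in if raises the same ValueError, and that .any()/.all() only silence the error by answering a different question:

```python
import pandas as pd

status = pd.Series(['R', 'O', 'R'])
mask = status == 'R'              # element-wise comparison: a boolean Series
print(mask.tolist())

raised = False
try:
    if mask:                      # `if` needs ONE bool; pandas refuses to guess
        pass
except ValueError as exc:
    raised = True
    print('ValueError:', exc)

# Reductions give a single bool, but they answer "is any/every row 'R'?",
# not "which rows are 'R'?", so they are not a fix for the original code.
print(mask.any(), mask.all())
```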

Fantastic! Thanks. Your answer is exactly what I was trying to figure out. I've got a bunch more statuses that I'm gonna add now :) — anshanno, Aug 04 '16 at 11:30
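For several statuses, np.select generalizes np.where to a list of conditions. The sketch below is only an illustration: the 'O' (open) status and the as_of reference date are hypothetical assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

def defect_age(df, as_of='2016-08-04'):
    # Hypothetical: 'O' = still open, aged against an assumed as_of date.
    resolved = pd.to_datetime(df['resolved_on'], errors='coerce')
    submitted = pd.to_datetime(df['submitted_on'])
    one_day = np.timedelta64(1, 'D')
    conditions = [
        df['Status'] == 'R',   # resolved: resolved_on - submitted_on
        df['Status'] == 'O',   # open: as_of - submitted_on
    ]
    choices = [
        (resolved - submitted) / one_day,
        (pd.Timestamp(as_of) - submitted) / one_day,
    ]
    # For each row, np.select takes the choice of the first true condition
    # and falls back to `default` when no condition matches.
    return np.select(conditions, choices, default=np.nan)

group_df = pd.DataFrame({
    'Status': ['R', 'O', 'X'],
    'submitted_on': ['2016-07-01', '2016-07-05', '2016-07-10'],
    'resolved_on': ['2016-07-04', None, None],
})
group_df['Age'] = defect_age(group_df)
```

Each new status is one more entry in conditions and choices; any status not listed (here 'X') gets the default.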

score 1 · Answer 2 · answered Aug 03 '16 at 20:50

Do it like this:

group_df['Age'] = group_df.apply(defect_age, axis=1)

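This is because you want to apply the function to each row, not to the whole DataFrame at once. With axis=1, apply passes each row to the function; with axis=0 it passes each column. Note also that the original call evaluated defect_age(group_df) immediately, on the whole DataFrame, instead of passing the function itself to apply.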

df['Status'] == 'R' gives a Series of booleans when applied to a whole DataFrame, and you can't put a Series of booleans in an if expression. Applied to a single row, it gives one boolean, which if accepts.
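A runnable sketch of this row-wise approach, using hypothetical sample data. One assumption differs from the question: the function returns NaN instead of the string 'null', so the Age column stays numeric rather than becoming an object column.

```python
import numpy as np
import pandas as pd

def defect_age(row):
    # `row` is one record (a Series indexed by column name), so
    # row['Status'] == 'R' is a single bool and `if` works.
    if row['Status'] == 'R':
        return (pd.to_datetime(row['resolved_on'], errors='coerce')
                - pd.to_datetime(row['submitted_on'])) / np.timedelta64(1, 'D')
    return np.nan  # assumption: NaN instead of 'null' keeps the column numeric

group_df = pd.DataFrame({
    'Status': ['R', 'O'],
    'submitted_on': ['2016-07-01', '2016-07-05'],
    'resolved_on': ['2016-07-04', None],
})

# axis=1 passes each row to defect_age; axis=0 would pass each column.
group_df['Age'] = group_df.apply(defect_age, axis=1)
```

This works, but it calls a Python function once per row; the vectorized np.where version in the accepted answer does the same work in a single pass and is usually faster on large frames.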
