I am trying to create a column in Pandas based on a conditional statement that calculates the time between two events. I was able to work out the day calculation, but it fails when I plug it into my conditional statement:

def defect_age(df):
    if df['Status'] == 'R':
        return (pd.to_datetime(df['resolved_on'], errors='coerce') 
            - pd.to_datetime(df['submitted_on'])) / np.timedelta64(1, 'D')
    else:
        return 'null'

It is then called later to create the column:

group_df['Age'] = group_df.apply(defect_age(group_df), axis=0)

I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried to base mine on the question asked HERE... But I am not having much success. Any help is appreciated!

Bill Armstrong
anshanno

2 Answers

Try this definition of defect_age:

import numpy as np
import pandas as pd

def defect_age(df):
    # Unparseable resolution dates become NaT instead of raising
    resolved = pd.to_datetime(df.resolved_on, errors='coerce')
    submitted = pd.to_datetime(df.submitted_on)
    r = (resolved - submitted) / np.timedelta64(1, 'D')  # age in days
    return np.where(df.Status == 'R', r, np.nan)

The error came from the line if df['Status'] == 'R'.

That comparison produces a Series of boolean values, not the single boolean value that if needs. Rather than testing one value at a time, you still want to evaluate the condition over the whole Series at once, which is what np.where does. I hope I've given you something that does the trick.
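To make the difference concrete, here is a minimal sketch on made-up data. The sample rows and the names mask and error_message are assumptions for illustration only; the column names follow the question. It first reproduces the ValueError, then fills the column with the vectorised function.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
group_df = pd.DataFrame({
    "Status": ["R", "O", "R"],
    "submitted_on": ["2016-08-01", "2016-08-02", "2016-08-03"],
    "resolved_on": ["2016-08-03", "2016-08-05", "2016-08-10"],
})

# Comparing a column to a scalar gives a boolean Series, one value per row.
mask = group_df["Status"] == "R"

# Using that Series as an `if` condition raises the error from the question.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

def defect_age(df):
    resolved = pd.to_datetime(df["resolved_on"], errors="coerce")
    submitted = pd.to_datetime(df["submitted_on"])
    age_days = (resolved - submitted) / np.timedelta64(1, "D")
    # Keep the age where Status is 'R', NaN everywhere else.
    return np.where(df["Status"] == "R", age_days, np.nan)

# Call the function once on the whole frame; no apply is needed.
group_df["Age"] = defect_age(group_df)
```

Note that the function is called directly on the DataFrame rather than passed through apply, since np.where already handles every row in one pass.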

piRSquared
  • Fantastic! Thanks. Your answer is exactly what I was trying to figure out. I've got a bunch more statuses that I'm gonna add now :) – anshanno Aug 04 '16 at 11:30
Do it like this:

group_df['Age'] = group_df.apply(lambda row:defect_age(row), axis=1)

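This is because you want to apply the function to each row, not to the whole DataFrame at once.

df['Status'] == 'R' gives a Series of booleans when evaluated on a whole DataFrame, and you can't use a Series of booleans as the condition of an if statement. With axis=1, each call receives a single row, so the same comparison yields a single boolean.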
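As a rough illustration of the row-wise route, the sketch below runs on made-up data. The sample rows and the name defect_age_row are hypothetical, and it returns np.nan rather than the string 'null' (my assumption, so the column stays numeric).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
group_df = pd.DataFrame({
    "Status": ["R", "O", "R"],
    "submitted_on": ["2016-08-01", "2016-08-02", "2016-08-03"],
    "resolved_on": ["2016-08-03", "2016-08-05", "2016-08-10"],
})

def defect_age_row(row):
    # `row` is one record, so row["Status"] is a single string here
    # and the comparison yields a plain boolean that `if` accepts.
    if row["Status"] == "R":
        resolved = pd.to_datetime(row["resolved_on"], errors="coerce")
        submitted = pd.to_datetime(row["submitted_on"])
        # Dividing by a one-day Timedelta converts the gap to a float of days.
        return (resolved - submitted) / pd.Timedelta(days=1)
    return np.nan

# axis=1 passes each row to the function; the function itself is passed
# uncalled, unlike defect_age(group_df) in the question.
group_df["Age"] = group_df.apply(defect_age_row, axis=1)
```

This calls the Python function once per row, so on large frames it will generally be slower than a vectorised version, but it keeps the original if/else shape.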

Gaurav Dhama