I am trying to add a column to my data frame in pandas where each entry represents the difference between another column's values across two adjacent rows (if certain conditions are met). Following this answer to "get previous row's value and calculate new column pandas python", I'm using shift to find the delta between the duration_seconds column entries in the two rows (next minus current) and then return that delta as the derived entry if both rows are from the same user_id, the next row's action is not 'login', and the delta is not negative. Here's the code:
def duration(row):
    candidate_duration = row['duration_seconds'].shift(-1) - row['duration_seconds']
    if row['user_id'] == row['user_id'].shift(-1) and row['action'].shift(-1) != 'login' and candidate_duration >= 0:
        return candidate_duration
    else:
        return np.nan
Then I test the function using
analytic_events.apply(lambda row: duration(row), axis = 1)
But that throws an error:
AttributeError: ("'int' object has no attribute 'shift'", 'occurred at index 9464384')
I wondered if this was akin to the error fixed here and so I tried passing in the whole data frame thus:
duration(analytic_events)
but that throws the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
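To illustrate where the two errors seem to come from, here is a small sketch on a made-up three-row frame. The data is hypothetical; only the column names come from the code above. It suggests that a single row only exposes scalars (which have no shift), while on the whole frame Python's `and` asks a Series for one truth value.

```python
import pandas as pd

# Hypothetical toy frame reusing the column names from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "action": ["login", "click", "login"],
    "duration_seconds": [0, 5, 0],
})

# With axis=1, apply passes one row at a time, so row["duration_seconds"]
# is a single scalar value, and scalars have no .shift method.
first_row = df.iloc[0]
scalar_has_shift = hasattr(first_row["duration_seconds"], "shift")

# On the whole frame, a comparison yields a boolean Series; `and` / `if`
# then need a single truth value, which pandas refuses to guess.
same_user = df["user_id"] == df["user_id"].shift(-1)
try:
    bool(same_user)
    ambiguous = False
except ValueError:
    ambiguous = True

# Element-wise logic uses & with parenthesised comparisons instead of `and`.
mask = same_user & (df["action"].shift(-1) != "login")
```

So shift appears to belong on whole columns, with the conditions combined element-wise rather than inside an if statement.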
What should I be doing to achieve this combination; how should I be using shift?
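For reference, a minimal sketch of one vectorized way this combination could be written. It assumes the rows are already ordered so that adjacent rows are consecutive events. The helper name add_duration and the sample data are hypothetical, not taken from the real analytic_events frame.

```python
import pandas as pd


def add_duration(events: pd.DataFrame) -> pd.Series:
    """Next-row minus current-row duration_seconds, NaN where a condition fails."""
    # Shift the whole frame up by one so each row lines up with the row after it.
    nxt = events.shift(-1)
    delta = nxt["duration_seconds"] - events["duration_seconds"]

    # Combine the three conditions element-wise with & (not `and`).
    keep = (
        (events["user_id"] == nxt["user_id"])  # same user in both rows
        & (nxt["action"] != "login")           # next row is not a login
        & (delta >= 0)                         # delta is not negative
    )
    # Keep delta where all conditions hold; everything else becomes NaN.
    return delta.where(keep)


# Hypothetical sample data, assumed to be sorted by user and time.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "action": ["login", "click", "login", "login", "click"],
    "duration_seconds": [0, 5, 0, 0, 7],
})
events["duration"] = add_duration(events)
```

The last row has no successor, so its delta is NaN and it falls out of the mask without special handling.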