How to add a column in a dataset based on a certain criteria

Question

I have the need to do the following: I have a dataset containing the time at which a certain specific vehicle passes at a specific point. I need to insert a column indicating how many times each specific vehicle passes there. Moreover, I need to reset the count each time the delta time between two subsequent passes of the same vehicle is over a certain threshold.

For example:

Vehicle || Time || number times passed
A         00:15      1
B         00:20      1
C         00:25      1
C         00:45      2
A         00:59      2
A         01:56      3
B         22:55      1   (delta time above the threshold, so reset)
A         23:49      1   (delta time above the threshold, so reset)

df['period']=pd.to_datetime(df['date_time'])
dfM['Number'] = df.groupby(['Vehicle']).cumcount().add(1)

I think this just summes up the times without considering the reset above a certain threshold, for which I have absolutely no idea how to do it.

maximdu · Accepted Answer · 2023-05-13T13:44:08.343

My first idea is to simply split df into parts and then compute the result for each part separately

This is not perfect, but looks like it works:

# add "epoch" for calculations
# for each epoch we will compute result separately
# epoch = how many timediffs were more than thresholds (so far)
df['epoch'] = (
    pd.to_datetime(df['Time']).diff() > \
    pd.Timedelta('01:00:00')  # your threshold
).cumsum()

# from your code
def get_cumcount(df):
    return df.groupby('Vehicle').cumcount().add(1).values

# for each epoch:
# compute result separately
df.loc[:, 'result'] = None
for i in df['epoch'].unique():
    cumcount = get_cumcount(df[df['epoch'] == i])
    df.loc[df['epoch'] == i, 'result'] = cumcount

I also tried doing it using groupby and transform, but got errors

How to add a column in a dataset based on a certain criteria

1 Answers1