I have a dataframe with data from a piece of equipment. Sometimes the equipment fails to start, but a row is still logged saying the action was done. The equipment then tries to start again a few seconds later, and in most cases it succeeds within 2 or 3 attempts.
The problem is that the retries and the successes all go to the same table, with no distinction between false starts and real starts. Since the equipment starts only once every few hours, all I have to do is find the rows with similar timestamps (within an interval of 2 minutes, for example) and keep only the last one.
The task is to eliminate those "false starts" from the dataframe.
The dataframe is sorted by timestamp, so the "false starts" of one attempt sit in consecutive rows. For a single piece of equipment it can be done by iterating over the rows and dropping the earlier row whenever:
df.timestamp_local.iloc[i] - df.timestamp_local.iloc[i-1] <= some_timedelta
But this is impractical when running over thousands of devices.
Example input, where the last 3 rows belong to one start attempt and only the last row is a "real start":
device_name timestamp_local tk_event_desc
0 A005 2019-08-29 19:14:57 Start
1 A005 2019-09-03 09:11:37 Start
2 A005 2019-09-06 14:06:30 Start
3 A005 2019-09-09 17:39:17 Start
4 A005 2019-09-12 10:43:33 Start
5 A005 2019-09-12 17:07:08 Start
6 A005 2019-09-13 01:18:36 Start
7 A005 2019-09-13 13:20:40 Start
8 A005 2019-09-17 17:54:44 Start
9 A005 2019-09-21 12:29:47 Start
10 A005 2019-09-22 11:58:26 Start
11 A005 2019-09-22 11:58:27 Start
12 A005 2019-09-22 11:58:29 Start
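
For reference, the row-by-row condition above can be expressed without an explicit loop. The following is only a sketch, under these assumptions: the 2-minute threshold (called `max_gap` here, a name chosen for illustration), a `timestamp_local` column that is already a datetime dtype, and a hand-built frame holding just the last five rows of the example. It compares each row with the next row of the same device and drops it when the next row follows within the threshold.

```python
import pandas as pd

# Last five rows of the example above, rebuilt by hand for illustration.
df = pd.DataFrame({
    "device_name": ["A005"] * 5,
    "timestamp_local": pd.to_datetime([
        "2019-09-17 17:54:44",
        "2019-09-21 12:29:47",
        "2019-09-22 11:58:26",
        "2019-09-22 11:58:27",
        "2019-09-22 11:58:29",
    ]),
    "tk_event_desc": ["Start"] * 5,
})

max_gap = pd.Timedelta(minutes=2)  # assumed threshold, adjust as needed

# Make sure rows are ordered per device before comparing neighbours.
df = df.sort_values(["device_name", "timestamp_local"])

# Timestamp of the following row for the same device (NaT for the last row).
next_ts = df.groupby("device_name")["timestamp_local"].shift(-1)

# A row is a false start if another start follows within max_gap.
# Comparisons against NaT evaluate to False, so the last row is always kept.
is_false_start = (next_ts - df["timestamp_local"]) <= max_gap

real_starts = df[~is_false_start]
print(real_starts)
```

With this sample, the rows at 11:58:26 and 11:58:27 are dropped and the row at 11:58:29 is kept. Note that a chain of retries collapses to its last row as long as each retry is within `max_gap` of the next one, even if the whole chain spans more than `max_gap`.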