Removing neighbouring duplicates has been discussed here before, but only for directly adjacent rows (one row above/below).
I have the following dataframe:
df = pd.DataFrame(data={"item_no": [11, 4, 4, 4, 7, 8, 7, 11, 11, 5, 5, 6, 4], "time": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]})
df
item_no time
0 11 1
1 4 2
2 4 3
3 4 4
4 7 5
5 8 6
6 7 7
7 11 8
8 11 9
9 5 10
10 5 11
11 6 12
12 4 13
where it is sorted by the time column (imagine it as a time series). I need to remove the neighboring duplicates in the item_no column, keeping only the first entry of each run.
Expected output:
item_no time
0 11 1
1 4 2
2 7 5
3 8 6
4 7 7
5 11 8
6 5 10
7 6 12
8 4 13
As can be seen, runs of neighboring duplicates of any length should be removed. I know I can iterate row by row and check whether the previous item_no is the same, but I am looking for an efficient solution, since this will be applied to millions of rows.
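For reference, here is a minimal sketch of one vectorized way to express the row-by-row check described above: compare item_no with a copy of itself shifted down by one row, and keep only the rows where the two differ. The names starts_new_run and result are illustrative only. It assumes item_no contains no NaN values, since NaN never compares equal to itself and consecutive NaN rows would therefore all be kept.

```python
import pandas as pd

df = pd.DataFrame(data={
    "item_no": [11, 4, 4, 4, 7, 8, 7, 11, 11, 5, 5, 6, 4],
    "time": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
})

# True where a row's item_no differs from the row directly above it.
# The first row is compared against NaN (nothing above it), so it is always kept.
starts_new_run = df["item_no"].ne(df["item_no"].shift())

# Keep only the first row of each run and renumber the index from 0.
result = df[starts_new_run].reset_index(drop=True)
print(result)
```

Because the comparison is done on whole columns instead of in a Python loop, this should scale reasonably to millions of rows, though I have not benchmarked it on data of that size.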