
Given a Pandas Series (or numpy array) like this:

import pandas as pd
myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

Is there a good way to remove sequential duplicates, much like the Unix uniq tool does? The numpy/pandas unique() and pandas drop_duplicates() functions remove all duplicates (like Unix's | sort | uniq), but I don't want this:

>>> print(myseries.unique())
[1 2 3 4]

I want this:

>>> print(myseries.my_mystery_function())
[1, 2, 3, 4, 3, 2, 3, 1]
DrAl

3 Answers


Compare with the shifted Series using ne (!=) and filter by boolean indexing:

myseries = myseries[myseries.ne(myseries.shift())].tolist()
print(myseries)
[1, 2, 3, 4, 3, 2, 3, 1]
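
Why this works: shift() moves each value down by one position and puts NaN in the first slot, and ne() returns True wherever a value differs from its predecessor (NaN compares as not-equal, so the first element is always kept). A minimal sketch of the intermediate mask, for illustration only:

import pandas as pd

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

shifted = myseries.shift()    # NaN, 1, 2, 3, 3, ... (each value moved down one slot)
mask = myseries.ne(shifted)   # True wherever a value differs from its predecessor
print(myseries[mask].tolist())
[1, 2, 3, 4, 3, 2, 3, 1]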

If performance is important, use Divakar's solution.

jezrael

We can use slicing on the underlying NumPy array -

In [61]: import numpy as np

In [62]: a = myseries.values

In [63]: a[np.r_[True, a[:-1] != a[1:]]]
Out[63]: array([1, 2, 3, 4, 3, 2, 3, 1])
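
The NumPy expression follows the same idea as the pandas version: a[:-1] != a[1:] compares each element with the one before it, and np.r_ prepends True so the first element always survives the boolean indexing. A standalone sketch, for illustration only:

import numpy as np

a = np.array([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

changed = a[:-1] != a[1:]      # element-wise comparison, one element shorter than a
mask = np.r_[True, changed]    # prepend True so a[0] is kept
print(a[mask])
[1 2 3 4 3 2 3 1]
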
Divakar

A version of jezrael's answer that uses the != operator directly instead of ne():

print(myseries[myseries != myseries.shift()].tolist())

Output:

[1, 2, 3, 4, 3, 2, 3, 1]
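
If a Series rather than a plain list is wanted, the same boolean indexing works; resetting the index afterwards is an optional extra step, sketched here as an assumption about the desired output:

import pandas as pd

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

# keep values that differ from their predecessor, then renumber the index 0..n-1
deduped = myseries[myseries != myseries.shift()].reset_index(drop=True)
print(deduped.tolist())
[1, 2, 3, 4, 3, 2, 3, 1]
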
U13-Forward