
Given a Pandas Series (or numpy array) like this:

import pandas as pd
myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

Is there a good way to remove sequential duplicates, much like the Unix uniq tool does? The numpy/pandas unique() and pandas drop_duplicates() functions remove all duplicates (like Unix's | sort | uniq), but I don't want this:

>>> print(myseries.unique())
[1 2 3 4]

I want this:

>>> print(myseries.my_mystery_function())
[1, 2, 3, 4, 3, 2, 3, 1]
DrAl

3 Answers


Compare with the shifted Series using ne (!=) and filter by boolean indexing:

myseries = myseries[myseries.ne(myseries.shift())].tolist()
print(myseries)
[1, 2, 3, 4, 3, 2, 3, 1]
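
Why this works: shift() moves each value down by one position and puts NaN in the first slot, and ne() returns True wherever a value differs from its predecessor (NaN compares as not-equal, so the first element is always kept). A minimal sketch of the intermediate mask, for illustration only:

import pandas as pd

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

shifted = myseries.shift()    # NaN, 1, 2, 3, 3, ... (each value moved down one slot)
mask = myseries.ne(shifted)   # True wherever a value differs from its predecessor
print(myseries[mask].tolist())
[1, 2, 3, 4, 3, 2, 3, 1]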

If performance is important, use Divakar's solution.

jezrael

We can use slicing on the underlying NumPy array -

In [61]: import numpy as np

In [62]: a = myseries.values

In [63]: a[np.r_[True, a[:-1] != a[1:]]]
Out[63]: array([1, 2, 3, 4, 3, 2, 3, 1])
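
The NumPy expression follows the same idea as the pandas version: a[:-1] != a[1:] compares each element with the one before it, and np.r_ prepends True so the first element always survives the boolean indexing. A standalone sketch, for illustration only:

import numpy as np

a = np.array([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

changed = a[:-1] != a[1:]      # element-wise comparison, one element shorter than a
mask = np.r_[True, changed]    # prepend True so a[0] is kept
print(a[mask])
[1 2 3 4 3 2 3 1]
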
Divakar

A version of jezrael's answer that uses the != operator directly instead of ne():

print(myseries[myseries != myseries.shift()].tolist())

Output:

[1, 2, 3, 4, 3, 2, 3, 1]
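
If a Series rather than a plain list is wanted, the same boolean indexing works; resetting the index afterwards is an optional extra step, sketched here as an assumption about the desired output:

import pandas as pd

myseries = pd.Series([1, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 2, 2, 3, 3, 1])

# keep values that differ from their predecessor, then renumber the index 0..n-1
deduped = myseries[myseries != myseries.shift()].reset_index(drop=True)
print(deduped.tolist())
[1, 2, 3, 4, 3, 2, 3, 1]
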
U13-Forward