Let's say we have a list and we want to remove consecutive duplicate elements:

['a', 'b', 'b', 'a', 'b']

would become

['a', 'b', 'a', 'b']

Another example:

['a', 'b', 'c', 'c', 'a', 'b', 'b']

would become

['a', 'b', 'c', 'a', 'b']

I would like to do this as efficiently as possible.

My solution seems cumbersome: I loop with enumerate, collect the indices of the elements to remove, then loop again to delete the elements at those indices.

Ideally I'd like to avoid an explicit loop, because in production I'll be iterating over a very long list of lists with many elements.

eg = ['a', 'b', 'c', 'c', 'a', 'b', 'b']

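remove = []

# Record the index of every element that equals its successor
for i, item in enumerate(eg[:-1]):
    if item == eg[i + 1]:
        remove.append(i)

# Delete from the end so the earlier indices stay valid
for index in sorted(remove, reverse=True):
    del eg[index]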

eg

['a', 'b', 'c', 'a', 'b']
Clem Manger

1 Answer

Use itertools.groupby, and take only the keys from the iterator in a comprehension:

>>> from itertools import groupby
>>> l = ['a', 'b', 'c', 'c', 'a', 'b', 'b']
>>> [k for k, _ in groupby(l)]
['a', 'b', 'c', 'a', 'b']
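
For the list-of-lists case mentioned in the question, the same idea can be applied to each inner list. This is only a minimal sketch: the helper name dedupe_consecutive and the sample data in rows are made up for illustration. Note that groupby still walks every element once; in CPython the iteration simply happens inside itertools' C implementation rather than in an explicit Python loop.

```python
from itertools import groupby


def dedupe_consecutive(items):
    """Collapse each run of equal adjacent elements into one element."""
    # groupby yields one (key, group) pair per run of equal neighbours,
    # so keeping only the keys drops the consecutive duplicates.
    return [key for key, _ in groupby(items)]


# Hypothetical sample data standing in for the production list of lists
rows = [
    ['a', 'b', 'b', 'a', 'b'],
    ['a', 'b', 'c', 'c', 'a', 'b', 'b'],
    [],
]

deduped = [dedupe_consecutive(row) for row in rows]
print(deduped)
# [['a', 'b', 'a', 'b'], ['a', 'b', 'c', 'a', 'b'], []]
```

Unlike the del-based version in the question, this builds new lists and leaves the originals untouched.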
Netwave