How to remove adjacent duplicate elements in a list using list comprehensions?

Question

Is there a way to use list comprehensions in python to filter adjacent duplicates from a list?

Here's an example of what I mean:

>>> xs = [1,2,2,3]
>>> print added.reAdj(xs)
[1,2,3]

A search of Stack Exchange turned up an earlier question that is similar but slightly different: it asks whether all duplicates can be removed from a list, and it does not ask specifically for solutions using list comprehensions. The motivation for asking about list comprehensions is their advantages over traditional for loops. Answers there suggested either the set() function or a standard loop such as:

result = []
most_recent_elem = None
for e in xs:
    if e != most_recent_elem:
        result.append(e)
        most_recent_elem = e

The set() suggestion fails to meet the task in that non-adjacent duplicates are removed, while the loop is effective but verbose.
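
A quick sketch of that difference. The input list and the function name `remove_adjacent` are made up for illustration, and the set result is sorted only so the printed order is predictable. The loop is the same idea as the one above, but it compares against the last kept element instead of a None placeholder:

```python
xs = [1, 2, 2, 3, 1]  # hypothetical input: the final 1 is a non-adjacent repeat

# set() removes every duplicate, adjacent or not, so the trailing 1 is lost.
deduped_with_set = sorted(set(xs))

def remove_adjacent(items):
    result = []
    for e in items:
        # `not result` covers the first item; otherwise compare with the last kept one.
        if not result or e != result[-1]:
            result.append(e)
    return result

adjacent_removed = remove_adjacent(xs)
print(deduped_with_set)   # [1, 2, 3]
print(adjacent_removed)   # [1, 2, 3, 1]
```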

It seems what's needed is a way to safely reference the next element inside a list comprehension, something like this (pseudocode):

[x for x in xs if x != x.next()]

Any ideas?

Does this mean even `[1,2,2,2,2,3,3,3,4,5,6,6] --> [1,2,3,4,5,6]` ? — Iron Fist, Jan 25 '16 at 06:05
@IronFist, you understand correctly. That's the desired result. — David Shaked, Jan 25 '16 at 06:10
@Achampion. Thanks for the link, but as far as I can tell, the poster doesn't ask specifically for an answer involving list comprehensions. My original question post included a loop solution. — David Shaked, Jan 25 '16 at 07:35

falsetru · Answer 1 · 2016-01-25T07:04:03.983

You can use itertools.groupby:

>>> import itertools
>>> [key for key, grp in itertools.groupby([1, 2, 2, 3])]
[1, 2, 3]

itertools.groupby returns an iterator. Iterating over it yields (key, group) pairs. The key is the item itself if no key function is specified, otherwise it is the return value of the key function. The group is an iterator that yields the consecutive items sharing that key (if no key function is specified, consecutive equal values are grouped together).

>>> import itertools
>>> it = itertools.groupby([1, 2, 2, 3])
>>> it
<itertools.groupby object at 0x7feec0863048>
>>> for key, grp in it:
...     print(key)
...     print(grp)
... 
1
<itertools._grouper object at 0x7feec0828ac8>
2
<itertools._grouper object at 0x7feec0828b00>
3
<itertools._grouper object at 0x7feec0828ac8>
>>> it = itertools.groupby([1, 2, 2, 3])
>>> for key, grp in it:
...     print(list(grp))
... 
[1]
[2, 2]
[3]

In the solution above I used only the key, because the question does not care how many adjacent items there are.
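
If the run lengths or a key function ever do matter, the same call covers that too. A minimal sketch; the helper names `dedupe_adjacent` and `run_lengths` and the sample inputs are hypothetical, not part of the question:

```python
import itertools

def dedupe_adjacent(items):
    # Keep one key per run of equal adjacent items.
    return [key for key, _group in itertools.groupby(items)]

def run_lengths(items):
    # Count each group while it is current; groups share the underlying iterator.
    return [(key, sum(1 for _ in group)) for key, group in itertools.groupby(items)]

print(dedupe_adjacent([1, 2, 2, 3, 1]))  # [1, 2, 3, 1]
print(run_lengths([1, 2, 2, 3, 1]))      # [(1, 1), (2, 2), (3, 1), (1, 1)]

# With a key function, adjacency is judged on the key, and the key is what comes back.
folded = [k for k, _ in itertools.groupby("aAbBBa", key=str.lower)]
print(folded)                            # ['a', 'b', 'a']
```

Note that non-adjacent repeats (the trailing 1 here) are kept, which is exactly what the question asks for.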

It's so strange to see `groupby` used without a sorted input, but I guess this is actually appropriate in this particular use case. — wim, Jan 25 '16 at 17:44
@wim the desire to be able to do this sort of thing is part of why `groupby` is designed the way it is. — Karl Knechtel, Jul 04 '22 at 01:38

score 20 · Accepted Answer · edited Jan 25 '16 at 14:47

You could use a list comprehension with enumerate, using the solution suggested by @AChampion:

xs = [1,2,2,2,1,1]
In [115]: [n for i, n in enumerate(xs) if i==0 or n != xs[i-1]]
Out[115]: [1, 2, 1]

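The list comprehension keeps an item if it is the first one, or, for every later item, if it is not equal to the previous one. This works because `or` short-circuits: when `i == 0` is true, the comparison with `xs[i-1]` (which would be `xs[-1]`, the last element) is never evaluated.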
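
To see why the `i == 0` check matters, here is a small comparison on the same list (the variable names are just for illustration):

```python
xs = [1, 2, 2, 2, 1, 1]

# With the guard: for the first item, `i == 0` is true and xs[i - 1] is skipped.
with_guard = [n for i, n in enumerate(xs) if i == 0 or n != xs[i - 1]]

# Without the guard: at i == 0, xs[-1] is the LAST element (1), which equals xs[0],
# so the first item is wrongly dropped.
without_guard = [n for i, n in enumerate(xs) if n != xs[i - 1]]

print(with_guard)     # [1, 2, 1]
print(without_guard)  # [2, 1]
```

No exception is raised in the unguarded version; the result is just silently wrong whenever the first and last elements are equal.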

edited Jan 25 '16 at 14:47

John Kugelman

answered Jan 25 '16 at 05:49

Anton Protopopov

@AntonProtopopov. Please update when you make progress. It's unfortunate that the case that Stefan Pochmann brought up doesn't work because your solution otherwise is very elegant IMO. – David Shaked Jan 25 '16 at 06:13
@AntonProtopopov - **Please**, please, please incorporate AChampion's suggested fix into your answer. This is so much better than an inscrutable call to some inscrutable function in `itertools` whose documentation is in turn inscrutable. – David Hammen Jan 25 '16 at 07:14
@DavidHammen `groupby` is easy and perfectly alright, you shouldn't blame it for your own deficiency. – Stefan Pochmann Jan 25 '16 at 07:17
@AntonProtopopov - And now you get a minus one. As everyone knows, `xs[-1]` refers to the last element of a list in Python. If you do not correct for this, your answer is flat out wrong. – David Hammen Jan 25 '16 at 07:24
Seems as if the bug is clear. I'll accept this answer pending any additional problems people suggest over the next few hours. – David Shaked Jan 25 '16 at 07:30

AChampion · Answer 3 · 2016-01-25T07:01:46.960

5

Using pairwise from the itertools recipes (with zip_longest) gives you an easy way of checking the next element:

import itertools as it

def pairwise(iterable):
    a, b = it.tee(iterable)
    next(b, None)
    return it.zip_longest(a, b, fillvalue=object())   # izip_longest for Py2

>>> xs = [1,2,2,3]
>>> [x for x, y in pairwise(xs) if x != y]
[1, 2, 3]
>>> xs = [1,2,2,2,2,3,3,3,4,5,6,6]
>>> [x for x, y in pairwise(xs) if x != y]
[1, 2, 3, 4, 5, 6]

edited Jan 25 '16 at 07:01

answered Jan 25 '16 at 05:58

AChampion

Slight nitpick: It removes trailing `None` values. – Stefan Pochmann Jan 25 '16 at 06:09
Noted: but not an issue with a list of ints. And it can be avoided by adding a `fillvalue` to `zip_longest`, fixed! – AChampion Jan 25 '16 at 07:03
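
A sketch of the nitpick discussed in these comments. The helper `pairwise_padded` and the sample list are hypothetical; it has the same shape as the pairwise helper in the answer, with the padding value passed in so the two behaviours can be compared:

```python
import itertools as it

def pairwise_padded(iterable, fillvalue):
    # Pair each item with its successor; the last item is paired with `fillvalue`.
    a, b = it.tee(iterable)
    next(b, None)
    return it.zip_longest(a, b, fillvalue=fillvalue)

xs = [1, 2, 2, None, None]

# Padding with None: the final pair is (None, None), so the trailing None is dropped.
padded_with_none = [x for x, y in pairwise_padded(xs, None) if x != y]

# Padding with a fresh object(): it is unequal to ordinary values, so the last item survives.
padded_with_sentinel = [x for x, y in pairwise_padded(xs, object()) if x != y]

print(padded_with_none)      # [1, 2]
print(padded_with_sentinel)  # [1, 2, None]
```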

Stefan Pochmann · Answer 4 · 2016-01-25T06:02:18.103

4

You could use a less verbose loop solution:

>>> result = xs[:1]
>>> for e in xs:
        if e != result[-1]:
            result.append(e)

Or:

>>> result = []
>>> for e in xs:
        if e not in result[-1:]:
            result.append(e)

edited Jan 25 '16 at 06:02

answered Jan 25 '16 at 05:57

Stefan Pochmann

Iron Fist · Answer 5 · 2016-01-25T07:05:53.710

3

How about this:

>>> l = [1,1,2,3,4,4,4,4,5,6,3,3,5,5,7,8,8,8,9,1,2,3,3,3,10,10]
>>> 
>>> o = []
>>> p = None
>>> for n in l:
        if n == p:
            continue
        o.append(n)
        p = n    

>>> o
[1, 2, 3, 4, 5, 6, 3, 5, 7, 8, 9, 1, 2, 3, 10]

Apparently the solution above is more verbose than the OP's, so here is an alternative using zip_longest from the itertools module:

>>> l
[1, 1, 2, 3, 4, 4, 4, 4, 5, 6, 3, 3, 5, 5, 7, 8, 8, 8, 9, 1, 2, 3, 3, 3, 10, 10]
>>> from itertools import zip_longest
>>> o = [p for p,n in zip_longest(l,l[1:]) if p != n] #By default fillvalue=None
>>> o
[1, 2, 3, 4, 5, 6, 3, 5, 7, 8, 9, 1, 2, 3, 10]

edited Jan 25 '16 at 07:05

answered Jan 25 '16 at 06:14

Iron Fist

It works but it's exactly the same logic as the OP, except being slightly more verbose. – Alex Huszagh Jan 25 '16 at 06:29
@AlexanderHuszagh .. yep...I forgot about that point *verbosity* – Iron Fist Jan 25 '16 at 06:30
Updated .. using `zip_longest` – Iron Fist Jan 25 '16 at 06:46
