20

Is there a way to use list comprehensions in Python to filter adjacent duplicates from a list?

Here's an example of what I mean:

>>> xs = [1,2,2,3]
>>> print added.reAdj(xs)
[1,2,3]

A search through SE turned up an earlier question that is similar but not the same: it asks whether all duplicates can be removed from a list, and does not ask specifically for solutions that use list comprehensions. The interest in list comprehensions here comes from their advantages over traditional for loops. Answers to that question suggested either the set() function or a standard loop such as:

result = []
most_recent_elem = None
for e in xs:
    if e != most_recent_elem:
        result.append(e)
        most_recent_elem = e

The set() suggestion does not fit the task because it also removes non-adjacent duplicates (and does not preserve order), while the loop works but is verbose.
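To make the difference concrete, here is a minimal sketch. The function name dedupe_adjacent is hypothetical and only wraps a variant of the loop above; the input list is chosen so that a non-adjacent duplicate appears.

```python
def dedupe_adjacent(xs):
    """Hypothetical helper: a variant of the loop above, wrapped in a function."""
    result = []
    for e in xs:
        # Keep e if nothing has been kept yet or it differs from the last kept item
        if not result or e != result[-1]:
            result.append(e)
    return result

xs = [1, 2, 2, 3, 2]
adjacent_only = dedupe_adjacent(xs)  # the trailing 2 is not adjacent to the earlier 2s
via_set = sorted(set(xs))            # set() drops every repeat; sorted only for a stable printout

print(adjacent_only)  # [1, 2, 3, 2]
print(via_set)        # [1, 2, 3]
```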

What seems to be needed is a safe way to refer to the next element inside a list comprehension, something like this (pseudocode; x.next() is not a real method):

[x for x in xs if x != x.next()]

Any ideas?

Community
David Shaked

5 Answers

30

You can use itertools.groupby:

>>> import itertools
>>> [key for key, grp in itertools.groupby([1, 2, 2, 3])]
[1, 2, 3]

itertools.groupby returns an iterator. Iterating over it yields (key, group) pairs. The key is the item itself if no key function is specified; otherwise it is the return value of the key function. The group is an iterator that yields the consecutive items sharing that key (consecutive equal items if no key function is specified).

>>> import itertools
>>> it = itertools.groupby([1, 2, 2, 3])
>>> it
<itertools.groupby object at 0x7feec0863048>
>>> for key, grp in it:
...     print(key)
...     print(grp)
... 
1
<itertools._grouper object at 0x7feec0828ac8>
2
<itertools._grouper object at 0x7feec0828b00>
3
<itertools._grouper object at 0x7feec0828ac8>
>>> it = itertools.groupby([1, 2, 2, 3])
>>> for key, grp in it:
...     print(list(grp))
... 
[1]
[2, 2]
[3]

The solution above uses only the key because the question does not care how many adjacent items there are.
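If the run lengths or a key function do matter, the same call covers both. The following is a small sketch, not part of the original solution; the variable names and the second input list are made up for illustration.

```python
import itertools

xs = [1, 2, 2, 3]

# Materialise each group to count how many adjacent equal items it holds
runs = [(key, len(list(grp))) for key, grp in itertools.groupby(xs)]
print(runs)  # [(1, 1), (2, 2), (3, 1)]

# With a key function, adjacent items are grouped by the function's return value
words = ["a", "A", "b", "B", "a"]
folded = [key for key, grp in itertools.groupby(words, key=str.lower)]
print(folded)  # ['a', 'b', 'a']
```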

falsetru
20

You could use a list comprehension with enumerate, incorporating the fix suggested by @AChampion:

xs = [1,2,2,2,1,1]
In [115]: [n for i, n in enumerate(xs) if i==0 or n != xs[i-1]]
Out[115]: [1, 2, 1]

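The comprehension keeps an item if it is the first one or, for every later item, if it differs from the previous one. This works because the or operator short-circuits: when i == 0, the comparison with xs[i-1] (which would be xs[-1], the last element) is never evaluated.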
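A short sketch of why the i == 0 guard matters, using the same list; the variable names with_guard and without_guard are only for illustration.

```python
xs = [1, 2, 2, 2, 1, 1]

# With the guard: for i == 0 the `or` short-circuits, so xs[-1] is never read
with_guard = [n for i, n in enumerate(xs) if i == 0 or n != xs[i - 1]]

# Without the guard: the first item is compared with xs[-1], the last item,
# and is wrongly dropped whenever the list starts and ends with the same value
without_guard = [n for i, n in enumerate(xs) if n != xs[i - 1]]

print(with_guard)     # [1, 2, 1]
print(without_guard)  # [2, 1]
```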

John Kugelman
Anton Protopopov
  • @AntonProtopopov. Please update when you make progress. It's unfortunate that the case that Stefan Pochmann brought up doesn't work because your solution otherwise is very elegant IMO. – David Shaked Jan 25 '16 at 06:13
  • @AntonProtopopov - **Please**, please, please incorporate AChampion's suggested fix into your answer. This is so much better than an inscrutable call to some inscrutable function in `itertools` whose documentation is in turn inscrutable. – David Hammen Jan 25 '16 at 07:14
  • 5
    @DavidHammen `groupby` is easy and perfectly alright, you shouldn't blame it for your own deficiency. – Stefan Pochmann Jan 25 '16 at 07:17
  • @AntonProtopopov - And now you get a minus one. As everyone knows, `xs[-1]` refers to the last element of an array in Python. If you do not correct for this, your answer is flat out wrong. – David Hammen Jan 25 '16 at 07:24
  • Seems as if the bug is clear. I'll accept this answer pending any additional problems people suggest over the next few hours. – David Shaked Jan 25 '16 at 07:30
5

Using pairwise from the itertools recipes (with zip_longest) gives you an easy way of checking the next element:

import itertools as it

def pairwise(iterable):
    a, b = it.tee(iterable)
    next(b, None)
    return it.zip_longest(a, b, fillvalue=object())   # izip_longest for Py2

>>> xs = [1,2,2,3]
>>> [x for x, y in pairwise(xs) if x != y]
[1, 2, 3]
>>> xs = [1,2,2,2,2,3,3,3,4,5,6,6]
>>> [x for x, y in pairwise(xs) if x != y]
[1, 2, 3, 4, 5, 6]
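The fillvalue=object() padding is what keeps the last element: a fresh object compares unequal to anything in the list, including None. A minimal check of that edge case, repeating the pairwise definition so the snippet runs on its own (Python 3 only; the test list is made up for illustration):

```python
import itertools as it

def pairwise(iterable):
    a, b = it.tee(iterable)
    next(b, None)  # advance the second copy by one item
    return it.zip_longest(a, b, fillvalue=object())

xs = [1, 1, None, None]
# The final pair is (None, <fresh object>), which is unequal, so None is kept
result = [x for x, y in pairwise(xs) if x != y]
print(result)  # [1, None]
```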
AChampion
4

You could use a less verbose loop solution:

>>> result = xs[:1]
>>> for e in xs:
...     if e != result[-1]:
...         result.append(e)

Or:

>>> result = []
>>> for e in xs:
...     if e not in result[-1:]:
...         result.append(e)
Stefan Pochmann
3

How about this:

>>> l = [1,1,2,3,4,4,4,4,5,6,3,3,5,5,7,8,8,8,9,1,2,3,3,3,10,10]
>>> 
>>> o = []
>>> p = None
>>> for n in l:
...     if n == p:
...         continue
...     o.append(n)
...     p = n

>>> o
[1, 2, 3, 4, 5, 6, 3, 5, 7, 8, 9, 1, 2, 3, 10]

Admittedly, the solution above is more verbose than the OP's, so here is an alternative using zip_longest from the itertools module:

>>> l
[1, 1, 2, 3, 4, 4, 4, 4, 5, 6, 3, 3, 5, 5, 7, 8, 8, 8, 9, 1, 2, 3, 3, 3, 10, 10]
>>> from itertools import zip_longest
>>> o = [p for p, n in zip_longest(l, l[1:]) if p != n]  # fillvalue defaults to None
>>> o
[1, 2, 3, 4, 5, 6, 3, 5, 7, 8, 9, 1, 2, 3, 10]
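One caveat with the default fillvalue=None: if the list itself ends with None, the last pair is (None, None) and that item is dropped. A sketch of the edge case and one possible workaround, assuming a private sentinel object is acceptable (the names sentinel, default_fill and sentinel_fill are made up for illustration):

```python
from itertools import zip_longest

l = [1, 1, None, None]

# Default fillvalue=None: the final pair is (None, None), so the last item is dropped
default_fill = [p for p, n in zip_longest(l, l[1:]) if p != n]

# A fresh object() compares unequal to every list item, so the last item is always kept
sentinel = object()
sentinel_fill = [p for p, n in zip_longest(l, l[1:], fillvalue=sentinel) if p != n]

print(default_fill)   # [1]
print(sentinel_fill)  # [1, None]
```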
Iron Fist