14

How to remove duplicate items from a list using list comprehension? I have following code:

a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
b = []
b = [item for item in a if item not in b]

but it doesn't work, just produces identical list. Why its producing an identical list?

Alinwndrld
  • 777
  • 2
  • 6
  • 13
  • 5
    Because `b` is empty at the moment you execute `if item not in b`. The list comprehension is done in memory and the result is assigned to `b` at the end. – Felix Kling May 11 '12 at 10:07
  • That means list comprehension doesn't work like loop? – Alinwndrld May 11 '12 at 10:17
  • If you don't want to use a set because you want to preserve the order, look at the `unique_everseen` iterator in the [itertools recipes](http://docs.python.org/library/itertools.html#recipes). Use like this: `b = list(unique_everseen(a))` – Lauritz V. Thaulow May 11 '12 at 10:29
  • 2
    It's kind of a loop, but it generates the result in one go... it's not that surprising either. Whenever you have the expression `x = y`, then `y` is evaluated first and then the result is assigned to `x`. But during evaluation of `y`, `x` is not modified. Would you have had the same doubts if you had `b = list(item for item in a if item not in b)` instead? – Felix Kling May 11 '12 at 10:29

8 Answers8

15

It's producing an identical list as b contains no elements at run-time. What you'd want it this:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = []
>>> [b.append(item) for item in a if item not in b]
[None, None, None, None, None, None, None, None]
>>> b
[1, 2, 3, 5, 9, 6, 8, 7]
Christian Witts
  • 11,375
  • 1
  • 33
  • 46
  • 23
    Beware of using [list comprehensions for side effects](http://stackoverflow.com/q/5753597/566644). Use a regular for loop instead. – Lauritz V. Thaulow May 11 '12 at 10:32
  • This is also a `O(n²)` answer, where, for hashable inputs, `O(n)` is possible ([with](https://stackoverflow.com/a/65227764/364696) or [without](https://stackoverflow.com/a/10549362/364696) preserving order), and for nonhashable, but sortable inputs, `O(n log n)` is possible (though it replaces original ordering with sorted ordering unless you go to some effort to decorate and undecorate the inputs with their index and incorporate it in the sorting and deduplication so a second sort can restore the original ordering). – ShadowRanger Dec 10 '20 at 02:49
12

If you don't mind using a different technique than list comprehension you can use a set for that:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = list(set(a))
>>> print b
[1, 2, 3, 5, 6, 7, 8, 9]
Niek de Klein
  • 8,524
  • 20
  • 72
  • 143
  • I have looked at set function, just want to know what's wrong with above code and if it could be corrected? – Alinwndrld May 11 '12 at 10:09
  • 5
    set will not keep the initial order... so just take notice of that – Adi Roiban Feb 12 '15 at 12:14
  • @AdiRoiban: That [can be fixed with minimal code changes](https://stackoverflow.com/a/65227764/364696). It's slower than using `set`, but not *much* slower if you're on 3.6+ (if you're on 3.5 and earlier using `OrderedDict`, it's a bigger hit; >3x runtime, while 3.6+ with plain `dict` is only about a 66% increase in runtime). – ShadowRanger Dec 10 '20 at 02:44
5

Use keys on a dict constructed with values in a as its keys.

b = dict([(i, 1) for i in a]).keys()

Or use a set:

b = [i for i in set(a)]
Vikas
  • 8,790
  • 4
  • 38
  • 48
4

The reason that the list is unchanged is that b starts out empty. This means that if item not in b is always True. Only after the list has been generated is this new non-empty list assigned to the variable b.

CB Bailey
  • 755,051
  • 104
  • 632
  • 656
  • 1
    If I understand it correctly, that means list comprehension adds items at one go instead of checking and adding each item one at a time like in loop. – Alinwndrld May 11 '12 at 10:14
  • 2
    @Alinwndrld: I don't think that's a valid conclusion. It only means that the list comprehension is evaluated before the assignment. The list may well be built up in a loop internally. – CB Bailey May 11 '12 at 10:46
4

Use groupby:

>>> from itertools import groupby
>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> [k for k, _ in groupby(sorted(a, key=lambda x: a.index(x)))]
[1, 2, 3, 5, 9, 6, 8, 7]

Leave out the key argument if you don't care about which order the value first appeared in the original list, e.g.

>>> [k for k, _ in groupby(sorted(a))]
[1, 2, 3, 5, 6, 7, 8, 9]

You can do some cool things with groupby. To identify items that appear multiple times:

>>> [k for k, v in groupby(sorted(a)) if len(list(v)) > 1]
[2, 3, 5, 8]

Or to build up a frequency dictionary:

>>> {k: len(list(v)) for k, v in groupby(sorted(a))}
{1: 1, 2: 3, 3: 4, 5: 4, 6: 1, 7: 1, 8: 2, 9: 1}

There are some very useful functions in the itertools module: chain, tee and product to name a few!

Josh Bode
  • 3,582
  • 1
  • 27
  • 17
1
>>> a = [10,20,30,20,10,50,60,40,80,50,40,0,100,30,60]
>>> [a.pop(a.index(i, a.index(i)+1)) for i in a if a.count(i) > 1]
>>> print(a)
rajkrish06
  • 39
  • 5
1
>>> from itertools import groupby
>>> repeated_items = [2,2,2,2,3,3,3,3,4,5,1,1,1]
>>> [
...     next(group)
...     for _, group in groupby(
...         repeated_items,
...         key=repeated_items.index
...     )
... ]
[2, 3, 4, 5, 1]
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • Clever solution, I like it. Downside is the `index` call, which makes it `O(n²)`, and the assumption that the input is already grouped (it won't work on `[2,1,2]`). You could solve both problems, *and* still preserve input order, with a modified Schwartzian Transform (requires `from itertools import count, groupby`): `[v for v, _ in sorted([next(grp) for _, grp in groupby(sorted(zip(repeated_items, count())), key=lambda x: x[0])], key=lambda x: x[1])]`. Might not be worth the trouble, but I love a good bit of `itertools` powered insanity. – ShadowRanger Dec 10 '20 at 02:59
1

For Python 3.6+, there is an improvement to be had over Niek de Klein's mostly excellent solution (it's primary flaw being that it loses input ordering). Since dicts are now insertion-ordered, you can just do:

b = list(dict.fromkeys(a))

On earlier Python, you'd do:

from collections import OrderedDict

b = list(OrderedDict.fromkeys(a))

It's not nearly as fast though (even when OrderedDict was moved to the C layer, it kept a lot of overhead to support reordering operations that dict, which doesn't support them, avoids).

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271