How to remove duplicate items from a list using list comprehension?

Question

How to remove duplicate items from a list using list comprehension? I have following code:

a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
b = []
b = [item for item in a if item not in b]

but it doesn't work, just produces identical list. Why its producing an identical list?

Because `b` is empty at the moment you execute `if item not in b`. The list comprehension is done in memory and the result is assigned to `b` at the end. — Felix Kling, May 11 '12 at 10:07
If you don't want to use a set because you want to preserve the order, look at the `unique_everseen` iterator in the [itertools recipes](http://docs.python.org/library/itertools.html#recipes). Use like this: `b = list(unique_everseen(a))` — Lauritz V. Thaulow, May 11 '12 at 10:29
It's kind of a loop, but it generates the result in one go... it's not that surprising either. Whenever you have the expression `x = y`, then `y` is evaluated first and then the result is assigned to `x`. But during evaluation of `y`, `x` is not modified. Would you have had the same doubts if you had `b = list(item for item in a if item not in b)` instead? — Felix Kling, May 11 '12 at 10:29

score 15 · Accepted Answer · answered May 11 '12 at 10:09

15

It's producing an identical list as b contains no elements at run-time. What you'd want it this:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = []
>>> [b.append(item) for item in a if item not in b]
[None, None, None, None, None, None, None, None]
>>> b
[1, 2, 3, 5, 9, 6, 8, 7]

answered May 11 '12 at 10:09

Christian Witts

11,375
1
33
46

23

Beware of using [list comprehensions for side effects](http://stackoverflow.com/q/5753597/566644). Use a regular for loop instead. – Lauritz V. Thaulow May 11 '12 at 10:32
This is also a `O(n²)` answer, where, for hashable inputs, `O(n)` is possible ([with](https://stackoverflow.com/a/65227764/364696) or [without](https://stackoverflow.com/a/10549362/364696) preserving order), and for nonhashable, but sortable inputs, `O(n log n)` is possible (though it replaces original ordering with sorted ordering unless you go to some effort to decorate and undecorate the inputs with their index and incorporate it in the sorting and deduplication so a second sort can restore the original ordering). – ShadowRanger Dec 10 '20 at 02:49

score 12 · Answer 2 · answered May 11 '12 at 10:06

12

If you don't mind using a different technique than list comprehension you can use a set for that:

>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> b = list(set(a))
>>> print b
[1, 2, 3, 5, 6, 7, 8, 9]

answered May 11 '12 at 10:06

Niek de Klein

8,524
20
72
143

I have looked at set function, just want to know what's wrong with above code and if it could be corrected? – Alinwndrld May 11 '12 at 10:09
5

set will not keep the initial order... so just take notice of that – Adi Roiban Feb 12 '15 at 12:14
@AdiRoiban: That [can be fixed with minimal code changes](https://stackoverflow.com/a/65227764/364696). It's slower than using `set`, but not *much* slower if you're on 3.6+ (if you're on 3.5 and earlier using `OrderedDict`, it's a bigger hit; >3x runtime, while 3.6+ with plain `dict` is only about a 66% increase in runtime). – ShadowRanger Dec 10 '20 at 02:44

score 5 · Answer 3 · answered May 11 '12 at 10:07

5

Use keys on a dict constructed with values in a as its keys.

b = dict([(i, 1) for i in a]).keys()

Or use a set:

b = [i for i in set(a)]

answered May 11 '12 at 10:07

Vikas

8,790
4
38
48

score 4 · Answer 4 · answered May 11 '12 at 10:09

4

The reason that the list is unchanged is that b starts out empty. This means that if item not in b is always True. Only after the list has been generated is this new non-empty list assigned to the variable b.

answered May 11 '12 at 10:09

CB Bailey

755,051
104
632
656

1

If I understand it correctly, that means list comprehension adds items at one go instead of checking and adding each item one at a time like in loop. – Alinwndrld May 11 '12 at 10:14
2

@Alinwndrld: I don't think that's a valid conclusion. It only means that the list comprehension is evaluated before the assignment. The list may well be built up in a loop internally. – CB Bailey May 11 '12 at 10:46

Josh Bode · Answer 5 · 2012-05-11T21:15:25.963

Use groupby:

>>> from itertools import groupby
>>> a = [1, 2, 3, 3, 5, 9, 6, 2, 8, 5, 2, 3, 5, 7, 3, 5, 8]
>>> [k for k, _ in groupby(sorted(a, key=lambda x: a.index(x)))]
[1, 2, 3, 5, 9, 6, 8, 7]

Leave out the key argument if you don't care about which order the value first appeared in the original list, e.g.

>>> [k for k, _ in groupby(sorted(a))]
[1, 2, 3, 5, 6, 7, 8, 9]

You can do some cool things with groupby. To identify items that appear multiple times:

>>> [k for k, v in groupby(sorted(a)) if len(list(v)) > 1]
[2, 3, 5, 8]

Or to build up a frequency dictionary:

>>> {k: len(list(v)) for k, v in groupby(sorted(a))}
{1: 1, 2: 3, 3: 4, 5: 4, 6: 1, 7: 1, 8: 2, 9: 1}

There are some very useful functions in the itertools module: chain, tee and product to name a few!

score 1 · Answer 6 · answered Jun 03 '16 at 07:01

1

>>> a = [10,20,30,20,10,50,60,40,80,50,40,0,100,30,60]
>>> [a.pop(a.index(i, a.index(i)+1)) for i in a if a.count(i) > 1]
>>> print(a)

answered Jun 03 '16 at 07:01

rajkrish06

39
5

score 1 · Answer 7 · edited Dec 10 '20 at 02:53

1

>>> from itertools import groupby
>>> repeated_items = [2,2,2,2,3,3,3,3,4,5,1,1,1]
>>> [
...     next(group)
...     for _, group in groupby(
...         repeated_items,
...         key=repeated_items.index
...     )
... ]
[2, 3, 4, 5, 1]

edited Dec 10 '20 at 02:53

ShadowRanger

143,180
12
188
271

answered Dec 11 '19 at 10:45

user8902140

53
5

Clever solution, I like it. Downside is the `index` call, which makes it `O(n²)`, and the assumption that the input is already grouped (it won't work on `[2,1,2]`). You could solve both problems, *and* still preserve input order, with a modified Schwartzian Transform (requires `from itertools import count, groupby`): `[v for v, _ in sorted([next(grp) for _, grp in groupby(sorted(zip(repeated_items, count())), key=lambda x: x[0])], key=lambda x: x[1])]`. Might not be worth the trouble, but I love a good bit of `itertools` powered insanity. – ShadowRanger Dec 10 '20 at 02:59

score 1 · Answer 8 · answered Dec 10 '20 at 02:40

For Python 3.6+, there is an improvement to be had over Niek de Klein's mostly excellent solution (it's primary flaw being that it loses input ordering). Since dicts are now insertion-ordered, you can just do:

b = list(dict.fromkeys(a))

On earlier Python, you'd do:

from collections import OrderedDict

b = list(OrderedDict.fromkeys(a))

It's not nearly as fast though (even when OrderedDict was moved to the C layer, it kept a lot of overhead to support reordering operations that dict, which doesn't support them, avoids).

How to remove duplicate items from a list using list comprehension?

8 Answers8

Linked

Related