Fastest way to remove duplicates in a list without importing libraries and using sets

Question

I was trying to remove duplicates from a list using the following code:

a = [1,2,3,4,2,6,1,1,5,2]
res = []
[res.append(i) for i in a if i not in res]

But I would like to do this without defining the list I want as an empty list (i.e., omit the line res = []) like:

a = [1,2,3,4,2,6,1,1,5,2]

# Either:
res = [i for i in a if i not in res]

# Or:
[i for i in a if i not in 'this list'] # This list is not a string. I meant it as the list being comprehended.

I want to avoid library imports and set().

I believe you cannot do that. Use `set(a)` to remove duplicates; it is a simple one-liner. If order matters, use a dictionary or an OrderedDict, depending on your Python version, but this will be hacky. – Dani Mesejo, Apr 18 '20 at 13:42
Not everything with lists is a natural candidate for a comprehension. Also, why use a quadratic algorithm? – John Coleman, Apr 18 '20 at 13:42
This problem sounds artificial. There are many (and more efficient) ways of achieving what you want. – rdas, Apr 18 '20 at 13:45
@rdas I would love to know those, as long as no libraries are imported in the process. – Joshua Varghese, Apr 18 '20 at 13:45
Don't create useless meta tags, see e.g. https://stackoverflow.com/help/tagging – jonrsharpe, Apr 20 '20 at 06:48
Does this answer your question? [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) – Georgy, May 05 '20 at 13:55
@Georgy That is not really the answer required! What is needed here is the speed comparison. – Joshua Varghese, May 05 '20 at 13:56
@JoshuaVarghese Fair enough, I will retract the flag and edit the title to make it more specific. – Georgy, May 05 '20 at 14:02

score 6 · Answer 1 · edited May 01 '22 at 12:06

I think this may work for you. It removes duplicates from the list while keeping the order.

# a is the input list from the question; a[:n] holds the elements before index n
newlist = [i for n, i in enumerate(a) if i not in a[:n]]
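
To make it clearer what the comprehension does, here is a rough equivalent written as an explicit loop. The function name `dedupe_by_slice` is made up for illustration, and this is only a sketch: like the one-liner, it rescans a growing prefix of the list for every element, so it is quadratic in the worst case.

```python
def dedupe_by_slice(items):
    """Keep the first occurrence of each element, preserving order."""
    result = []
    for n, item in enumerate(items):
        # items[:n] is a copy of everything before position n
        if item not in items[:n]:
            result.append(item)
    return result


a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
print(dedupe_by_slice(a))  # [1, 2, 3, 4, 6, 5]
```

Because it relies only on equality comparisons, this approach should also work for unhashable elements such as nested lists.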

edited May 01 '22 at 12:06 by Peter Mortensen
answered Apr 18 '20 at 13:45 by Amit Davidson

Very nice, using `enumerate` as the generator and checking the list slice seen so far. – ThomasH Apr 18 '20 at 17:10

score 5 · Accepted Answer · edited May 01 '22 at 10:43

For Python 3.7+ (or CPython 3.6, where dict ordering is an implementation detail), you can use dict.fromkeys():

>>> a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
>>> list(dict.fromkeys(a))
[1, 2, 3, 4, 6, 5]

From the documentation:

Create a new dictionary with keys from iterable and values set to value.

If you are using an older Python version, you will need to use collections.OrderedDict to maintain order:

>>> from collections import OrderedDict
>>> a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
>>> list(OrderedDict.fromkeys(a))
[1, 2, 3, 4, 6, 5]
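
One caveat is that dictionary keys must be hashable, so `dict.fromkeys()` raises `TypeError` for a list of lists, for example. The following is a minimal sketch of a wrapper that falls back to a slower scan in that case. The name `unique_in_order` is hypothetical, and the sketch assumes Python 3.7+ and that `items` is a list or another sequence that can be iterated more than once.

```python
def unique_in_order(items):
    """Drop duplicates, keeping the first occurrence of each element."""
    try:
        # dict keys are unique and keep insertion order (guaranteed from 3.7)
        return list(dict.fromkeys(items))
    except TypeError:
        # Unhashable elements (e.g. lists) cannot be dict keys,
        # so fall back to a quadratic membership scan.
        result = []
        for item in items:
            if item not in result:
                result.append(item)
        return result


print(unique_in_order([1, 2, 3, 4, 2, 6, 1, 1, 5, 2]))  # [1, 2, 3, 4, 6, 5]
print(unique_in_order([[1], [2], [1]]))                 # [[1], [2]]
```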

score 4 · Answer 3 · edited May 01 '22 at 10:37

Here is a simple benchmark of the proposed solutions, using the third-party simple_benchmark package.

It shows that dict.fromkeys performs best.

from simple_benchmark import BenchmarkBuilder
import random


b = BenchmarkBuilder()

@b.add_function()
def AmitDavidson(a):
    return [i for n,i in enumerate(a) if i not in a[:n]]

@b.add_function()
def RoadRunner(a):
    return list(dict.fromkeys(a))

@b.add_function()
def DaniMesejo(a):
    return list({k: '' for k in a})


@b.add_function()
def rdas(a):
    return sorted(list(set(a)), key=lambda x: a.index(x))


@b.add_function()
def unwanted_set(a):
    return list(set(a))


@b.add_arguments('List length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, [random.randint(0, 10) for _ in range(size)]

r = b.run()
r.plot()
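
If installing simple_benchmark is not an option, a rough comparison can be sketched with the standard library's `timeit` (an import used only for measuring, not for the deduplication itself). The function names and the list size below are illustrative assumptions. Note that the benchmark above draws values with `random.randint(0, 10)`, so there are at most 11 distinct values; this sketch also times a list with many distinct values, since the shape of the data may change the picture. Timings will vary by machine, so none are shown.

```python
import random
import timeit


def slice_scan(a):
    return [i for n, i in enumerate(a) if i not in a[:n]]


def fromkeys(a):
    return list(dict.fromkeys(a))


def dict_comprehension(a):
    return list({k: '' for k in a})


def set_then_index(a):
    return sorted(set(a), key=a.index)


CANDIDATES = [slice_scan, fromkeys, dict_comprehension, set_then_index]


def time_candidates(data, number=5):
    """Return {function name: best-of-3 time in seconds for `number` calls}."""
    results = {}
    for func in CANDIDATES:
        timer = timeit.Timer(lambda: func(data))
        results[func.__name__] = min(timer.repeat(repeat=3, number=number))
    return results


random.seed(0)
size = 1000  # kept small so the quadratic candidates finish quickly
datasets = {
    "few distinct values": [random.randint(0, 10) for _ in range(size)],
    "many distinct values": [random.randint(0, size) for _ in range(size)],
}

for label, data in datasets.items():
    print(label)
    for name, seconds in time_candidates(data).items():
        print(f"  {name:20s} {seconds:.6f} s")
```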

edited May 01 '22 at 10:37 by Peter Mortensen
answered Apr 18 '20 at 15:06 by kederrac

Ah nice. I was going to post something similar, but this is better. +1 – RoadRunner Apr 18 '20 at 15:08
this is awesomeness :) +1 – Joshua Varghese Apr 18 '20 at 15:53
stackoverflow should auto-create these graphs – Joshua Varghese Apr 18 '20 at 15:54

score 3 · Answer 4 · answered Apr 18 '20 at 13:48

Here is a solution using set that does preserve the order:

a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]
a_uniq = sorted(set(a), key=a.index)
print(a_uniq)
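
Each call to `a.index` scans the list from the start, which is what makes the sort key expensive. One possible way around that, sketched here under the assumption that the elements are hashable, is to record the first position of every value in a single pass and sort by that lookup instead. The name `first_index` is introduced only for this example.

```python
a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

# Record the first position of each value in one pass over the list.
first_index = {}
for position, value in enumerate(a):
    first_index.setdefault(value, position)  # keeps the earliest position

a_uniq = sorted(set(a), key=first_index.__getitem__)
print(a_uniq)  # [1, 2, 3, 4, 6, 5]
```

On Python 3.7+, the keys of `first_index` are already in first-seen order, so `list(first_index)` gives the same result without the sort.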

answered Apr 18 '20 at 13:48 by rdas

The main motivation for using `set` (beyond its concision) is to deduplicate in sub-quadratic time, but this use of `index` bumps it back up to quadratic. – John Coleman Apr 18 '20 at 13:50
So do the OP's comprehensions – rdas Apr 18 '20 at 13:51

Dani Mesejo · Answer 5 · 2020-04-18T13:51:16.877

2

A one-line dict comprehension that runs in O(n) and preserves order in Python 3.6+:

a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

res = list({k: '' for k in a})
print(res)
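
The values in that dictionary are just placeholders. Assigning to a key that already exists overwrites its value but does not move the key, which is why the first-seen order survives. As a small illustration, the comprehension and `dict.fromkeys()` should produce the same keys in the same order. The variable names here are invented for the example.

```python
a = [1, 2, 3, 4, 2, 6, 1, 1, 5, 2]

by_comprehension = {k: '' for k in a}  # placeholder value ''
by_fromkeys = dict.fromkeys(a)         # placeholder value None

print(list(by_comprehension))  # [1, 2, 3, 4, 6, 5]
print(list(by_fromkeys))       # [1, 2, 3, 4, 6, 5]
print(list(by_comprehension) == list(by_fromkeys))  # True
```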

edited Apr 18 '20 at 13:51
answered Apr 18 '20 at 13:45 by Dani Mesejo
