In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

Question

For example:

>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]

Assume list elements are hashable.

Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].

Are we keeping the first of duplicates, or last, or somewhere in the middle? e.g., [1,2,3,2,3,1], does that become [1,2,3], or [2,3,1], or something else? — C. K. Young, Sep 18 '08 at 01:29
How Do I apply the Homework tag to something? When it says assume elements are hashable, your prof is asking you to put the entries in a hashtable, then it's easy to see if youve come across them before as you walk down the list. — Karl, Nov 12 '08 at 00:06
Benchmark and a clear answer [here](http://www.peterbe.com/plog/uniqifiers-benchmark). — Bite code, Sep 18 '08 at 11:54

score 33 · Answer 1 · edited Dec 27 '21 at 19:44

33

def unique(items):
    found = set()
    keep = []

    for item in items:
        if item not in found:
            found.add(item)
            keep.append(item)
            
    return keep

print unique([1, 1, 2, 'a', 'a', 3])

edited Dec 27 '21 at 19:44

ShadowRanger

143,180
12
188
271

answered Sep 18 '08 at 01:41

Terhorst

2,071
3
16
19

7

set() is better than set([]). – Constantin Sep 29 '08 at 15:22
1

In-place algorithms are faster. See james' and mine answers. – jfs Nov 12 '08 at 00:44
5

This is an old thread, but if you make the add() and append() methods local function (put `add = found.add` and `app = keep.append` before the loop and then use `add(item)` and `app(item)`, then this is the fastest by far. The reason that the dictionary usage was faster was that it didn't require an attribute look-up for each add and append. Just my two cents. – Justin Peel Nov 22 '10 at 23:22
If you put it into a list comprehension afterwards, you get another speed improvement. Taking all the changes together, speed nearly doubles. See my comparison further down this page. – Michael Dec 27 '13 at 07:41
@JustinPeel: have you tried to run [the benchmark](http://stackoverflow.com/questions/89178/in-python-what-is-the-fastest-algorithm-for-removing-duplicates-from-a-list-so/282589?noredirect=1#comment10841935_282589)? – jfs Jan 13 '15 at 07:18
Why is the `set()` needed at all? Why isn't it enough to test membership of `item in items` against `keep[]`? – so.very.tired Feb 23 '18 at 17:50
1

@so.very.tired Because `keep` is a list, and checking for membership in a list takes on average linear time in the length of the list. Meanwhile, checking for membership in a set takes on average constant time (see [this](https://wiki.python.org/moin/TimeComplexity)). Using appropriate data structures is a deal breaker when it comes to performance. In any case, this answer is outdated. Check out [this question](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) instead. – user_ Jan 15 '20 at 01:15
Note: This is a simpler version of [the `itertools` module's `unique_everseen` recipe](https://docs.python.org/3/library/itertools.html#itertools-recipes) (the recipe improves upon it a bit, performance-wise, by pushing more work to the C layer and making the function a generator, so unique items are produced immediately when seen, rather than having to wait for the whole input to be processed before you can process any of them, and reducing the storage to just a `set`, not a `set` *and* `list` containing the same elements). – ShadowRanger Dec 27 '21 at 19:56

score 21 · Answer 2 · edited Dec 24 '16 at 12:08

21

Using:

lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]

And using the timeit module:

$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'

And so on for the various other functions (which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):

Allen:                  14.6 µs per loop [1]
Terhorst:               26.6 µs per loop
Tarle:                  44.7 µs per loop
ctcherry:               44.8 µs per loop
Etchasketch 1 (short):  64.6 µs per loop
Schinckel:              65.0 µs per loop
Etchasketch 2:          71.6 µs per loop
Little:                 89.4 µs per loop
Tyler:                 179.0 µs per loop

[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.

Summary: Straight-forward implementation with sets wins over confusing one-liners :-)

edited Dec 24 '16 at 12:08

mkierc

1,193
2
15
28

answered Sep 18 '08 at 05:14

John Fouhy

41,203
19
62
77

james suggested a faster version. See http://stackoverflow.com/questions/89178/#91430 – jfs Nov 12 '08 at 00:16
1

Bump for using OSX. ;-) – geowar Feb 12 '13 at 00:48
@jfs: Both james and Allen's versions mutate in-place, so unless the microbenchmark used accounts for that (e.g. by calling the functions using a fresh `list` each time and/or including no duplicates at all), the timings aren't comparable. The fastest solutions nowadays (from 3.6 onwards) are either the `unique_everseen` recipe from `itertools` (if you need to process each element as it is found) which is about 10% faster than Terhorst's solution, or, at 40-80% of the runtime of `unique_everseen`, `list(dict.fromkeys(iterable))` (or `lst[:] = dict.fromkeys(lst)` to operate "in place"). – ShadowRanger Dec 27 '21 at 20:05
@ShadowRanger: my microbenchmarks include `f(lst[:])` (i.e., the copying happens on each call). Though Python has changed in 12+ years. I would not rely on the microbenchmarks results from so many years ago. – jfs Dec 29 '21 at 08:49

jfs · Answer 3 · 2020-12-11T18:12:16.253

18

Update: on Python3.7+:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

^{old answer:}

Here is the fastest solution so far (for the following input):

def del_dups(seq):
    seen = {}
    pos = 0
    for item in seq:
        if item not in seen:
            seen[item] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]

lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 
       13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 
       5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 
       9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
del_dups(lst)
print(lst)
# -> [8, 9, 7, 15, 2, 20, 13, 24, 6, 11, 12, 4, 10, 18, 23, 3, 5, 22, 19, 14, 
#     21, 1, 0, 16, 17]

Dictionary lookup is slightly faster then the set's one in Python 3.

edited Dec 11 '20 at 18:12

answered Nov 12 '08 at 00:04

jfs

399,953
195
994
1,670

Could you explain why the Dictionary lookup is faster than a test for set membership in this case? – Stephen Emslie Jan 03 '12 at 10:01
@Stephen Emslie: I don't know. It might be a benchmark artifact. [Try it yourself](http://ideone.com/FEyrs). A pure speculation: dictionary is a fundamental data structure for CPython (namespaces, classes are/were implemented via dictionaries) therefore dicts are more tuned/optimized than sets. – jfs Jan 03 '12 at 11:18
4

Good timing, but wrong conclusion. The timing shows only that operator access such as ``d[k] = v`` is faster than method call access such as ``d.__setitem__(k, v)`` even if the latter has been pre-bound using ``d_setitem = d.__setitem__`` and then timing ``d_setitem(k, v)``. – Raymond Hettinger Dec 27 '13 at 08:12
Using Python 3.4, I tried your test script; and the function that uses a set is consistently **slightly** faster. – Honest Abe Jan 12 '15 at 21:29
@jfs: In 3.6+, this can be made much simpler and [even faster](https://ideone.com/98fKtX) by simplifying to just `def del_dups(seq): seq[:] = dict.fromkeys(seq)` (to modify in-place) or `def del_dups(seq): return list(dict.fromkeys(seq))` to make a new copy without duplicates. – ShadowRanger Dec 10 '20 at 03:12
@ShadowRanger dict order is guaranteed by the language only since Python 3.7. Though it is ok for CPython 3.6+, Pypy implementations. dict.fromkeys is O(n) in memory algorithm unlike O(k) memory solution in the answer – jfs Dec 10 '20 at 16:24
Please ignore the last remark about the memory (my mistake) – jfs Dec 10 '20 at 16:31
@jfs: True (on dict order guarantee). I usually include that qualification; forgot this time. There's only one 3.6 supporting interpreter aside from those two that I know of (Stackless Python), no idea what guarantees it has. That said, there *were* some insertion-ordering guarantees made in 3.6 due to PEP 468 and PEP 520 that are much simpler to satisfy if `dict`s are insertion-ordered when no deletions are performed (the only reason they didn't guarantee it in 3.6 was because they weren't sure if they wanted to limit themselves by guaranteeing deletion gaps weren't filled with new elements). – ShadowRanger Dec 10 '20 at 16:35
Since this use of `dict.fromkeys` definitionally doesn't delete from the resulting `dict` before inserting new keys (it makes the `dict`, shallow copies the keys, throws it away), it's *likely* to work in 3.6 interpreters that don't guarantee it, but yes, not guaranteed until 3.7. – ShadowRanger Dec 10 '20 at 16:36

score 15 · Answer 4 · answered Sep 18 '08 at 01:33

15

What's going to be fastest depends on what percentage of your list is duplicates. If it's nearly all duplicates, with few unique items, creating a new list will probably be faster. If it's mostly unique items, removing them from the original list (or a copy) will be faster.

Here's one for modifying the list in place:

def unique(items):
  seen = set()
  for i in xrange(len(items)-1, -1, -1):
    it = items[i]
    if it in seen:
      del items[i]
    else:
      seen.add(it)

Iterating backwards over the indices ensures that removing items doesn't affect the iteration.

answered Sep 18 '08 at 01:33

Allen

5,034
22
30

This gives different results to the other solutions (the OP didn't specify which is correct), as regards which duplicate to keep. This solution: [1, 2, 1] -> [2, 1] Other solutions: [1, 2, 1] -> [1, 2] – James Hopkin Sep 18 '08 at 09:24
I added a clarification about this in the question text. – J Miller Sep 18 '08 at 11:50

score 10 · Answer 5 · answered Sep 18 '08 at 10:17

10

This is the fastest in-place method I've found (assuming a large proportion of duplicates):

def unique(l):
    s = set(); n = 0
    for x in l:
        if x not in s: s.add(x); l[n] = x; n += 1
    del l[n:]

This is 10% faster than Allen's implementation, on which it is based (timed with timeit.repeat, JIT compiled by psyco). It keeps the first instance of any duplicate.

repton-infinity: I'd be interested if you could confirm my timings.

answered Sep 18 '08 at 10:17

James Hopkin

13,797
1
42
71

Dictionaries are slightly faster than sets. See my answer http://stackoverflow.com/questions/89178/#282589 – jfs Nov 12 '08 at 00:14

score 7 · Answer 6 · answered Sep 27 '08 at 15:54

7

Obligatory generator-based variation:

def unique(seq):
  seen = set()
  for x in seq:
    if x not in seen:
      seen.add(x)
      yield x

answered Sep 27 '08 at 15:54

Constantin

27,478
10
60
79

Raymond Hettinger · Answer 7 · 2016-10-03T15:49:13.647

7

This may be the simplest way:

list(OrderedDict.fromkeys(iterable))

As of Python 3.5, OrderedDict is now implemented in C, so this was is now the shortest, cleanest, and fastest.

edited Oct 03 '16 at 15:49

answered Oct 21 '11 at 01:10

Raymond Hettinger

216,523
63
388
485

Elegant, but unfortunately about a magnitude slower than the fastest solution, and surprisingly one of the slowest solutions in general. The `OrderedDict` seems to be a real performance killer. – Michael Dec 26 '13 at 16:04
Perhaps if `OrderedSet` ever becomes a builtin we'd have a very fast solution – jamylak Jan 01 '15 at 07:02

score 5 · Answer 8 · answered Sep 18 '08 at 01:30

5

Taken from http://www.peterbe.com/plog/uniqifiers-benchmark

def f5(seq, idfun=None):  
    # order preserving 
    if idfun is None: 
        def idfun(x): return x 
    seen = {} 
    result = [] 
    for item in seq: 
        marker = idfun(item) 
        # in old Python versions: 
        # if seen.has_key(marker) 
        # but in new ones: 
        if marker in seen: continue 
        seen[marker] = 1 
        result.append(item) 
    return result

answered Sep 18 '08 at 01:30

Chris Cherry

28,118
6
68
71

1

it is slower than corresponding in-place version (at least for some inputs). See http://stackoverflow.com/questions/89178/#282589 – jfs Nov 12 '08 at 00:26

score 5 · Answer 9 · answered Sep 18 '08 at 01:59

5

One-liner:

new_list = reduce(lambda x,y: x+[y][:1-int(y in x)], my_list, [])

answered Sep 18 '08 at 01:59

Tyler

28,498
11
90
106

Michael · Answer 10 · 2013-12-27T07:54:31.793

4

This is the fastest one, comparing all the stuff from this lengthy discussion and the other answers given here, refering to this benchmark. It's another 25% faster than the fastest function from the discussion, f8. Thanks to David Kirby for the idea.

def uniquify(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]

Some time comparison:

$ python uniqifiers_benchmark.py 
* f8_original 3.76
* uniquify 3.0
* terhorst 5.44
* terhorst_localref 4.08
* del_dups 4.76

edited Dec 27 '13 at 07:54

answered Dec 26 '13 at 15:38

Michael

7,316
1
37
63

I don't see time comparison with solutions from the top answers here. In my experience, [explicit loops](http://stackoverflow.com/a/89250/4279) are faster than list comprehension in CPython (at least it requires a benchmark to test for each particular case). – jfs Dec 26 '13 at 17:56
I added the timings above. The main overhead in the presented solutions is the property lookup for add, append and so on, but even if you kick that out, the list comprehension is about 25% faster than `terhorst_localreferences`. – Michael Dec 27 '13 at 07:37
could you include the complete benchmark code in your answer? I don't see `terhorst` (or any other relevant code) in [the file you linked](http://www.peterbe.com/plog/uniqifiers-benchmark/uniqifiers_benchmark.py). – jfs Dec 27 '13 at 08:31
http://pastebin.com/C5SQmT1R – Michael Dec 27 '13 at 16:52

score 4 · Answer 11 · answered Apr 09 '10 at 13:11

4

An in-place one-liner for this:

>>> x = [1, 1, 2, 'a', 'a', 3]
>>> [ item for pos,item in enumerate(x) if x.index(item)==pos ]
[1, 2, 'a', 3]

answered Apr 09 '10 at 13:11

Mario Ruggier

906
8
6

HI Mario, How this works, please explain, what I understood is index return only one value, so its unique ? – James Sapam Dec 08 '13 at 05:40
The list.index(item) method returns the position of the *first* item found in the list, and comparing this with the actual position of the item (from enumerate) we can therefore tell whether that item is the first occurring or not, keeping only the first occurring. – Mario Ruggier Jan 29 '14 at 11:08
1

thank you, this is a very elegant solution. – James Sapam Jan 29 '14 at 12:58
Very nice solution. But is it really _in-place_? When you use list comprehension, isn't a new list being created? – lifebalance Oct 11 '20 at 04:09

Jake · Answer 12 · 2008-09-18T02:33:38.153

3

You can actually do something really cool in Python to solve this. You can create a list comprehension that would reference itself as it is being built. As follows:

   # remove duplicates...
   def unique(my_list):
       return [x for x in my_list if x not in locals()['_[1]'].__self__]

Edit: I removed the "self", and it works on Mac OS X, Python 2.5.1.

The _[1] is Python's "secret" reference to the new list. The above, of course, is a little messy, but you could adapt it fit your needs as necessary. For example, you can actually write a function that returns a reference to the comprehension; it would look more like:

return [x for x in my_list if x not in this_list()]

edited Sep 18 '08 at 02:33

answered Sep 18 '08 at 01:43

Jake

15,007
22
70
86

The example as given does not compile for me -- the trailing ".__self__" is not valid [[Linux 2.6 w/ Python 2.5.1]] – Kevin Little Sep 18 '08 at 02:19
4

Holy cow, you're turning Python into Perl with the magic underscore business. Just say no. – Parand Mar 13 '09 at 03:49

score 2 · Answer 13 · answered Dec 29 '10 at 17:11

Remove duplicates and preserve order:

This is a fast 2-liner that leverages built-in functionality of list comprehensions and dicts.

x = [1, 1, 2, 'a', 'a', 3]

tmpUniq = {} # temp variable used below 
results = [tmpUniq.setdefault(i,i) for i in x if i not in tmpUniq]

print results
[1, 2, 'a', 3]

The dict.setdefaults() function returns the value as well as adding it to the temp dict directly in the list comprehension. Using the built-in functions and the hashes of the dict will work to maximize efficiency for the process.

score 2 · Answer 14 · answered Sep 18 '08 at 02:06

Do the duplicates necessarily need to be in the list in the first place? There's no overhead as far as looking the elements up, but there is a little bit more overhead in adding elements (though the overhead should be O(1) ).

>>> x  = []
>>> y = set()
>>> def add_to_x(val):
...     if val not in y:
...             x.append(val)
...             y.add(val)
...     print x
...     print y
... 
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>>

score 1 · Answer 15 · answered Oct 21 '11 at 01:07

Here are two recipes from the itertools documentation:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))

Wesley Tarle · Answer 16 · 2008-09-18T02:16:36.010

1

O(n) if dict is hash, O(nlogn) if dict is tree, and simple, fixed. Thanks to Matthew for the suggestion. Sorry I don't know the underlying types.

def unique(x):    
  output = []
  y = {}
  for item in x:
    y[item] = ""

  for item in x:
    if item in y:
      output.append(item)

  return output

edited Sep 18 '08 at 02:16

answered Sep 18 '08 at 01:35

Wesley Tarle

658
3
6

1

FYI, you can also do that with a set so you don't have to set it equal to an empty string. – Jason Baker Sep 18 '08 at 02:42

etchasketch · Answer 17 · 2008-09-18T04:28:42.590

1

has_key in python is O(1). Insertion and retrieval from a hash is also O(1). Loops through n items twice, so O(n).

def unique(list):
  s = {}
  output = []
  for x in list:
    count = 1
    if(s.has_key(x)):
      count = s[x] + 1

    s[x] = count
  for x in list:
    count = s[x]
    if(count > 0):
      s[x] = 0
      output.append(x)
  return output

edited Sep 18 '08 at 04:28

answered Sep 18 '08 at 04:07

etchasketch

606
4
6

Eli Courtwright · Answer 18 · 2008-11-10T21:56:05.853

1

There are some great, efficient solutions here. However, for anyone not concerned with the absolute most efficient O(n) solution, I'd go with the simple one-liner O(n^2*log(n)) solution:

def unique(xs):
    return sorted(set(xs), key=lambda x: xs.index(x))

or the more efficient two-liner O(n*log(n)) solution:

def unique(xs):
    positions = dict((e,pos) for pos,e in reversed(list(enumerate(xs))))
    return sorted(set(xs), key=lambda x: positions[x])

edited Nov 10 '08 at 21:56

answered Sep 18 '08 at 13:23

Eli Courtwright

186,300
67
213
256

That code is difficult to understand, and you say it's less efficient than the other solutions already presented here. So why would you go with it? – J Miller Sep 18 '08 at 17:40
1

I consider this easy to understand; passing a lambda function as the key parameter of sorted is really the canonical way to sort a list in Python. Most of my Python work involves generating reports on lists of statistics, and so to me this seems like the simplest and most Pythonic approach. – Eli Courtwright Sep 19 '08 at 12:51
While I agree your solution is succinct, the question asked for the fastest algorithm, not the most Pythonic. – J Miller Oct 01 '08 at 22:30

score 0 · Answer 19 · answered Sep 19 '08 at 09:42

0

If you take out the empty list from the call to set() in Terhost's answer, you get a little speed boost.

Change: found = set([])
to: found = set()

However, you don't need the set at all.

def unique(items):
    keep = []

    for item in items:
        if item not in keep:
            keep.append(item)

    return keep

Using timeit I got these results:

with set([]) -- 4.97210427363
with set() -- 4.65712377445
with no set -- 3.44865284975

answered Sep 19 '08 at 09:42

user18695

217
2
3

1

yeah, when you have few data, I bet the set internal mecanisme is slower that iterating over a list. But if you got maaaaaaaaaaany element, I think set are faster. Or what would be the point of this data structures ;-) – Bite code Sep 23 '08 at 21:21

score 0 · Answer 20 · edited Sep 21 '14 at 12:01

0

x = [] # Your list  of items that includes Duplicates

# Assuming that your list contains items of only immutable data types

dict_x = {} 

dict_x = {item : item for i, item in enumerate(x) if item not in dict_x.keys()}
# Average t.c. = O(n)* O(1) ; furthermore the dict comphrehension and generator like behaviour of enumerate adds a certain efficiency and pythonic feel to it.

x = dict_x.keys() # if you want your output in list format

edited Sep 21 '14 at 12:01

Lucas Zamboulis

2,494
5
24
27

answered Sep 21 '14 at 11:26

BigDataGuy

1

What can go wrong with items of mutable types? – ByteEater Mar 11 '21 at 00:36
1

This doesn't work the way you think it does. `if item not in dict_x.keys()` is checking the keys of the original, empty `dict_x`, not the dictionary being created. It's always true. The duplicates are removed simply because trying to create a duplicate key is ignored. – Barmar Mar 11 '21 at 00:38
1

Why are you using `enumerate()`? – Barmar Mar 11 '21 at 00:40
1

@ByteEater Mutable types can't be used as dictionary keys. – Barmar Mar 11 '21 at 00:42
@Bramar, maybe he tried to enable mutable types by writing `i: item` instead if `item : item` (although a single name for the pair would suffice) and then do `.values()` instead of `.keys()` in both places. But that won't work because of your first comment. – ByteEater Mar 11 '21 at 00:50

score 0 · Answer 21 · answered Sep 18 '08 at 01:29

I have no experience with python, but an algorithm would be to sort the list, then remove duplicates (by comparing to previous items in the list), and finally find the position in the new list by comparing with the old list.

Longer answer: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

score 0 · Answer 22 · answered Sep 18 '08 at 01:32

0

>>> def unique(list):
...   y = []
...   for x in list:
...     if x not in y:
...       y.append(x)
...   return y

answered Sep 18 '08 at 01:32

etchasketch

606
4
6

3

To explain why: searching for x in a list structure (y) is O(n), while searching for x in a set (or dictionary) is O(1). – Hamish Downer Sep 27 '08 at 16:17

score -1 · Answer 23 · answered Sep 18 '08 at 01:54

-1

>>> x=[1,1,2,'a','a',3]
>>> y = [ _x for _x in x if not _x in locals()['_[1]'] ]
>>> y
[1, 2, 'a', 3]

"locals()['_[1]']" is the "secret name" of the list being created.

answered Sep 18 '08 at 01:54

Kevin Little

12,436
5
39
47

1

Presence of _[1] local is not guaranteed by language. – Constantin Sep 29 '08 at 15:20
1

" in " is O(n), so this is slow. – Charles Duffy Nov 24 '08 at 10:41

score -1 · Answer 24 · answered Sep 18 '08 at 08:51

-1

I don't know if this one is fast or not, but at least it is simple.

Simply, convert it first to a set and then again to a list

def unique(container):
  return list(set(container))

answered Sep 18 '08 at 08:51

Franck Mesirard

3,169
3
20
17

1

This does not preserve order. – Eli Courtwright Sep 18 '08 at 13:17

score -2 · Answer 25 · edited Oct 12 '21 at 12:22

-2

a=[1,2,3,4,5,7,7,8,8,9,9,3,45]

def unique(l):

    ids={}
    for item in l:
        if not ids.has_key(item):
            ids[item]=item
    return  ids.keys()
print a

print unique(a)

Inserting elements will take theta(n) retrieving if element is exiting or not will take constant time testing all the items will take also theta(n) so we can see that this solution will take theta(n). Bear in mind that dictionary in python implemented by hash table.

edited Oct 12 '21 at 12:22

vvvvv

25,404
19
49
81

answered Nov 11 '08 at 00:32

aboSamoor

31,331
3
17
10

The questions says "*while preserving order*". A Python dictionary doesn't preserve order. – jfs Nov 24 '08 at 11:28

Matthew Schinckel · Answer 26 · 2008-11-26T02:30:59.827

-2

I haven't done any tests, but one possible algorithm might be to create a second list, and iterate through the first list. If an item is not in the second list, add it to the second list.

x = [1, 1, 2, 'a', 'a', 3]
y = []
for each in x:
    if each not in y:
        y.append(each)

edited Nov 26 '08 at 02:30

answered Sep 18 '08 at 01:29

Matthew Schinckel

35,041
6
86
121

1

I find your use of the variable name "each" really confusing to read, probably because in many languages it is a keyword. It's much clearer to use item or just i. – rjmunro Nov 12 '08 at 01:00
'i' to me implies an index - we aren't iterating through indices, we are iterating through objects. I'd prefer item, but I don't see 'each' as bad - just because it is a keyword in another language, why prevent it's use here. Syntax highlighting (as shown above) picks it up fine... – Matthew Schinckel Nov 26 '08 at 02:30
Other than AppleScript, what languages use the word 'each' as a keyword? – Matthew Schinckel Nov 26 '08 at 02:33
You should have used a set. This is unlikely to be the fastest. – Marcin Nov 18 '12 at 16:38
Marcin: "... while preserving order". – Matthew Schinckel Nov 19 '12 at 01:15

score -2 · Answer 27 · answered Sep 18 '08 at 05:08

-2

One pass.

a = [1,1,'a','b','c','c']

new_list = []
prev = None

while 1:
    try:
        i = a.pop(0)
        if i != prev:
            new_list.append(i)
        prev = i
    except IndexError:
        break

answered Sep 18 '08 at 05:08

Sergey Stolyarov

2,587
3
27
40

1

Requires sorted input, doesn't it? – Constantin Sep 29 '08 at 15:20

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

27 Answers27

Linked

Related

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*?

27 Answers27

Linked

Related

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?