Possible Duplicates:
How do you remove duplicates from a list in Python whilst preserving order?
Algorithm - How to delete duplicate elements in a list efficiently?

I've read about many methods for removing duplicates from a Python list while preserving order. All of them appear to require creating a function or subroutine, which I suspect is not very computationally efficient. I came up with the following and would like to know whether it is the most computationally efficient way to do this. (My use case needs the fastest possible response time.) Thanks.

b=[x for i,x in enumerate(a) if i==a.index(x)]
user918081
    Does it REALLY matter if they stay ordered? If they have to, you're going to be computationally expensive. If you can give up on ordering, just throw the items in a set and turn that back into a list. – Tyler Eaves Aug 29 '11 at 15:31

2 Answers

a.index(x) itself will be O(n) as the list has to be searched for the value x. The overall runtime is O(n^2).

"Saving" function calls does not make a bad algorithm faster than a good one.

More efficient (O(n)) would probably be:

result = []
seen = set()
for i in a:
    if i not in seen:
        result.append(i)
        seen.add(i)
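
Wrapped in a function, the loop above might look like the sketch below. The name dedupe_preserving_order is made up for illustration, and the elements are assumed to be hashable (a requirement of set). On Python 3.7 and later, where dicts are guaranteed to keep insertion order, list(dict.fromkeys(a)) should give the same result in one call.

```python
def dedupe_preserving_order(items):
    """Return a new list with duplicates removed, keeping first occurrences.

    Hypothetical helper name; assumes every element is hashable.
    """
    result = []
    seen = set()
    for item in items:
        if item not in seen:  # average O(1) set lookup
            result.append(item)
            seen.add(item)
    return result


a = [3, 1, 3, 2, 1]
b = dedupe_preserving_order(a)

# Alternative for Python 3.7+, where dict preserves insertion order.
c = list(dict.fromkeys(a))
```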

Have a look at this question: How do you remove duplicates from a list in Python whilst preserving order?

(The top answer there also shows how to do this with a list comprehension, which will likely be somewhat faster than an explicit loop.)
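
One common way to write such a comprehension is sketched here; it is not necessarily the exact code from the linked answer. It relies on set.add returning None, so the call only runs for unseen values and never makes the condition true on its own. The function name dedupe_comprehension is an assumption for illustration.

```python
def dedupe_comprehension(items):
    """Order-preserving de-duplication as a single list comprehension."""
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated attribute lookups
    # `x in seen` short-circuits for duplicates; otherwise seen_add(x) records x
    # and returns None, so `not (...)` is True and x is kept.
    return [x for x in items if not (x in seen or seen_add(x))]


deduped = dedupe_comprehension([1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3])
print(deduped)
```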


You can easily profile your code yourself using the timeit module. For example, I put your code in func1 and mine in func2. Repeating each 1000 times on a list of 1000 elements (no duplicates):

>>> import timeit
>>> a = range(1000)
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
11.691882133483887
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.3130321502685547

Now with duplicates (only 100 distinct values):

>>> import random
>>> a = [random.randint(0, 99) for _ in range(1000)]
>>> timeit.timeit('func1(a)', 'from __main__ import func1, a', number=1000)
2.5020430088043213
>>> timeit.timeit('func2(a)', 'from __main__ import func2, a', number=1000)
0.08332705497741699
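
To reproduce the comparison, a script along these lines could be used. It assumes func1 wraps the comprehension from the question and func2 wraps the set-based loop, as described above. The repetition count is lowered to 100 to keep the run short, and absolute timings will differ by machine and Python version, so none are shown.

```python
import random
import timeit


def func1(a):
    # The question's approach: a.index(x) rescans the list, so O(n^2) overall.
    return [x for i, x in enumerate(a) if i == a.index(x)]


def func2(a):
    # Set-based approach: one pass with average O(1) membership tests.
    result = []
    seen = set()
    for i in a:
        if i not in seen:
            result.append(i)
            seen.add(i)
    return result


unique = list(range(1000))                            # no duplicates
dupes = [random.randint(0, 99) for _ in range(1000)]  # at most 100 distinct values

for name, data in (("no duplicates", unique), ("100 distinct values", dupes)):
    t1 = timeit.timeit(lambda: func1(data), number=100)
    t2 = timeit.timeit(lambda: func2(data), number=100)
    print(f"{name}: func1 {t1:.3f}s, func2 {t2:.3f}s")
```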
Felix Kling
1
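lst = [1, 3, 45, 8, 8, 8, 9, 10, 1, 2, 3]
dummySet = set()
# set.add() returns None, so (i, None)[0] evaluates to i while recording i as seen
result = [(i, dummySet.add(i))[0] for i in lst if i not in dummySet]
# result == [1, 3, 45, 8, 9, 10, 2]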
Martin