
I often run into tasks where I need to filter a Python (2.7) list so that only unique values remain, in their original order. My usual approach is to use OrderedDict from collections:

from collections import OrderedDict

ls = [1,2,3,4,1,23,4,12,3,41]

ls = OrderedDict(zip(ls,['']*len(ls))).keys()

print ls

The output is:

[1, 2, 3, 4, 23, 12, 41]

Is there any other state-of-the-art method to do this in Python?

  • Note: the input and the output should both be lists

Edit: a comparison of the methods can be found here: https://www.peterbe.com/plog/uniqifiers-benchmark

The best solution so far is:

def get_unique(seq):
    seen = set()
    seen_add = seen.add  # cache the bound method to avoid repeated attribute lookups
    return [x for x in seq if not (x in seen or seen_add(x))]
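
For reference, a rough way to compare the two approaches yourself is with the timeit module; here is a minimal sketch (the repeated test list, the helper names and the run count are only illustrative):

import timeit

setup = """
from collections import OrderedDict

ls = [1, 2, 3, 4, 1, 23, 4, 12, 3, 41] * 1000  # illustrative test data

def via_ordereddict(seq):
    return OrderedDict(zip(seq, [''] * len(seq))).keys()

def via_set(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
"""

print timeit.timeit('via_ordereddict(ls)', setup=setup, number=100)
print timeit.timeit('via_set(ls)', setup=setup, number=100)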
Dimgold

4 Answers


You could use a set like this:

newls = []
seen = set()

for elem in ls:
    if elem not in seen:  # first time this value is encountered
        newls.append(elem)
        seen.add(elem)
Eugene Yarmash
  • sorry, but it looks even more complicated (for loop, two additional memory structures) than the original. – Dimgold Jun 19 '17 at 10:37
  • there is no need for the set – Netwave Jun 19 '17 at 10:39
  • @Dimgold: yeah, it's a bit more verbose, but it doesn't require any `import`s and may well be more efficient than using `OrderedDict.keys` – Eugene Yarmash Jun 19 '17 at 10:42
  • can you elaborate on the complexity? I think they're both **O(n)** – Dimgold Jun 19 '17 at 10:44
  • @Dimgold - not all O's are born equal; your solution has the overhead of creating a doubly-linked list (on the Python side as well). This one has some unnecessary function calls, but I'm pretty certain the total overhead is significantly lower than what you're using. Not even going into the memory overhead... – zwer Jun 19 '17 at 10:54
  • @Netwave (old comment, but thought I'd answer it): there is a need for the set. An element is appended to the list only if it isn't already in the set, and in that case it is added to the set as well - so the set takes care of checking whether an element has been seen before, while the list is what keeps the order. – Edw590 Mar 14 '23 at 23:35

If you need to preserve the order and get rid of the duplicates, you can do it like this:

ls = [1, 2, 3, 4, 1, 23, 4, 12, 3, 41]

lookup = set()  # a temporary lookup set
ls = [x for x in ls if x not in lookup and lookup.add(x) is None]
# [1, 2, 3, 4, 23, 12, 41]

This should be considerably faster than your approach.
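
The trick relies on the fact that set.add() always returns None, so the right-hand side of the and is always true and exists only for its side effect of recording x. A tiny sketch of the idiom in isolation:

lookup = set()
print lookup.add(1) is None   # True: add() returns None, so the test always holds
print lookup                  # set([1]) - the element was recorded as a side effect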

zwer

Define a function to do so:

def uniques(l):
    retl = []
    for x in l:
        if x not in retl:  # linear scan of the result built so far
            retl.append(x)
    return retl
ls = [1,2,3,4,1,23,4,12,3,41]
print uniques(ls)
# [1, 2, 3, 4, 23, 12, 41]
Netwave
  • pretty sure that this algorithm is at least **O(n^2)** (not counting the append) while the original is **O(n)** – Dimgold Jun 19 '17 at 10:41
  • @Dimgold, actually it is **O(n*log(n))** I think, but you avoid creating 3 intermediate structures. – Netwave Jun 19 '17 at 10:44
  • why log? It's not sorted. I'm trying to figure out where the bottlenecks are. I'm pretty sure that from an algorithmic point of view @eugene's proposal is the most efficient – Dimgold Jun 19 '17 at 10:46
  • @Dimgold, Eugene is iterating over the same thing I am, but also creating a set. The lookup will probably be quicker for large lists, but the memory overhead will be larger too. If you print intermediate results, my retl and his lookup set contain the same elements. – Netwave Jun 19 '17 at 11:12
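
If in doubt, one way to settle this discussion empirically is to time uniques() on inputs of growing size and watch how the runtime scales; a rough sketch (the sizes and repeat count are only illustrative):

import timeit

setup_template = """
def uniques(l):
    retl = []
    for x in l:
        if x not in retl:
            retl.append(x)
    return retl

ls = range(%d) * 2  # a list in which every value appears twice
"""

for n in (1000, 2000, 4000):
    print n, timeit.timeit('uniques(ls)', setup=setup_template % n, number=10)

If the measured time roughly quadruples each time n doubles, that is the quadratic behaviour showing up.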

Another solution would be to use a list comprehension like this:

[x for i, x in enumerate(ls) if x not in ls[:i]]  # ls[:i] is everything seen before position i

Output:

[1, 2, 3, 4, 23, 12, 41]
Carles Mitjans
  • pretty sure that this algorithm is at least **O(n^2)** (not counting the append) while the original is **O(n)** – Dimgold Jun 19 '17 at 10:42