4

How do I remove duplicates from a list of objects in Python while preserving order?

class Test(object):
    def __init__(self,p1,p2):
        self.p1 = p1
        self.p2 = p2
lst = [Test(1,2), Test(2,3), Test(1,2)]

Two objects count as duplicates if

Test1.p1 == Test2.p1 and Test1.p2 == Test2.p2
Bdfy
  • Define uniqueness? Is `p1` the same in both, or `p2`? – Aamir Rind Dec 20 '13 at 13:06
  • @AamirAdnan based on the context I assume he wants an ordered list that only contains unique elements. – maxywb Dec 20 '13 at 13:07
  • cast both as sets `p1 = set(p1)` then `p1 = p1.union(p2)` would give a set containing all unique. then sort it. [set](http://docs.python.org/2/library/sets.html) – Felix Castor Dec 20 '13 at 13:11

6 Answers

5
class Test(object):
    def __init__(self,p1,p2):
        self.p1 = p1
        self.p2 = p2

    def __eq__(self, other):
        return (other.p1 == self.p1) and (other.p2 == self.p2)

    def __hash__(self):
        return (self.p1 << 64) | self.p2

lst = [Test(1,2), Test(2,3), Test(1,2)]
from collections import OrderedDict
uniq = list(OrderedDict.fromkeys(lst, 0))
print([[item.p1, item.p2] for item in uniq])
  1. To use the objects in hash-based collections such as OrderedDict, the class must define both __hash__ and __eq__.

  2. I have used (self.p1 << 64) | self.p2 as the hash, on the assumption that p1 and p2 are integers smaller than 2^64 (18446744073709551616).

  3. This works, but don't do this with objects you intend to modify. The class is mutable, which means p1 and p2 can be changed after the object is created. If the state of the object changes, its hash value changes too, and we rely on __hash__ to store the object in the OrderedDict.
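
A minimal sketch of the hazard described in point 3, reusing the Test class above with a plain set instead of an OrderedDict (the names item and seen are only for illustration). Once an attribute changes, the stored object can no longer be found by value:

```python
class Test(object):
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

    def __eq__(self, other):
        return (other.p1 == self.p1) and (other.p2 == self.p2)

    def __hash__(self):
        return (self.p1 << 64) | self.p2

item = Test(1, 2)
seen = {item}                        # stored under the hash computed for (1, 2)
found_before = Test(1, 2) in seen    # True: same hash and equal fields

item.p1 = 5                          # mutate the object while it sits in the set
found_old = Test(1, 2) in seen       # False: the hash matches, but __eq__ now fails
found_new = Test(5, 2) in seen       # False: fields are equal, but the stored hash is stale
```

The set still holds the object, yet neither its old nor its new value locates it, which is why hashable classes are usually kept immutable.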

thefourtheye
  • I just thought about the same solution and wasn't sure how to build the hash value. Shifting the value of p1 might work, depending on the expected values. – Matthias Dec 20 '13 at 13:19
1

I'm changing my answer to preserve order. You can define just equality (by adding an __eq__ method) and append your items one by one into a new list, while checking if they are already present:

class Test(object):
    def __init__(self,p1,p2):
        self.p1 = p1
        self.p2 = p2

    def __eq__(self, ot):
        return self.p1 == ot.p1 and self.p2 == ot.p2


lst = [Test(1,2), Test(2,3), Test(1,2)]
new_lst = []
for x in lst:
    if x not in new_lst:
        new_lst.append(x)
GermanK
0

Using collections.OrderedDict:

class Test(object):
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

lst = [Test(1,2), Test(2,3), Test(1,2)]


import collections
d = collections.OrderedDict()
for x in lst:
    key = x.p1, x.p2
    if key not in d:
        d[key] = x

for test_item in d.values():
    print(test_item.p1, test_item.p2)

prints

1 2
2 3
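
On Python 3.7 and later the built-in dict also preserves insertion order, so the same idea can probably be written without OrderedDict. This is a sketch under that version assumption, not part of the original answer; setdefault keeps the first object seen for each key:

```python
class Test(object):
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

lst = [Test(1, 2), Test(2, 3), Test(1, 2)]

# Assumes Python 3.7+, where a plain dict keeps keys in insertion order.
d = {}
for x in lst:
    # setdefault stores x only if the (p1, p2) key is not present yet
    d.setdefault((x.p1, x.p2), x)

uniq = list(d.values())
pairs = [(t.p1, t.p2) for t in uniq]
```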
falsetru
0

Alternatively, with a generator that keeps track of the keys it's already seen using a set:

def unique_values(iterable):
    seen = set()
    for value in iterable:
        key = (value.p1, value.p2)
        if key not in seen:
            yield value
            seen.add(key)

lst = list(unique_values(lst))
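
If the definition of "duplicate" may vary, the key could be passed in rather than hard-coded. A sketch of that variation, where unique_by and its key parameter are hypothetical names that do not appear in the original answer:

```python
def unique_by(iterable, key):
    """Yield items in order, skipping any whose key(item) was already seen."""
    seen = set()
    for value in iterable:
        k = key(value)
        if k not in seen:
            seen.add(k)
            yield value

class Test(object):
    def __init__(self, p1, p2):
        self.p1 = p1
        self.p2 = p2

lst = [Test(1, 2), Test(2, 3), Test(1, 2)]
result = list(unique_by(lst, key=lambda t: (t.p1, t.p2)))
pairs = [(t.p1, t.p2) for t in result]
```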
RemcoGerlich
0

As a fan of list comprehension, I must share this piece:

seen = set()
uniq_list = [t for t in lst if (t.p1, t.p2) not in seen and not seen.add((t.p1, t.p2))]
  1. "(t.p1, t.p2) not in seen" is true only for the first occurrence of a key, so the second part is evaluated only for new items.
  2. "not seen.add((t.p1, t.p2))" is always True, because set.add returns None, and it adds the key to seen as a side effect.
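
The two points above can be checked in isolation. A small sketch using plain tuples in place of Test objects (the variable names are illustrative):

```python
seen = set()
result = seen.add((1, 2))      # add() mutates the set in place and returns None
always_true = not result       # "not None" evaluates to True

# The same short-circuit trick applied to a list of tuples
items = [(1, 2), (2, 3), (1, 2)]
seen = set()
uniq_items = [t for t in items if t not in seen and not seen.add(t)]
```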
Gilad
-1

You could do something that feels hacky, but should work for you:

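# Requires Test to define __eq__ and __hash__; the result is sorted, not in the original order.
tmpset = set(lst)
# list.sort() sorts in place and returns None, so use sorted() with an explicit key instead.
uniqsorted = sorted(tmpset, key=lambda t: (t.p1, t.p2))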
maxywb