I have a list of objects, and I want to filter it so that the result contains only one object for each distinct attribute value.

For instance, let's say I have three objects

obj1.my_attr = 'a'
obj2.my_attr = 'b'
obj3.my_attr = 'b'

obj_list = [obj1, obj2, obj3]

In the end, I want to get [obj1, obj2]. Order does not matter, so [obj1, obj3] is just as good.

First I thought of the typical, clunky imperative approach, like the following:

record = set()
result = []

for obj in obj_list:
    if obj.my_attr not in record:
        record.add(obj.my_attr)
        result.append(obj)

Then I thought of mapping it to a dictionary, using the key to override any previous entry and finally extracting the values:

result = {obj.my_attr: obj for obj in obj_list}.values() 

This one looks good, but I would like to know if there is a more elegant, efficient or functional way of achieving this. Maybe some sweet thing hidden in the standard library... Thanks in advance.
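
For reference, a minimal runnable sketch of the dictionary approach. The `Obj` class and its `name` field are hypothetical stand-ins for the real objects. Two details worth noting: later entries override earlier ones, so this keeps the last object seen for each value, and in Python 3 `.values()` returns a view, so it is wrapped in `list()` here.

```python
class Obj:
    """Hypothetical stand-in for the real objects."""
    def __init__(self, name, my_attr):
        self.name = name
        self.my_attr = my_attr


obj1, obj2, obj3 = Obj('obj1', 'a'), Obj('obj2', 'b'), Obj('obj3', 'b')
obj_list = [obj1, obj2, obj3]

# Later entries override earlier ones, so obj3 replaces obj2 under key 'b'.
result = list({obj.my_attr: obj for obj in obj_list}.values())

names = sorted(obj.name for obj in result)
print(names)
```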

bgusach

2 Answers

If you want to use a functional programming style in Python, you may want to check out the toolz package. With toolz, you could simply do:

toolz.unique(obj_list, key=lambda x: x.my_attr)

For better performance, you could use operator.attrgetter('my_attr') instead of the lambda function for the key. You could also use cytoolz, which is a fast implementation of toolz written in Cython.
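
To illustrate the `attrgetter` suggestion without requiring toolz, here is a standard-library sketch of key-based deduplication. `unique_by` and `Obj` are hypothetical names used only for illustration; this is not toolz's implementation, just the same idea of keeping the first item seen for each key.

```python
from operator import attrgetter


class Obj:
    """Hypothetical stand-in for the real objects."""
    def __init__(self, name, my_attr):
        self.name = name
        self.my_attr = my_attr


def unique_by(items, key):
    """Yield the first item seen for each distinct key(item)."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


obj_list = [Obj('obj1', 'a'), Obj('obj2', 'b'), Obj('obj3', 'b')]

# attrgetter('my_attr') plays the role of lambda x: x.my_attr.
result = list(unique_by(obj_list, key=attrgetter('my_attr')))
names = [obj.name for obj in result]
print(names)
```

With toolz installed, the equivalent call would be `toolz.unique(obj_list, key=attrgetter('my_attr'))`, which also yields items lazily, so wrap it in `list()` if a list is needed.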

eriknw

You could use an object that would define a custom __hash__ function:

class HashMyAttr:
    """Wrapper that hashes and compares an object by its my_attr value."""
    def __init__(self, obj):
        self.obj = obj

    def __hash__(self):
        return hash(self.obj.my_attr)

    def __eq__(self, other):
        return self.obj.my_attr == other.obj.my_attr

And use it like:

obj_list = [x.obj for x in set(HashMyAttr(obj) for obj in obj_list)]
njzk2
  • Not quite. Doesn't work if the attribute is an `int`: `AttributeError: 'int' object has no attribute '__eq__'` – SiHa Jul 07 '14 at 15:59
  • In this case the attribute appears to be a string, but I guess `==` would work, then. – njzk2 Jul 07 '14 at 16:00
  • This definitely works and is an interesting approach, but I find it a little bit of overkill and quite verbose. I would prefer to use the dictionary comprehension instead (most probably more efficient, since it does not need to construct wrappers). – bgusach Jul 07 '14 at 16:05
  • If you can override the `__eq__` and the `__hash__` functions directly in the original class of your objects, then you don't need most of the overhead here. – njzk2 Jul 07 '14 at 16:07
  • @njzk2 and what if I need to filter later by another attribute? or I have already a `__hash__` implementation? – bgusach Jul 07 '14 at 16:08
  • that's why I offered this alternative using a wrapping class. You could add the name of the attribute in the constructor and use getattr, or pass a lambda or use a function, to have something more flexible. – njzk2 Jul 07 '14 at 16:13
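
The more flexible wrapper described in the last comment could look roughly like this. `HashByAttr`, `unique_by_attr` and `Obj` are hypothetical names; this is only a sketch of the idea, taking the attribute name in the constructor and reading it with `getattr`.

```python
class HashByAttr:
    """Wrapper whose hash and equality come from one named attribute."""
    def __init__(self, obj, attr_name):
        self.obj = obj
        self.key = getattr(obj, attr_name)

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if not isinstance(other, HashByAttr):
            return NotImplemented
        return self.key == other.key


def unique_by_attr(objs, attr_name):
    """Return one object per distinct value of attr_name (order not guaranteed)."""
    return [w.obj for w in set(HashByAttr(o, attr_name) for o in objs)]


class Obj:
    """Hypothetical stand-in for the real objects."""
    def __init__(self, my_attr, other_attr):
        self.my_attr = my_attr
        self.other_attr = other_attr


obj_list = [Obj('a', 1), Obj('b', 1), Obj('b', 2)]

by_my_attr = unique_by_attr(obj_list, 'my_attr')
by_other_attr = unique_by_attr(obj_list, 'other_attr')
print(sorted(o.my_attr for o in by_my_attr))
print(sorted(o.other_attr for o in by_other_attr))
```

Because the wrapper does not touch the original class, an existing `__hash__` on the objects is left alone and the same list can be filtered by different attributes.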