I want to derive a class from list, add a few instance attributes to it, and make it hashable. What is a good (fast and neat) way to do it?

UPDATE:

I deleted a lengthy explanation of a use case. I also moved a related but separate issue into a different question.

max
  • It's hard to tell exactly what you're asking. – Joel Cornett Apr 20 '12 at 21:30
  • Why can't you use tuples? If it's only about having named attributes, there's [`namedtuple`](http://docs.python.org/py3k/library/collections.html#collections.namedtuple). –  Apr 20 '12 at 21:31
  • Can you use `hash(tuple(self))`? – Reinstate Monica Apr 20 '12 at 21:32
  • @JoelCornett: How to write a hash function for the class derived from list, which also contains other attributes. – max Apr 20 '12 at 21:37
  • @delnan: I don't know the values of the `tuple` until I'm in the `__init__` method. By then it's too late to tell the base `tuple` what I want it to contain. – max Apr 20 '12 at 21:38
  • @WolframH: no because I have other attributes, besides the `tuple`. They would be lost when converting to `tuple`. – max Apr 20 '12 at 21:40
  • @max: I'll write it out more explicitly: If you inherit from `list`, let `__hash__` return `hash((self.other_attribute, tuple(self)))`. – Reinstate Monica Apr 20 '12 at 21:43
  • @WolframH I didn't realize this would work. Thank you. Is conversion to `tuple` time-consuming? – max Apr 20 '12 at 21:46
  • Are you sure you can't just subtype tuple? You should be able to do most everything in new that you can do in init. – Bi Rico Apr 20 '12 at 21:54
  • @Bago I wanted to save in the instance attributes some interim results from the calculations I perform to obtain the list/tuple values. These instance attributes aren't available in `__new__` because `__new__` is a static method. So I would have to repeat those calculations later in `__init__`. – max Apr 20 '12 at 21:58

3 Answers

1

This code is fine. You're making a copy of the list, which could be a bit slow.

def __hash__(self):
    return hash(tuple(self.list_attribute))

You have several options if you want to be faster.

  • Store list_attribute as a tuple, not a list (after it is fully constructed)
  • Compute the hash once at init time and store the hash value. You can do this because your class is immutable, so the hash will never change.
  • Write your own hash function. Here's the hash function for tuple; do something similar.
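
As a rough illustration of the second option, here is a minimal sketch of a list subclass that computes its hash on first use and then stores it. The class name `State`, the attribute `label`, and the cache slot `_hash` are hypothetical names chosen for this example, and the sketch assumes the instance is never mutated after it has been hashed; if it is, the cached value goes stale.

```python
class State(list):
    """List subclass with one extra attribute and a lazily cached hash.

    Assumption: the instance is not mutated after its first hash() call.
    """

    def __init__(self, iterable=(), label=None):
        super().__init__(iterable)
        self.label = label   # hypothetical extra instance attribute
        self._hash = None    # cache slot, filled on the first hash() call

    def __hash__(self):
        if self._hash is None:
            # tuple(self) copies the elements, but only once per instance.
            self._hash = hash((self.label, tuple(self)))
        return self._hash


s = State([1, 2, 3], label="foo")
lookup = {s: "found"}   # hashes s once and caches the value
print(lookup[s])
```

Computing the value lazily costs nothing for objects that are never hashed, at the price of one extra attribute per instance.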
Keith Randall
  • Given that I now shortened the question to only focus on the subclass case, the option of a tuple disappears (since it cannot be changed in `__init__`). Computing at `__init__` time is a great idea - but I wonder, isn't it a good idea *always*, not just in this case? Still, it would be slower than the third option of course. – max Apr 20 '12 at 21:52
  • It all depends on how many times the object will be hashed. If the common case is 0, then you don't want to do it ahead of time. If the common case is *many*, then precomputing is best. – Keith Randall Apr 20 '12 at 22:24
  • Storing the hash also wastes memory, btw. – Keith Randall Apr 21 '12 at 01:38
1

You can apply tuple to self:

class State(list):
    def __hash__(self):
        return hash((self.some_attribute, tuple(self)))

Converting self to a tuple accounts for roughly 40% of the total hashing time in this benchmark:

from timeit import timeit

setup = "from __main__ import State; s = State(range(1000)); s.some_attribute = 'foo'"
stmt = "hash(s)"
print(timeit(stmt=stmt, setup=setup, number=100000))

setup = "r = list(range(1000))"
stmt = "tuple(r)"
print(timeit(stmt=stmt, setup=setup, number=100000))

prints

0.9382011891054844
0.3911763069244216
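
One caveat worth sketching: the `__eq__` inherited from list compares only the elements, so two instances with equal elements but different attributes would compare equal while hashing differently. The version below is a sketch under the assumption that the extra attribute should take part in equality; the constructor signature is invented for the example. `__ne__` is overridden too, because list supplies its own `__ne__` that would otherwise ignore the new `__eq__`.

```python
class State(list):
    def __init__(self, iterable=(), some_attribute=None):
        super().__init__(iterable)
        self.some_attribute = some_attribute

    def __eq__(self, other):
        if not isinstance(other, State):
            return NotImplemented
        # Equal only if both the attribute and the elements match.
        return (self.some_attribute == other.some_attribute
                and list.__eq__(self, other))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Built from the same fields that __eq__ compares.
        return hash((self.some_attribute, tuple(self)))


a = State([1, 2], some_attribute="x")
b = State([1, 2], some_attribute="x")
c = State([1, 2], some_attribute="y")
print(a == b, a == c, len({a, b, c}))
```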
Reinstate Monica
1

This is more of a comment than an answer, but it's too long to be a comment. This is how one can access instance attributes from inside __new__:

class Data(tuple):
    def __new__(klass, arg):
        data_inst = tuple.__new__(klass, arg)
        data_inst.min = min(data_inst)
        data_inst.max = max(data_inst)
        return data_inst

>>> d = Data([1,2,3,4])
>>> d
(1, 2, 3, 4)
>>> d.min
1
>>> d.max
4
>>> d1 = Data([1,2,3,4,5,6])
>>> d1.max
6
>>> d.max
4
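
The same pattern also covers the case where the tuple's contents are themselves computed: keep the interim results in local variables, create the instance, then attach them. This is only a sketch; the normalisation step and the attribute name `total` are made up for illustration, and it assumes the input does not sum to zero.

```python
class Data(tuple):
    def __new__(cls, raw):
        values = list(raw)
        total = sum(values)                    # interim result, held in a local
        scaled = [v / total for v in values]   # the final tuple contents
        inst = super().__new__(cls, scaled)
        inst.total = total                     # attach the interim result afterwards
        return inst


d = Data([1, 1, 2])
print(d, d.total)
print(hash(d) == hash(tuple(d)))
```

The instance is hashable without further work because it inherits tuple's `__hash__` and `__eq__`. Both ignore `total`, so if the extra attribute should affect equality, both methods need to be overridden together.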
Bi Rico
  • But these would be class attributes, not instance attribute. Try `d1 = Data([1,2,3,4])`, and then you'll see that `d1.min is d.min` evaluates to True. You cannot separate them between instances. – max Apr 20 '12 at 22:28
  • I don't think so, try it – Bi Rico Apr 20 '12 at 22:29
  • Ahh you're right. My bad, `is` only evaluated to True because they were small numbers which reused the same `int` object. Neat! I would need to store the interim calculations until *after* I call `tuple.__new__`, but it's fine I guess. – max Apr 20 '12 at 22:42