Pool of hashable objects

Question

I've made a highly recursive, hashable (assumed immutable) datastructure. Thus it would be nice to have only one instance of each object (if objectA == objectB, then there is no reason not to have objectA is objectB).

I have tried solving it by defining a custom __new__(). It creates the requested object, then checks if it is in a dictionary (stored as a class variable). The object is added to the dict if necessary and then returned. If it is already in the dict, the version in the dict is returned and the newly created instance passes out of scope.

This solution works, but

I have to have a dict where the value at each key is the same object. What I really need is to extract an object from a set when I "show" the set an equal object. Is there a more elegant way of doing this?
Is there a builtin/canonical solution to my problem in Python? Such as a class I can inherit from or something....

My current implementation is along these lines:

class NoDuplicates(object):
    pool = dict()
    def __new__(cls, *args):
        new_instance = object.__new__(cls)
        new_instance.__init__(*args)
        if new_instance in cls.pool:
            return cls.pool[new_instance]
        else:
            cls.pool[new_instance] = new_instance
            return new_instance

I am not a programmer by profession, so I suspect this corresponds to some well known technique or concept. The most similar concepts that come to mind are memoization and singleton.

One subtle problem with the above implementation is that __init__ is always called on the return value from __new__. I made a metaclass to modify this behaviour. But that ended up causing a lot of trouble since NoDuplicates also inherits from dict.

"What I really need is to extract an object from a set when I "show" the set an equal object." I'm not sure what you mean by this. — TheSoundDefense, Sep 02 '14 at 14:52
Indeed, memoization is a very similar concept. Its scope is narrower but that might be appropriate for the majority of use cases. — Brian Cain, Sep 02 '14 at 15:33
You are right, @BrianCain, my implementation is very close to just memoizing the output of `__new__`. — Henrik, Sep 02 '14 at 16:06

score 2 · Accepted Answer · edited May 23 '17 at 12:12

2

First, I would use a factory instead of overriding __new__. See Python's use of __new__ and __init__?.

Second, you can use tuples of arguments needed to create an object as dictionary keys (if same arguments produce same objects, of course), so you won't need to create an actual (expensive to create) object instance.

edited May 23 '17 at 12:12

Community

1
1

answered Sep 02 '14 at 15:29

Anton Savin

40,838
8
54
90

Sounds like a good idea. Using a factory might also relieve me of using metaclasses to avoid calling `__init__` all the time. – Henrik Sep 02 '14 at 16:15
By the way, Im new here. When I come up with a better implementation, how should I post it? As a seperate answer? as an edit to your answer? in the original question? I don't want to screw up, steal "points" or offend anyone... – Henrik Sep 02 '14 at 16:17
@Henrik Yes, with the factory you won't have to call special methods. Regarding self-answering you can read this: http://stackoverflow.com/help/self-answer. (In short: no problem if you accept your own answer) – Anton Savin Sep 02 '14 at 17:48

Pool of hashable objects

1 Answers1