Is there a way to make a user defined class that operates like int in that any equal instances have the same referent?

E.g:

>>> a = 2
>>> b = 2
>>> a == b
True
>>> a is b
True

But with a user defined class like this one:

class Variable:
    def __init__(self, letter, index):
        self.letter = letter
        self.index = int(index)

    def __str__(self):
        return self.letter + '_' + str(self.index)

we have the following:

>>> a = Variable('x',1)
>>> b = Variable('x',1)
>>> a == b
True
>>> a is b
False
Bill
  • after reading this question https://stackoverflow.com/questions/11611750/under-which-circumstances-do-equal-strings-share-the-same-reference I understand that this behavior is implementation dependent for strings. I don't believe that's the case for integers though. – Bill Jun 27 '17 at 19:22
  • You are being led astray by optimizations that are implementation details of CPython, that is, small-int caching and string interning. – juanpa.arrivillaga Jun 27 '17 at 19:23
  • It *is* implementation specific for `int`s. In fact, it only holds true for integers from `-5` to `256`. Again, it is an optimization. See [this](https://stackoverflow.com/questions/306313/is-operator-behaves-unexpectedly-with-integers) question. There is also the peephole optimization at play with literals. – juanpa.arrivillaga Jun 27 '17 at 19:24
  • You want to implement `__new__`. See https://stackoverflow.com/questions/674304/pythons-use-of-new-and-init for example. – Alex Hall Jun 27 '17 at 19:25
  • You could do it by implementing `__new__` with a mapping of parameters to instances, and always returning the same instance for the same inputs. However that won't cover you if your instances are mutable. – jonrsharpe Jun 27 '17 at 19:25
  • *in that any equal instances have the same referent*: `x = 600; y = 601` then `x is (y - 1)` is `False`. You are misunderstanding what is happening. – Martijn Pieters Jun 27 '17 at 19:25
  • If you do that, it will complicate your objects and, believe it or not, it will not help performance. Python already optimizes internally; leave it that way. – GIZ Jun 27 '17 at 20:39

2 Answers

Is there a way to make a user defined class that operates like int in that any equal instances have the same referent?

First of all, only a limited range of integers behaves that way. As an implementation detail, CPython caches small integers (-5 through 256) for performance and memory efficiency reasons (see "is" operator behaves unexpectedly with integers).
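
To see that identity for integers is an optimization rather than a guarantee, here is a minimal sketch. The `parse` helper is a hypothetical name introduced only so the values are built at runtime instead of being merged as literals by the compiler. The printed identity results are what CPython usually gives and may differ on other implementations or versions.

```python
# Sketch only: identity results below are CPython implementation details.
def parse(text):
    # Hypothetical helper: build the int at runtime so the compiler
    # cannot merge equal literals into one constant.
    return int(text)


small_a, small_b = parse("2"), parse("2")
big_a, big_b = parse("600"), parse("600")

# Equality is guaranteed by the language.
assert small_a == small_b and big_a == big_b

# Identity is not: CPython usually prints True for the small pair (cached)
# and False for the large pair, but neither result should be relied on.
print("small:", small_a is small_b)
print("big:  ", big_a is big_b)
```

The only portable conclusion is to compare integers with `==` and reserve `is` for identity checks such as `x is None`.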

What you are asking for is how to ensure your own instances are interned, in that there is only ever one copy of an instance for a given 'value'. You can do that by controlling when a new instance is created, by implementing your own __new__ method:

class Variable:
    _instances = {}

    def __new__(cls, letter, index):
        index = int(index)
        try:
            # return existing instance
            return cls._instances[letter, index]
        except KeyError:
            # no instance yet, create a new one
            instance = super().__new__(cls)
            instance._letter = letter
            instance._index = index
            cls._instances[letter, index] = instance
            return instance

    def __str__(self):
        return self._letter + '_' + str(self._index)

For a given letter and index combo, just one instance is created:

>>> a = Variable('a', 1)
>>> b = Variable('a', 1)
>>> a
<__main__.Variable object at 0x10858ceb8>
>>> b
<__main__.Variable object at 0x10858ceb8>
>>> a is b
True

This is essentially how integer interning works too.
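
One detail worth spelling out is why the attributes are set inside `__new__` and not in an `__init__` method. When `__new__` returns an instance of the class, Python calls `__init__` on it every time, including when the instance came from the cache. The following sketch uses a hypothetical `Interned` class and an `init_calls` counter, neither of which is part of the code above, to make that visible.

```python
class Interned:
    """Hypothetical class showing that __init__ re-runs on cached instances."""
    _instances = {}
    init_calls = 0  # illustrative counter, not needed in real code

    def __new__(cls, key):
        try:
            # Return the existing instance for this key.
            return cls._instances[key]
        except KeyError:
            instance = super().__new__(cls)
            cls._instances[key] = instance
            return instance

    def __init__(self, key):
        # Runs on every Interned(...) call, even for a cached instance.
        type(self).init_calls += 1
        self.key = key


first = Interned("x")
second = Interned("x")
assert first is second           # only one object was created
assert Interned.init_calls == 2  # but __init__ ran twice
```

So any state assigned in `__init__` is reset on each call, which is harmless here but would overwrite later changes if the instance were mutable.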

Martijn Pieters

Martijn Pieters' answer is as close as you're going to get to an answer useful for practical purposes (got my upvote), but I was interested in jonrsharpe's point about mutability. For instance, using Martijn's solution, the following fails:

a = Variable('x', 0)
b = Variable('x', 0)
c = Variable('y', 0)
a.letter = c.letter
assert(a is c)
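
To make the failure concrete, here is a self-contained sketch. It assumes a simplified variant of the interned class that stores public `letter` and `index` attributes (the version above uses `_letter` and `_index`), so that the assignment itself goes through and the stale cache becomes visible.

```python
class Variable:
    """Simplified interned class with public attributes (an assumption)."""
    _instances = {}

    def __new__(cls, letter, index):
        key = (letter, int(index))
        if key not in cls._instances:
            instance = super().__new__(cls)
            instance.letter, instance.index = key
            cls._instances[key] = instance
        return cls._instances[key]


a = Variable('x', 0)
c = Variable('y', 0)
a.letter = c.letter  # a now carries the same values as c...

same_values = (a.letter, a.index) == (c.letter, c.index)
assert same_values
assert a is not c    # ...but it is still a different object

# The cache is now stale: asking for ('x', 0) returns the mutated object.
stale = Variable('x', 0)
assert stale is a
assert stale.letter == 'y'
```

The cache is keyed on the values at creation time, so any later mutation silently breaks the "equal means identical" property in both directions.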

We want equal instances to always refer to the same object in memory. This is very tricky, requires some black magic, and should never ever ever be used in a real application, but is in some sense possible. So, if you're in it for the laughs, come along for the ride.

My first thought was that we need to overload __setattr__ for Variable so that when an attribute changes, a new instance with the appropriate attribute values is created and all references (Footnote 1) to the original instance are updated to point to this new instance. This is possible with pyjack, but it turns out not to give us quite the right solution. If we do the following:

a = Variable('x', 0)
b = Variable('x', 0)
a.letter = 'y'

and in the process of that last assignment update all references to the object referred to as a, then b will also end up with b.letter == 'y' since a and b (obviously) refer to the same instance.

So, it's not a matter of updating all references to the Variable instance. It's a matter of updating the one reference we just changed. That is to say, for the namespace in which the attribute assignment was called, we need to update the locals to point to the new instance. This is not straightforward, but here is a method that works with all tests I could come up with. Note that this code does not have so much of a code smell as a full-on corpse-in-the-closet-for-three-days code reek about it. Again, do not use it for anything serious:

import inspect
import dis

class MutableVariable(object):
    __slots__ = ('letter', 'index')  # Prevent access through __dict__
    previously_created = {}

    def __new__(cls, letter, index):
        if (letter, index) in cls.previously_created:
            return cls.previously_created[(letter, index)]
        else:
            return super().__new__(cls)

    def __setattr__(self, name, value):
        letter = self.letter
        index = self.index
        if name == "letter":
            letter = value
        elif name == "index":
            index = int(value)
        # Get bytecode for frame in which attribute assignment occurred
        frame = inspect.currentframe()
        bcode = dis.Bytecode(frame.f_back.f_code)
        # Get index of last executed instruction
        last_inst = frame.f_back.f_lasti
        # Get locals dictionary from namespace in which assignment occurred
        call_locals = frame.f_back.f_locals
        assign_name = []
        attribute_name = []
        for instr in bcode:
            if instr.offset > last_inst:  # Only go to last executed instruction
                break
            if instr.opname == "POP_TOP":  # Clear if popping stack
                assign_name = []
                attribute_name = []
            elif instr.opname == "LOAD_NAME":  # Keep track of name loading on stack
                assign_name.append(instr.argrepr)
            elif instr.opname == "LOAD_ATTR":  # Keep track of attribute loading on stack
                attribute_name.append(instr.argrepr)
            last_instr = instr.opname  # Opname of last executed instruction
        try:
            name_index = assign_name.index('setattr') + 1  # Check for setattr call
        except ValueError:
            if last_instr == 'STORE_ATTR':  # Check for direct attr assignment
                name_index = -1
            else:  # __setattr__ called directly
                name_index = 0
        assign_name = assign_name[name_index]
        # Handle case where we are assigning to attribute of an attribute

        try:
            attributes = attribute_name[attribute_name.index(name) + 1: -1]
            attribute_name = attribute_name[-1]
        except (IndexError, ValueError):
            attributes = []
        if len(attributes):
            obj = call_locals[assign_name]
            for attribute_ in attributes:
                obj = getattr(obj, attribute_)
            setattr(obj, attribute_name, MutableVariable(letter, index))
        else:
            call_locals[assign_name] = MutableVariable(letter, index)

    def __init__(self, letter, index):
        super().__setattr__("letter", letter)  # Use parent's setattr on instance initialization
        super().__setattr__("index", index)
        self.previously_created[(letter, index)] = self

    def __str__(self):
        return self.letter + '_' + str(self.index)

# And now to test it all out...
if __name__ == "__main__":
    a = MutableVariable('x', 0)
    b = MutableVariable('x', 0)
    c = MutableVariable('y', 0)
    assert(a == b)
    assert(a is b)
    assert(a != c)
    assert(a is not c)

    a.letter = c.letter
    assert(a != b)
    assert(a is not b)
    assert(a == c)
    assert(a is c)

    setattr(a, 'letter', b.letter)
    assert(a == b)
    assert(a is b)
    assert(a != c)
    assert(a is not c)

    a.__setattr__('letter', c.letter)
    assert(a != b)
    assert(a is not b)
    assert(a == c)
    assert(a is c)

    def x():
        pass

    def y():
        pass

    def z():
        pass

    x.testz = z
    x.testz.testy = y
    x.testz.testy.testb = b
    x.testz.testy.testb.letter = c.letter
    assert(x.testz.testy.testb != b)
    assert(x.testz.testy.testb is not b)
    assert(x.testz.testy.testb == c)
    assert(x.testz.testy.testb is c)

So, basically what we do here is use dis to analyze the bytecode for the frame in which the assignment occurred (as reported by inspect). Using this, we extract the name of the variable referencing the MutableVariable instance undergoing attribute assignment, and update the locals dictionary for the corresponding namespace so that that variable references a new MutableVariable instance. None of this is a good idea.
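
The frame and bytecode inspection described above can be tried in isolation. This is a minimal sketch, not the code from the class: `executed_opnames` and `caller_example` are hypothetical names, and the opnames you get back depend on the Python version, so no particular output is shown. It also relies on `inspect.currentframe()` returning a frame, which is only guaranteed on implementations with stack frame support such as CPython.

```python
import dis
import inspect


def executed_opnames():
    """Return opnames of the caller's bytecode up to its last executed instruction."""
    caller = inspect.currentframe().f_back
    opnames = []
    for instr in dis.Bytecode(caller.f_code):
        if instr.offset > caller.f_lasti:  # stop at the instruction in progress
            break
        opnames.append(instr.opname)
    return opnames


def caller_example():
    value = 1  # some work before the call, so there is bytecode to report
    return executed_opnames()


opnames = caller_example()
print(opnames)  # exact contents vary between Python versions
```

Because the instruction set changes between releases, any logic that matches on specific opnames, as `__setattr__` does above, is tied to the interpreter version it was written for.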

The code shown here is almost certainly implementation specific and may be the most fragile piece of code I've ever written, but it does work on standard CPython 3.5.2.

Footnote 1: Here I am not using "reference" in the formal (e.g. C++) sense, since Python is not pass-by-reference, but in the sense of a variable referring to a particular object in memory, that is, in the sense of "reference counting", not "pointers vs. references."

wphicks
  • yikes! this answer is helpful if for no other reason than that it fully and utterly convinced me that this is something I do NOT want to do :D thanks – Bill Jun 28 '17 at 15:31
  • Just to make it a little scarier... After sleeping on it I realized one more thing I didn't cover in my tests that would almost certainly break this: the case where the reference being updated is held by some other data structure. E.g. `mylist = [a]` and then `mylist[0].letter = b.letter`. – wphicks Jun 28 '17 at 15:35