Python: Identical strings (or numbers) with unique ids?

Question

Python is wonderfully optimized, but I have a case where I'd like to work around it. It seems that for small numbers and strings, Python will automatically collapse multiple objects into one. For example:

>>> a = 1
>>> b = 1
>>> id(a) == id(b)
True
>>> a = str(a)
>>> b = str(b)
>>> id(a) == id(b)
True
>>> a += 'foobar'
>>> b += 'foobar'
>>> id(a) == id(b)
False
>>> a = a[:-6]
>>> b = b[:-6]
>>> id(a) == id(b)
True

I have a case where I'm comparing objects based on their Python ids. This is working really well except for the few cases where I run into small numbers. Does anyone know how to turn off this optimization for specific strings and integers? Something akin to an anti-intern()?

Literal values are implemented in CPython and they thus have the same id values. – Malik Brahimi, May 28 '15 at 23:43

score 1 · Answer 1 · edited May 23 '17 at 11:43

You can't turn it off without re-compiling your own version of CPython.

But if you want to have "separate" versions of the same small integers, you can do that by maintaining your own id (for example a uuid4) associated with the object.

Since ints and strings are immutable, there's no obvious reason to do this - if you can't modify the object at all, you shouldn't care whether you have the "original" or a copy because there is no use-case where it can make any difference.
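As a rough sketch of the "maintain your own id" idea above: the `Tagged` class and its attribute names are made up for illustration and are not part of any library. Each value is paired with its own uuid4, so equal values still carry distinct tags:

```python
import uuid


class Tagged:
    """Hypothetical holder that pairs a value with its own uuid4 tag."""

    def __init__(self, value):
        self.value = value
        self.tag = uuid.uuid4()  # random tag, independent of the value


a = Tagged(1)
b = Tagged(1)

print(a.value == b.value)  # the wrapped values compare equal
print(a.tag == b.tag)      # the tags differ, so the two can be told apart
```

The tag travels with the holder, not with the int itself, so this only helps if the holder is what gets passed around.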

Related: How to create the int 1 at two different memory locations?

answered May 28 '15 at 23:41

wim

Creating a uuid for each object would not allow for an analysis of references, which is the whole point of the exercise. – aviso Jul 02 '15 at 20:40

score 1 · Answer 2 · answered May 28 '15 at 23:41

You shouldn't be relying on these objects to be different objects at all. There's no way to turn this behavior off without modifying and recompiling Python, and which particular objects it applies to is subject to change without notice.
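A small illustration of that point, as a sketch only: equality of two ints built at runtime is guaranteed, but whether they are the same object is left to the implementation, so no particular result is shown for the identity check.

```python
# Build two equal ints at runtime instead of writing literals.
x = int("1000")
y = int("1000")

print(x == y)  # value equality: this is what the language guarantees
print(x is y)  # identity: may be True or False depending on the implementation
```

Code that needs to compare values should use `==`; the outcome of `is` here can differ between interpreters and versions.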

answered May 28 '15 at 23:41

user2357112

If we can't rely on ids then we shouldn't use 'is' in Python either. – aviso Jul 02 '15 at 20:45
@aviso: For these objects, you indeed shouldn't be using `is` either. – user2357112 Jul 02 '15 at 21:18
'is' is for comparing if two objects are the same, which is exactly the point of this exercise. – aviso Jul 02 '15 at 23:13

score 1 · Answer 3 · answered May 28 '15 at 23:48

Sure, it can be done, but it's never really a good idea:

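# Module-level counter used to give each instance its own serial number.
Z = 1

class MyString(str):  # subclass the built-in str (there is no `string` type)
    def __init__(self, *args):
        global Z
        # str is immutable and is built in __new__, so pass no arguments here
        super(MyString, self).__init__()
        self.i = Z
        Z += 1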

>>> a = MyString("1")
>>> b = MyString("1")
>>> a is b
False

By the way, to check whether two objects have the same id, just use `a is b` instead of `id(a) == id(b)`.
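A minimal check of that equivalence, assuming all the objects involved are still alive when they are compared (the names are placeholders):

```python
a = object()
b = object()
c = a  # second name bound to the same object as `a`

# While the objects are alive, `is` and an id() comparison agree.
print(a is c, id(a) == id(c))
print(a is b, id(a) == id(b))
```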

answered May 28 '15 at 23:48

chown

This does not seem like a scalable solution. – aviso Jul 02 '15 at 20:44

score 1 · Answer 4 · edited May 23 '17 at 12:14

The Python documentation on id() says

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.

So if it's guaranteed to be unique, it must be intended as a way to tell whether two variables are bound to the same object.
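The lifetime caveat in the quoted documentation matters in practice. Here is a sketch (the `Thing` class is just a placeholder) contrasting objects that are kept alive with objects that are discarded straight away:

```python
class Thing:
    """Placeholder class; any ordinary object would do."""


# Every object is kept alive in the list, so their lifetimes overlap
# and the ids are guaranteed to be distinct.
alive = [Thing() for _ in range(1000)]
print(len({id(t) for t in alive}))

# Each object here is discarded right after id() returns, so lifetimes do
# not overlap and ids may repeat; the resulting count is implementation-dependent.
transient_ids = {id(Thing()) for _ in range(1000)}
print(len(transient_ids))
```

So comparing ids is only meaningful while references to the objects being compared are still held.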

In a comment on StackOverflow here, Alex Martelli says the CPython implementation is not the authoritative Python, and other correct implementations of Python can and do behave differently in some ways - and that the Python Language Reference (PLR) is the closest thing Python has to a definitive specification.

In the PLR section on objects it says much the same:

Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory. The ‘is’ operator compares the identity of two objects; the id() function returns an integer representing its identity (currently implemented as its address).

The language reference doesn't say it's guaranteed to be unique. It also says (re: the object's lifetime):

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

and:

CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).

This isn't actually an answer; I was hoping this would end up somewhere conclusive. But I don't want to delete it now that I've quoted and cited.

I'll go with turning your premise around: "Python will automatically collapse multiple objects into one" - no, it won't. They were never multiple objects; they can't be, because they have the same id().

If id() is Python's definitive answer on whether two objects are the same or different, your premise is incorrect - this isn't an optimization, it's a fundamental part of Python's view on the world.

Thanks for all the info. We can agree to disagree on if Python collapses multiple objects. I'm pretty sure that's exactly what happens at the C level. — aviso, Jul 02 '15 at 20:42

aviso · Answer 5 · 2015-07-03T15:06:51.000

This version accounts for wim's concerns about more aggressive interning in the future. It will use more memory, which is why I discarded it originally, but it is probably more future-proof.

>>> class Wrapper(object):
...     def __init__(self, obj):
...         self.obj = obj

>>> a = 1
>>> b = 1
>>> aWrapped = Wrapper(a)
>>> bWrapped = Wrapper(b)
>>> aWrapped is bWrapped
False
>>> aUnWrapped = aWrapped.obj
>>> bUnwrapped = bWrapped.obj
>>> aUnWrapped is bUnwrapped
True

Or a version that works like the pickle answer (wrap + pickle = wrapple):

class Wrapple(object):
    def __init__(self, obj):
        self.obj = obj

    @staticmethod
    def dumps(obj):
        return Wrapple(obj)

    def loads(self):
        return self.obj

aWrapped = Wrapple.dumps(a)
aUnWrapped = aWrapped.loads()  # loads() is an instance method, so call it on the wrapper
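If the extra memory is the worry, one possible variation is to declare `__slots__` on the wrapper so instances carry no per-object `__dict__`. This is an assumption on my part rather than something measured above, and the `Handle` name and the `by_id` dict are hypothetical. The sketch also shows wrappers being tracked by id while they are kept alive:

```python
class Handle:
    """Hypothetical slim wrapper: every instance is a distinct object."""

    __slots__ = ("obj",)  # no per-instance __dict__

    def __init__(self, obj):
        self.obj = obj


handles = [Handle(1), Handle(1), Handle("1")]

# The dict holds a reference to each handle, so the ids cannot be reused
# while the dict exists.
by_id = {id(h): h for h in handles}

print(len(by_id))                        # one entry per handle
print(handles[0] is handles[1])          # distinct wrappers
print(handles[0].obj == handles[1].obj)  # equal wrapped values
```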

score -1 · Answer 6 · answered May 29 '15 at 03:31

Well, seeing as no one posted a response that was useful, I'll just let you know what I ended up doing.

First, some friendly advice to someone who might read this one day. This is not recommended for normal use, so if you're contemplating it, ask yourself if you have a really good reason. There are good reasons, but they are rare, and if someone says there aren't, they just aren't thinking hard enough.

In the end, I just used pickle.dumps() on all the objects and passed the output in instead of the real object. On the other side I checked the id and then used pickle.loads() to restore the object. The nice part of this solution is that it works for all types, including None and Booleans.

>>> import pickle
>>> a = 1
>>> b = 1
>>> a is b
True
>>> aPickled = pickle.dumps(a)
>>> bPickled = pickle.dumps(b)
>>> aPickled is bPickled
False
>>> aUnPickled = pickle.loads(aPickled)
>>> bUnPickled = pickle.loads(bPickled)
>>> aUnPickled is bUnPickled
True
>>> aUnPickled
1

answered May 29 '15 at 03:31

aviso

But then `aPickledAgain = pickle.dumps(a); aPickled is aPickledAgain` gives `False` too. This is an incredibly kludgy version of wim's suggestion. – user2357112 Jul 02 '15 at 21:23
Yes, that's exactly as expected. The point is to follow an object's references, not their values. Pickling that object lets it be unique where it wouldn't otherwise be. If you pickle it again, you're asking for another unique object. This is not an implementation of wim's suggestion as he too was concerned about the value rather than the object. – aviso Jul 02 '15 at 23:12
Agree it's kludgy, because you're relying on the assumption that those bytestrings generated by `pickle.dumps` won't be interned (which isn't guaranteed by implementation, and your code can break in the future without any warning) – wim Jul 03 '15 at 07:37
Then provide a better solution. Yours doesn't handle new and arbitrary object types. I see your point about interning. I had a different idea before the pickle one that would handle interning better. I'll put that as another answer. – aviso Jul 03 '15 at 11:48
