
I'm doing some things in Python (3.3.3), and I came across something that confuses me, since to my understanding a class produces an object with a new id each time it is called.

Let's say you have this in some .py file:

class someClass: pass

print(someClass())
print(someClass())

The above prints the same id, which confuses me: I'm calling the class twice, so shouldn't I get two different objects? Is this just how Python behaves when the same class is called twice in a row? It gives a different id when I wait a few seconds between calls, but not when I call it twice back to back, as in the example above.

>>> print(someClass());print(someClass())
<__main__.someClass object at 0x0000000002D96F98>
<__main__.someClass object at 0x0000000002D96F98>

It returns the same thing, but why? I also notice it with ranges for example

for i in range(10):
    print(someClass())

Is there any particular reason Python does this when the class is called in quick succession? I didn't even know Python did this - or is it possibly a bug? If it is not a bug, can someone explain how I can make it generate a different id each time the class is called? I'm puzzled, because if I wait the id changes, but not if I call the same class two or more times in a row.

user3130555

6 Answers


The id of an object is only guaranteed to be unique during that object's lifetime, not over the entire lifetime of a program. The two someClass objects you create only exist for the duration of the call to print - after that, they are available for garbage collection (and, in CPython, deallocated immediately). Since their lifetimes don't overlap, it is valid for them to share an id.

It is also unsurprising in this case, because of a combination of two CPython implementation details: first, it does garbage collection by reference counting (with some extra magic to avoid problems with circular references), and second, the id of an object is derived from its memory address. So the first object, which was the most recently allocated, is freed immediately - it isn't too surprising that the next object allocated will end up in the same spot (although this potentially also depends on details of how the interpreter was compiled).
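Both halves of the guarantee are easy to see in a minimal sketch. Note that the id reuse in the first pair is a CPython-specific behavior and may not happen on other implementations:

```python
class SomeClass:
    pass

# Non-overlapping lifetimes: each temporary is freed before the next is
# created, so CPython will often (but is not required to) reuse the address.
first = id(SomeClass())
second = id(SomeClass())
print(first == second)  # frequently True on CPython, but not guaranteed

# Overlapping lifetimes: both objects are alive at once, so their ids
# are guaranteed to be distinct.
a, b = SomeClass(), SomeClass()
assert id(a) != id(b)
```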

If you are relying on several objects having distinct ids, you might keep them around - say, in a list - so that their lifetimes overlap. Otherwise, you might implement a class-specific id that has different guarantees, e.g.:

class SomeClass:
    next_id = 0

    def __init__(self):
        self.id = SomeClass.next_id
        SomeClass.next_id += 1
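A variant of the same idea (my sketch, not part of the answer above) uses `itertools.count` so that the increment can never be forgotten:

```python
import itertools

class SomeClass:
    _ids = itertools.count()  # shared, class-level counter

    def __init__(self):
        # Each instance draws the next integer: 0, 1, 2, ...
        self.id = next(SomeClass._ids)

a, b = SomeClass(), SomeClass()
print(a.id, b.id)  # 0 1
```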
lvc
  • Nice explanation, but one minor quibble. The way it's written implies that the memory is actually getting `free`d and then `malloc`d (or some equivalent), when really it's not even getting outside of Python's PyObject free-list, and _that's_ why it happens so consistently (subject to your well-explained caveats), even across platforms or with debug mallocs and so on. – abarnert Dec 24 '13 at 01:04
  • Base `object` `tp_dealloc` calls the [heap type's `tp_free`](http://hg.python.org/cpython/file/c3896275c0f6/Objects/typeobject.c#l2370), which is [`PyObject_GC_Del`](http://hg.python.org/cpython/file/c3896275c0f6/Modules/gcmodule.c#l1621). That in turn uses the macro `PyObject_FREE`. The caveat, regarding how CPython is compiled, is that [without pymalloc](http://hg.python.org/cpython/file/c3896275c0f6/Include/objimpl.h#l133) the macro `PyObject_FREE` is defined as `PyMem_FREE`, which for a non-debug build is just `free`. So at that point address reuse depends on the platform `malloc`. – Eryk Sun Dec 24 '13 at 02:19
  • Well said about mentioning Garbage Collection :). – ivanleoncz Nov 14 '17 at 23:18

If you read the documentation for id, it says:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

And that's exactly what's happening: you have two objects with non-overlapping lifetimes, because the first one is already out of scope before the second one is ever created.


But don't trust that this will always happen, either. Especially if you need to deal with other Python implementations, or with more complicated classes. All that the language says is that these two objects may have the same id() value, not that they will. And the fact that they do depends on two implementation details:

  • The garbage collector has to clean up the first object before your code even starts to allocate the second object—which is guaranteed to happen with CPython or any other ref-counting implementation (when there are no circular references), but pretty unlikely with a generational garbage collector as in Jython or IronPython.

  • The allocator under the covers has to have a very strong preference for reusing recently-freed memory for objects of the same type. This is true in CPython, which has multiple layers of fancy allocators on top of basic C malloc, but most other implementations leave a lot more to the underlying virtual machine.


One last thing: The fact that the object.__repr__ happens to contain a substring that happens to be the same as the id as a hexadecimal number is just an implementation artifact of CPython that isn't guaranteed anywhere. According to the docs:

If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description…> should be returned.

The fact that CPython's object happens to put hex(id(self)) (actually, I believe it's doing the equivalent of sprintf-ing its pointer through %p, but since CPython's id just returns the same pointer cast to a long that ends up being the same) isn't guaranteed anywhere. Even if it has been true since… before object even existed in the early 2.x days. You're safe to rely on it for this kind of simple "what's going on here" debugging at the interactive prompt, but don't try to use it beyond that.
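You can check that correspondence yourself. This is a CPython-specific sketch; nothing below is guaranteed by the language:

```python
class SomeClass:
    pass

obj = SomeClass()
# CPython's default repr embeds the object's address, which is also what
# id() returns, so the hex of the id shows up inside the repr string.
print(repr(obj))      # e.g. <__main__.SomeClass object at 0x...>
print(hex(id(obj)))   # the same hex number (on CPython)
assert hex(id(obj)) in repr(obj).lower()
```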

abarnert

I sense a deeper problem here. You should not be relying on id to track unique instances over the lifetime of your program. You should see it only as a non-guaranteed memory-location indicator, valid for the duration of each object's lifetime. If you immediately create and release instances, you may very well get consecutive instances at the same memory location.

Perhaps what you need to do is track a class static counter that assigns each new instance with a unique id, and increments the class static counter for the next instance.
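If you need identifiers that stay unique even across runs of the program, the standard library's `uuid` module is another option. This is my own sketch, not something the answer itself proposes:

```python
import uuid

class SomeClass:
    def __init__(self):
        # uuid4() is random and effectively unique, independent of
        # memory-address reuse or instance creation order.
        self.uid = uuid.uuid4()

a, b = SomeClass(), SomeClass()
assert a.uid != b.uid
```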

Preet Kukreti
  • I don't think the OP is trying to use `id` here (or, actually, the equivalent number that appears in the `repr`) for any purpose other than debugging object lifetimes… which is the one thing it's good for. – abarnert Dec 24 '13 at 01:05
  • @abarnert if you see OP's comment in mhlester's answer, it seems to indicate that OP is actually looking for such an equivalent behaviour. – Preet Kukreti Dec 24 '13 at 01:07
  • Although from his followup comment on the same answer, it looks like he _isn't_ really looking for that, he just got confused while debugging… – abarnert Dec 24 '13 at 01:28

It's releasing the first instance since it wasn't retained, then since nothing has happened to the memory in the meantime, it instantiates a second time to the same location.

mhlester
  • Oh, I see, is there any way to tell python the memory changed so it instantiates differently? I'm not sure how I would change the memory that fast so it assigns a different id each time. – user3130555 Dec 24 '13 at 00:58
  • I wouldn't use the id as your identifier. Either pass in and store a counter variable, or if you want to use id, add the instance to a list or other object in order to keep it from being reused. – mhlester Dec 24 '13 at 01:08
  • I don't know why you need to have different ids, but, whatever your reason is, it is probably wrong. Also you have to take into account that due to internal "caching" it could happen (with immutable types) that two different and apparently unrelated variables share the same object (and id). – smeso Dec 24 '13 at 01:10
  • @user3130555: Why is this a problem for you in the first place? If the first variable is still around, the `id`s are guaranteed not to conflict. And if it's _not_ around, then there's nothing to conflict _with_. – abarnert Dec 24 '13 at 01:10
  • @Faust: Good point. For a trivial example, `int(1)` will probably just return the same object no matter how many times you call it, in almost any reasonable Python implementation… – abarnert Dec 24 '13 at 01:12
  • @abarnert It's not a big problem, but when I was printing data to make sure everything was doing the right thing I noticed that the things had the same id, which lead me to believe that they are the same thing when they should have been different, but that isn't the case. – user3130555 Dec 24 '13 at 01:16

Try calling the following:

a = someClass()
for i in range(0,44):
    print(someClass())
print(a)

You'll see something different. Why? Because the memory released by each temporary object in the for loop is reused, whereas the memory for a is not reused, since a is still retained.

wheaties

An example where the memory locations (and ids) are not released:

print([someClass() for i in range(10)])

Now the ids are all unique.
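That uniqueness can be verified programmatically. A small sketch: holding all the instances in the list keeps their lifetimes overlapping, so distinct ids are guaranteed:

```python
class SomeClass:
    pass

objs = [SomeClass() for i in range(10)]
# All ten objects are alive at once, so all ten ids must be distinct.
assert len({id(o) for o in objs}) == 10
```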

hpaulj