25

I was fiddling around with id recently and realized that (c?)Python does something quite sensible: it ensures that small ints always have the same id.

>>> a, b, c, d, e = 1, 2, 3, 4, 5
>>> f, g, h, i, j = 1, 2, 3, 4, 5
>>> [id(x) == id(y) for x, y in zip([a, b, c, d, e], [f, g, h, i, j])]
[True, True, True, True, True]

But then it occurred to me to wonder whether the same is true for the results of mathematical operations. Turns out it is:

>>> nines = [(x + y, 9) for x, y in enumerate(reversed(range(10)))]
>>> [id(x) == id(y) for x, y in nines]
[True, True, True, True, True, True, True, True, True, True]

Seems like it starts failing at n=257...

>>> a, b = 200 + 56, 256
>>> id(a) == id(b)
True
>>> a, b = 200 + 57, 257
>>> id(a) == id(b)
False

But sometimes it still works even with larger numbers:

>>> [id(2 * x + y) == id(300 + x) for x, y in enumerate(reversed(range(301)))][:10]
[True, True, True, True, True, True, True, True, True, True]

What's going on here? How does Python do this?

Robert Harvey
jsau

3 Answers

20

You've fallen into a not uncommon trap:

id(2 * x + y) == id(300 + x)

The two expressions 2 * x + y and 300 + x don't have overlapping lifetimes. That means Python can calculate the left-hand side, take its id, and then free the integer before it calculates the right-hand side. When CPython frees an integer, it puts it on a list of freed integers and reuses that memory for a different integer the next time it needs one. So your ids match even when the results of the calculations are very different:

>>> x, y = 100, 40000
>>> id(2 * x + y) == id(300 + x)
True
>>> 2 * x + y, 300 + x
(40200, 400)
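A minimal sketch of the contrast: if both results are bound to names, their lifetimes overlap and the ids can no longer coincide. The function and variable names here (`compare_while_alive`, `left`, `right`) are illustrative and not from the question.

```python
def compare_while_alive(x, y):
    """Compare the two results while both are still referenced."""
    left = 2 * x + y    # stays alive because `left` refers to it
    right = 300 + x     # created while `left` still exists
    # Two distinct objects that are alive at the same time
    # are guaranteed to have distinct ids.
    return id(left) == id(right), left == right


print(compare_while_alive(100, 40000))  # (False, False)
```

With the temporaries kept alive, the coincidental id match from the one-line comparison disappears.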
Duncan
  • So if what you say above is literally true, then there is a sense in which python ints _are_ mutable (only after being garbage collected). – jsau May 23 '11 at 19:23
17

Python keeps a pool of int objects for a certain range of small values. When you create an int in that range, you actually get a reference to the pre-existing object. I suspect this is for optimization reasons.

For numbers outside the range of that pool, you appear to get back a new object whenever you try to make one.

$ python
Python 3.2 (r32:88445, Apr 15 2011, 11:09:05) 
[GCC 4.5.2 20110127 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 300
>>> id(x)
140570345270544
>>> id(100+200)
140570372179568
>>> id(x*2)
140570345270512
>>> id(600)
140570345270576
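One way to probe the pool without comparing raw addresses is sketched below. It assumes CPython, and the helper names `same_object` and `first_uncached` are made up for this example. The values are built at runtime with `int(str(n))` so that compile-time constant folding does not make equal literals share an object. Where the pool ends is an implementation detail and may differ between versions; with the range quoted in the documentation below (-5 to 256), the probe would be expected to report 257.

```python
import sys


def same_object(n):
    """Build the value n twice at runtime; report whether both are one object."""
    # Both results stay referenced, so their lifetimes overlap and
    # `is` reflects genuine object sharing rather than id reuse.
    a = int(str(n))
    b = int(str(n))
    return a is b


def first_uncached(start=0, limit=100_000):
    """Return the first n >= start whose two copies are distinct objects, or None."""
    for n in range(start, limit):
        if not same_object(n):
            return n
    return None


# The pool is a CPython implementation detail, so only probe it there.
if sys.implementation.name == "cpython":
    print(first_uncached())
```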

Source

PyObject* PyInt_FromLong(long ival) Return value: New reference. Create a new integer object with a value of ival.

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

emphasis mine

Daenyth
  • What happens when the numbers are larger? Sometimes the ids are still the same. Is it doing a hash lookup or something? – jsau May 23 '11 at 18:46
  • @jsau: I edited my answer to include that. – Daenyth May 23 '11 at 18:50
  • @Daenyth, Yeah, but sometimes it's _not_ a new object; as my example demonstrates, sometimes `2 * x + y` returns the same object as `300 + x`. Or am I misunderstanding what `id` does? – jsau May 23 '11 at 18:55
  • @jsau: I'm not seeing that. I can't say for certain what happens in that case, but I did post an example that supports that you get a new object back. `id()` returns a unique identifier for an object, which in cpython is the object's address in memory. – Daenyth May 23 '11 at 19:02
  • 2
    id's will get reused if the object they pointed to gets garbage collected. So you can't keep id's around as a pseudo-object key, they may end up pointing to something different later. – PaulMcG May 23 '11 at 21:20
2

AFAIK, id has nothing to do with the size of the parameter. It MUST return an identifier that is unique for the object's lifetime, and it CAN return the same result for two different objects if they do not exist concurrently.
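A small sketch of the guaranteed half of that statement, using a hypothetical placeholder class `Token` (any object type would do): as long as all the objects are kept alive together, their ids are pairwise distinct.

```python
class Token:
    """Hypothetical placeholder class used only for this illustration."""


def ids_unique_while_alive(count):
    # Holding every object in a list keeps all of them alive at once.
    live = [Token() for _ in range(count)]
    # Objects with overlapping lifetimes never share an id.
    return len({id(obj) for obj in live}) == count


print(ids_unique_while_alive(1000))  # True
```

Once the list is dropped, nothing stops a later object from receiving one of those ids, which is why an id is not a safe long-lived key.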

Hyperboreus
  • From the Doc: Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value. – Hyperboreus May 23 '11 at 18:44
  • @Daenyth: Please specify what is incorrect. Erm, why did you delete your comment instead of explaining it? – Hyperboreus May 23 '11 at 18:45
  • From Daenyth's answer, seems like it _does_ have something to do with the size of the parameter, if by size you mean the magnitude of the value. – jsau May 23 '11 at 18:49
  • @Hyperboreus: I deleted it because I was editing it and then you posted the doc that I was going to reference. Specifically the `guaranteed to be unique and constant for this object during its lifetime` part. – Daenyth May 23 '11 at 18:51
  • 2
    The fact that one (or some or all) Python implementations keep an array for a certain range of small integers doesn't affect how id() works. Who can tell if this will be the same in other or future implementations? One shouldn't rely on implementation details, but on the documented API, to avoid bad surprises. The API states uniqueness and constancy during an object's lifetime, nothing else. That it produces the same output for certain values of int is nice to know, but only by chance (due to the current implementation you are using). Also look at the upvoted answer. – Hyperboreus May 23 '11 at 19:31
  • 1
    I'm not sure why this was so heavily downvoted. There are two behaviors in play here, first the (implementation-defined) cache for certain integer objects, and secondly the possibility of ids being reused. Hyperboreus *correctly* (AFAIU) pointed out that seeing the same result from `id()` on two different objects means nothing at all **in the case where the objects have non-overlapping lifetimes**, which seems to be the case. This is basically exactly what Duncan's answer says above, although admittedly not worded as clearly. – Daniel Pryden May 23 '11 at 19:59
  • @Daniel, @Hyperboreus, just to be clear, I didn't downvote this answer. – jsau May 23 '11 at 20:04
  • 2
    That's not the point and it is everybody's right to do so as they see fit. The important thing is not to base one's code on haphazard implementation behaviour, but on documented API. And btw this is a very interesting question with a lot of interesting answers and comments. – Hyperboreus May 23 '11 at 20:06
  • My next to last +1 goes here for (practical) purity and good sense. – mlvljr Jun 02 '11 at 20:48