27

Hello, I am trying to understand how Python's pass by reference works. I have an example:

>>> a = 1
>>> b = 1
>>> id(a); id(b)
140522779858088
140522779858088

This makes perfect sense: since a and b are both referencing the same value, they would have the same identity. What I don't quite understand is how this example:

>>> a = 4.4
>>> b = 1.0 + 3.4
>>> id(a); id(b)
140522778796184
140522778796136

Is different from this example:

>>> a = 2
>>> b = 2 + 0
>>> id(a); id(b)
140522779858064
140522779858064

Is it because, in the 3rd example, the 0 int object is being viewed as "None" by the interpreter and is not being recognized as needing a different identity from the object that variable "a" is referencing (2)? Whereas in the 2nd example, "b" is adding two different float objects and the interpreter is allocating memory for both of those objects to be added, which gives variable "a" a different identity from variable "b"?

smci
Pulse
  • Nice question, will try to answer it but you need to understand quite Python-only concepts. – Adirio Mar 20 '17 at 15:57
  • 1
    @Adirio Just realized I was saying "memory address" instead of identity, I made the adjustment. Currently transitioning from C++. – Pulse Mar 20 '17 at 15:59
  • It has to do with how Python stores small integers, in order to save memory. You may be interested in [this article](https://davejingtian.org/2014/12/11/python-internals-integer-object-pool-pyintobject/). – a_guest Mar 20 '17 at 16:03
  • Pulse the "passing by reference" pointed me in that direction (transition from C++), but I understood your question. I tried to answer in an easy way below. – Adirio Mar 20 '17 at 16:12
  • There are already 329 posts on [Python interning](http://stackoverflow.com/search?q=%5Bpython%5D+interning+)! Please figure out the canonical duplicate. – smci Mar 20 '17 at 19:06
  • This has nothing to do with passing by reference, does it? – user253751 Mar 21 '17 at 05:49

3 Answers

48

In your first example, the names a and b are both "referencing" the same object because of interning. The assignment statement resulted in an integer with the same id only because it has reused a preexisting object that happened to be hanging around in memory already. That's not a reliable behavior of integers:

>>> a = 257
>>> b = 257
>>> id(a), id(b)
(30610608, 30610728)

As demonstrated above, if you pick a big enough integer, it will behave as the floats in your second example did. Interning small integers is optional in the Python language anyway; it happens to be a CPython implementation detail, a performance optimization intended to avoid the overhead of creating a new object. Caching commonly used integer instances speeds things up, at the cost of a higher memory footprint for the Python interpreter.
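To see the cache without the compiler sharing literals, here is a minimal sketch that builds each integer at run time from a string. The helper name `same_object` and the sample values are assumptions chosen for illustration. On CPython the small value is expected to come from the cache and the large one is not, but other implementations are free to behave differently, so no output is claimed here.

```python
import platform


def same_object(text):
    """Parse the same decimal string twice and report whether the two
    results are one and the same object (identity, not equality)."""
    first = int(text)
    second = int(text)
    return first is second


# Which implementation is running; the caching discussed above is CPython's.
print(platform.python_implementation())

# A small value that CPython is expected to serve from its integer cache.
print(same_object("100"))

# A large value that CPython is expected to allocate fresh each time.
print(same_object("1000000"))
```

Either way, the two parsed values always compare equal with `==`; only the identity check depends on the implementation.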

Don't think about "reference" and "value" when dealing with Python; the model that works for C doesn't really work well here. Instead, think of "names" and "objects".

[Diagram: names]

The diagram above illustrates your third example. 2 is an object, a and b are names. We can have different names pointing at the same object. And objects can exist without any name.

Assigning a variable only attaches a nametag. And deleting a variable only removes a nametag. If you keep this idea in mind, then the Python object model will never surprise you again.
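The nametag idea can be sketched with a list, which makes the sharing visible. The variable names here are purely illustrative.

```python
numbers = [1, 2, 3]   # create a list object and attach the name "numbers" to it
alias = numbers       # attach a second name to the very same object

print(alias is numbers)   # True: two names, one object

alias.append(4)           # change the object through one name...
print(numbers)            # [1, 2, 3, 4]: ...and the change is visible through the other

del numbers               # remove one nametag; the object itself is untouched
print(alias)              # [1, 2, 3, 4]: still reachable through the remaining name
```

Nothing is copied by `alias = numbers`, and nothing is destroyed by `del numbers` while another name still refers to the object.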

wim
  • When dealing with Python you should think about "references" and "objects" (in memory); here `a` and `b` happen to refer to the same object (for the reasons you mentioned). – a_guest Mar 20 '17 at 16:06
  • 13
    I don't like to use the word "reference" because it brings along unnecessary C baggage (details about memory locations, which we don't need to care about in Python since we work at a higher level of abstraction). It's a *name* because it lives in some *namespace*. – wim Mar 20 '17 at 16:14
  • 1
    I don't want to be pedantic, but it's the literal strings `"a"` and `"b"` that are the names, while `a` and `b` (the variables) are references to objects in memory. Also when programming in Python you _should_ be aware of references, I've seen quite a few bugs which happened due to an object's reference count dropping to zero (and the object being garbage collected). – a_guest Mar 20 '17 at 16:31
  • That makes perfect sense now. Thank you for clearing this up for me. I was not aware of the CPython interning implementation. – Pulse Mar 20 '17 at 17:11
  • 1
    @a_guest How did you encounter that? In Python, reference counting is behind the scenes, so you shouldn't encounter such a situation with the object still being accessible unless you're using classes from the `weakref` module or buggy C extensions (or are writing your own extension). – JAB Mar 20 '17 at 21:04
  • As an anecdote: I've seen a bug caused by ref count unexpectedly reaching zero exactly one time (in 10 years of my Python career so far). It was, as @JAB already brought up, due to some library code's bad use of a `weakref`. – wim Mar 20 '17 at 21:23
  • 2
    @JAB To be fair, I've only seen it when working with `PyQt`. If you don't keep references to your widgets, all kinds of funny things can happen. Most of the time they just vanish, but sometimes the situation is a bit more subtle; see for example [this question](http://stackoverflow.com/questions/42650119/matplotlib-event-listeners-not-funcitoning-in-pyqt-widget/). – a_guest Mar 20 '17 at 22:34
  • 1
    @wim I still don't agree. You can have unnamed objects that are yet referred to. Consider `objs = [1., 2., 3.]` which creates four objects in memory three of which are unnamed (`1.`, `2.`, `3.`). "We can have different names pointing at the same object." - Well, not precisely, cause names don't point to anything (that's what references do) they just literally "name" something (references in fact; by the way "point to" is closer to the C language than "reference" (which has more of a C++ taste)). – a_guest Mar 20 '17 at 22:59
  • "And objects can exist without any name." - They can, consider my example above, but I guess you meant "without any reference" here; and then it would be worth noting that this is only true for those cached integers in CPython, cause other objects will be eventually garbage collected. – a_guest Mar 20 '17 at 23:00
  • "Assigning a variable only attaches a nametag." - It actually "assigns" a reference to the object and then binds the name to that reference. Consider the following: `obj = SomeClass(); l = [obj]; del obj;` The object behind `obj` is assigned only one name (`"obj"`) which is deleted later on, however the object persists in memory cause `l` still holds a (unnamed) reference to it. – a_guest Mar 20 '17 at 23:00
  • 3
    I don't understand what you're disagreeing with. I already wrote that "*objects can exist without any name*" in the answer, for example in `L = [1, [], 'hello']` where the "inner" list has a reference (via `L[1]`) but no name in scope. Whatever terminology you use needs to account for this kind of example, so it's better to choose different words for explaining the concepts of names and references independently. – wim Mar 20 '17 at 23:10
  • @a_guest "but it's the literal strings `"a"` and `"b"` that are the names" Those are representations of the values, not "names." The term "names" has come to mean exactly what wim says it does in Python: the variable names that bind to an object. +1 for this fantastically simple and yet accurate and revealing answer. – jpmc26 Mar 20 '17 at 23:32
  • @wim I have seen problems with objects getting collected too early when using GDAL's Python bindings. I suspect that it crops up when you're working with a library that's backed by something native that holds some kind of internal reference that Python doesn't know about. – jpmc26 Mar 20 '17 at 23:39
8

As stated here, CPython caches integers from -5 to 256, so all variables within this range with the same value share the same id. (I wouldn't bet on that for future versions, but that's the current implementation.)

There's no such cache for floats, probably because there's an "infinity" of possible values (not truly infinite, but very large because of floating point), so the chance of computing the same value by different means is really low compared to integers.

>>> a=4.0
>>> b=4.0
>>> a is b
False
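Since identity is an implementation detail for numbers, a practical consequence is to compare values with `==` and keep `is` for identity checks. Here is a small sketch, with values chosen so that the float arithmetic is exact; the variable names are illustrative.

```python
a = float("4.0")   # built at run time from a string
b = 1.5 + 2.5      # exactly 4.0, reached by a different route

print(a == b)      # True: the two floats hold the same value

# Identity asks a different question: are these one and the same object?
# The answer depends on the implementation, so no output is claimed here.
print(a is b)
```

The equality result is guaranteed by the language; the identity result is not.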
Community
Jean-François Fabre
  • 3
    Technically speaking, you should say _CPython_ caches integers from -5 to 256, since this may not be true in other implementations of the Python language. – nalzok Mar 21 '17 at 01:16
  • Also, slightly confusing wording: "all values within this range has the same id". I know what you mean, but it could sound to someone new to python like you are saying all numbers from -5 to 256 share a single id, which is incorrect. – Caleb Mar 21 '17 at 20:47
  • @Caleb True that's much clearer now, thanks for your comment. – Jean-François Fabre Mar 21 '17 at 20:48
3

Python variables are always references to objects. These objects can be divided into mutable and immutable objects.

A mutable object can be modified without its id changing, so every variable that points to this object sees the update. An immutable object, however, cannot be modified this way, so Python creates a new object with the changes and rebinds the variable to point to this new object; thus the id of the variable before the change will not match its id after the change.

Integers and floats are immutable, so after a change the variable will point to a different object and thus have a different id.
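The mutable/immutable difference described above can be sketched as follows. The names `items`, `count`, and `old_count` are illustrative only.

```python
# Mutable: a list is changed in place, so its identity stays the same.
items = [1, 2]
items_id_before = id(items)
items.append(3)
print(id(items) == items_id_before)   # True: same object, new contents

# Immutable: "changing" an int really rebinds the name to another object.
count = 1000
old_count = count        # a second name for the original object
count += 1               # builds the result and rebinds "count" to it
print(old_count)         # 1000: the original object was never modified
print(count)             # 1001
print(count is old_count)   # False: objects with different values cannot be one object
```

The second name `old_count` is what makes the rebinding visible: it still refers to the untouched original.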

The catch is that CPython "caches" some common integer values, to save memory, so that there are not multiple objects with the same value. 2 is one of those cached integers, so every time a variable points to the integer 2 it will have the same id (although the id's value will differ between Python executions).

Adirio