I know there are tons of resources out there about pointers and references (or rather: names, and bindings!) in Python, but I am struggling to understand one last point:

I get that if a = 1 and b = 1, then they are both 'bound' to the exact same 1 and will have the same id() (and therefore, I think, the same memory address). I also get that if you set:

a = [1, 2, 4]
b = a
b[0] = 45
# a is now [45, 2, 4]

because a and b are bound to the same list (object), and a change made through one name is visible through the other. Similarly, a[0] and b[0] are the same object. The list contains other objects with different ids, i.e. a list's identity is not tied to its contents.

Okay. So far so good. I can accept that there are 'unborn' lists and numbers floating around waiting to be initialized (only once though!), and that Python takes care of assigning a memory space for them once we want them. Why then, if I do:

d = [1, 2]
e = [145, 7]
# id(d) and id(e) are not the same?!

Shouldn't there only be a single 2-element list in Python's existence? This would be consistent to me (and then there is only a single 1, a single 2, a single 145...etc).

Any explanation would be appreciated, including ones that relate it back to pointers (since I am also somewhat mystified about the decisions that are made at a memory management level, but I suppose that's the concern of Python's execution model and not me!)

HFBrowning

2 Answers

You are being misled by an optimization present in CPython for ints, namely, int-caching. See this famous question. This is documented here:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object.

In almost every other instance, using a literal creates a new object. Indeed, use ints outside that range, and you'll see the normal behavior:

>>> a = 100000
>>> b = 100000
>>> id(a)
4322630608
>>> id(b)
4322630640
>>> c = a
>>> id(a) == id(b)
False
>>> id(a) == id(c)
True

And I need to repeat this almost every single day, but assignment in Python never copies.
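
To make "assignment never copies" concrete, here is a minimal sketch (the variable names are illustrative and not from the question). It contrasts plain assignment with an explicit shallow copy made with `list.copy()`:

```python
original = [1, 2, 4]

alias = original          # binds a second name to the same list object
clone = original.copy()   # builds a new list holding the same elements (shallow copy)

alias[0] = 45             # mutates the one list that both names refer to

print(original)           # [45, 2, 4]
print(alias is original)  # True
print(clone)              # [1, 2, 4]
print(clone is original)  # False
```

If you actually want an independent list, you have to ask for one explicitly, as `clone` does here.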

juanpa.arrivillaga
  • But if I read the question correctly, the OP is not asking about `int`s, but about lists, which makes the question more, well, *weird*. – Willem Van Onsem Feb 08 '17 at 00:38
  • Amazing - thank you so much. I would have never found that question on my own. I was trying hard to nod my head and be like "this is reasonable, this makes sense, there's only one of everything!" Thank goodness that's not how Python actually works – HFBrowning Feb 08 '17 at 00:39
  • @WillemVanOnsem True, but his intuitions are being colored by a weird exception to the general rule regarding literals. – juanpa.arrivillaga Feb 08 '17 at 00:40
  • @HFBrowning please please please read this: http://nedbatchelder.com/text/names.html and make sure you understand it. – juanpa.arrivillaga Feb 08 '17 at 00:41
  • Thanks for the link @juanpa.arrivillaga, I will. This model is super different than R (where everything is copied) so I appreciate any new-to-me reference – HFBrowning Feb 08 '17 at 00:43
  • @HFBrowning well, for what it's worth, I'd say Python is a *much* more elegant language than R. R is great for learning statistics in a classroom, but my experience is that it is a terrible language to get anything substantive done. And may the gods of OOP help you if you want to understand the three different class/object systems that coexist at the same time! That being said, it can be very handy for quick-and-dirty data analysis tasks, data exploration, etc. – juanpa.arrivillaga Feb 08 '17 at 00:47
1

= is an assignment.

[1, 2, 3] and 10 are objects.

If you write 10 or [1, 2, 3], Python creates an object. If you don't assign it to anything, the garbage collector will remove it. But if you do, Python will assign a pointer to the newly created object to the given name/variable, e.g.:

a = [1, 2]

Next, when you assign one variable to another, Python copies the pointer from the first variable (not the object it points to), e.g.:

b = a

b now contains a pointer to the same object as a. But any newly created object, even if its content is the same, is a different object. So:

id(a) != id([1, 2])
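
Applied to the lists from the question, a short sketch of the same point: each time a list display such as `[1, 2]` is evaluated, a fresh list is built, so two lists are separate objects even when their contents are equal. The `append` call is an addition for illustration only.

```python
d = [1, 2]
e = [1, 2]        # same contents, but a second, separate list object

print(d == e)     # True: equal values
print(d is e)     # False: different objects, so id(d) != id(e)

e.append(3)       # mutating e leaves d untouched
print(d)          # [1, 2]
print(e)          # [1, 2, 3]
```

If there were only one shared `[1, 2]` in existence, the `append` would have changed `d` as well, which is why mutable objects like lists are never silently shared between literals.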

Now, depending on the implementation (so it can change at any time and you should not depend on it), there might be a "shortcut" for speed: objects representing some common values might be created in advance and reused. That's why on some implementations a = 1 and b = 1 give id(a) == id(b), while, confusingly, a = 5555 and b = 5555 entered separately can give id(a) != id(b).
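
A sketch of what "do not depend on it" means in practice: compare values with `==`, and keep `is` for genuine identity checks. The ints below are built with `int()` at run time, which is my own choice to keep the compiler from merging identical literals. The identity results are printed without expected values because they vary by implementation.

```python
# Build the ints at run time so identical literals cannot be merged by the compiler.
small_a, small_b = int("100"), int("100")
big_a, big_b = int("5555"), int("5555")

# Value comparison: reliable on every Python implementation.
print(small_a == small_b)   # True
print(big_a == big_b)       # True

# Identity comparison: an implementation detail, do not rely on the result.
print(small_a is small_b)
print(big_a is big_b)
```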

rsm