I've faced this problem before and understand that it gets confusing. There are two concepts here:
- some data structures are mutable, while others are not
- Python works off pointers... most of the time
So let's consider the case of a list (you accidentally stumbled on interning and peephole optimizations when you used ints - I'll get to that later)
So let's create two identical lists (remember lists are mutable)
In [42]: a = [1,2]
In [43]: b = [1,2]
In [44]: id(a) == id(b)
Out[44]: False
In [45]: a is b
Out[45]: False
See, despite the fact that the lists are identical, a
and b
are different memory locations. Now, this is because python computes [1,2]
, assigns it to a memory location, and then calls that location a
(or b
). It would take quite a long time for python to check every allocated memory location to see if [1,2]
already exists, to assign b
to the same memory location as a
.
And that's not to mention that lists are mutable, i.e. you can do the following:
In [46]: a = [1,2]
In [47]: id(a)
Out[47]: 4421968008
In [48]: a.append(3)
In [49]: a
Out[49]: [1, 2, 3]
In [50]: id(a)
Out[50]: 4421968008
See that? The value that a
holds has changed, but the memory location has not. Now, what if a bunch of other variable names were assigned to the same memory location?! they would be changed as well, which would be a flaw with the language. In order to fix this, python would have to copy over the entire list into a new memory location, just because I wanted to change the value of a
This is true even of empty lists:
In [51]: a = []
In [52]: b = []
In [53]: a is b
Out[53]: False
In [54]: id(a) == id(b)
Out[54]: False
Now, let's talk about that stuff I said about pointers:
Let's say you want two variables to actually talk about the same memory location. Then, you could assign your second variable to your first:
In [55]: a = [1,2,3,4]
In [56]: b = a
In [57]: id(a) == id(b)
Out[57]: True
In [58]: a is b
Out[58]: True
In [59]: a[0]
Out[59]: 1
In [60]: b[0]
Out[60]: 1
In [61]: a
Out[61]: [1, 2, 3, 4]
In [62]: b
Out[62]: [1, 2, 3, 4]
In [63]: a.append(5)
In [64]: a
Out[64]: [1, 2, 3, 4, 5]
In [65]: b
Out[65]: [1, 2, 3, 4, 5]
In [66]: a is b
Out[66]: True
In [67]: id(a) == id(b)
Out[67]: True
In [68]: b.append(6)
In [69]: a
Out[69]: [1, 2, 3, 4, 5, 6]
In [70]: b
Out[70]: [1, 2, 3, 4, 5, 6]
In [71]: a is b
Out[71]: True
In [72]: id(a) == id(b)
Out[72]: True
Look what happened there! a
and b
are both assigned to the same memory location. Therefore, any changes you make to one, will be reflected on the other.
Lastly, let's talk briefly about that peephole stuff I mentioned before. Python tries to save space. So, it loads a few small things into memory when it starts up (small integers, for example). As a result, when you assign a variable to a small integer (like 5
), python doesn't have to compute 5
before assigning the value to a memory location, and assigning a variable name to it (unlike it did in the case of your lists). Since it already knows what 5
is, and has it stashed away in some memory location, all it does is assign that memory location a variable name. However, for much larger integers, this is no longer the case:
In [73]: a = 5
In [74]: b = 5
In [75]: id(a) == id(b)
Out[75]: True
In [76]: a is b
Out[76]: True
In [77]: a = 1000000000
In [78]: b = 1000000000
In [79]: id(a) == id(b)
Out[79]: False
In [80]: a is b
Out[80]: False