I am a little confused about how shallow copy works. My understanding is that when we do new_obj = copy.copy(mutable_obj), a new object is created whose elements still point to the elements of the old object.

Here is an example of where I am confused:

import copy

# assignment
i = [1, 2, 3]
j = i
id(i[0]) == id(j[0])  # True
i[0] = 10
i  # [10, 2, 3]
j  # [10, 2, 3]

# shallow copy
k = copy.copy(i)
k  # [10, 2, 3]
id(i) == id(k)  # False (as these are two separate objects)
id(i[0]) == id(k[0])  # True (as they reference the same location, right?)
i[0] = 100
id(i[0]) == id(k[0])  # False (why did the value in that location change?)
id(i[:]) == id(k[:])  # True  (why is this still true if an element just changed?)
i  # [100, 2, 3]
k  # [10, 2, 3]

In a shallow copy, isn't k[0] just pointing to i[0], similar to assignment? Shouldn't k[0] change when i[0] changes?

I expect these to be the same because of the following:

i = [1, 2, [3]]
k = copy.copy(i)
i  # [1, 2, [3]]
k  # [1, 2, [3]]
i[2].append(4)
i  # [1, 2, [3, 4]]
k  # [1, 2, [3, 4]]
id(i[0]) == id(k[0])  # True
id(i[2]) == id(k[2])  # True
id(i[:]) == id(k[:])  # True
Vadim Kotov
Ani Menon
  • But lists are not immutable objects. – Willem Van Onsem Oct 13 '18 at 18:44
  • `int`s are immutable objects, so you set a new object at `i[0]`. – Willem Van Onsem Oct 13 '18 at 18:44
  • Why do you expect `i[0]` to still point to `k[0]`? The two lists are different ones, hence an update in one list does not reflect in the other. – Willem Van Onsem Oct 13 '18 at 18:46
  • Shallow copies are the same as deep copies as long as there are no nested arrays. – Pika Supports Ukraine Oct 13 '18 at 18:47
  • @WillemVanOnsem updated the question. – Ani Menon Oct 13 '18 at 18:54
  • @WillemVanOnsem `ints are immutable objects, so you set a new object at i[0]` Then what about that case in assignment(where the same is done)? – Ani Menon Oct 13 '18 at 18:55
  • @SirGoPythonJavaCppRubythe3rd oh, that makes sense. Could you explain why `id(i[:]) == id (k[:])` is True in the shallow copy example (even after k has a new element)? – Ani Menon Oct 13 '18 at 19:03
  • `id(i[:]) == id (k[:]) # True (why is this still true if an element just changed?)`. This is probably a CPython implementation detail. `id` for lists is based on memory address, which presumably is reused, because `i[:]` is immediately garbage collected. The two lists are not the same object. – Håken Lid Oct 13 '18 at 19:14
  • @AniMenon It's something about the == operator. If you print the id of `i[:]` and `j[:]`, you can see that they are different. Using the `is` operator instead of the `==` operator on `i[:]` and `j[:]`, it will return False. – Pika Supports Ukraine Oct 13 '18 at 19:16
  • The `==` operator is not doing anything strange. Try creating a tuple `id([1,2]), id([3,4])` and they are the same value. – Håken Lid Oct 13 '18 at 19:17
  • @HåkenLid interesting, `creating a tuple id(i[:]), id(k[:]) and they are the same value`.. then why: `id(i[:]), id(k[:]) => (48079624L, 48079624L)` while `id(i[0]), id(k[0]) => (45444848L, 45443240L)` ? _(i.e. when `i => [100, 2, [3, 4]]` and `k => [1, 2, [3, 4]]`)_ – Ani Menon Oct 13 '18 at 19:22
  • @AniMenon The garbage collector reclaims `i[:]` *immediately* since it is an anonymous object. Then, `k[:]` is created in the same spot. It doesn't happen if you maintain references to each individually. See my answer below. – Matt Messersmith Oct 13 '18 at 19:55
  • @MattMessersmith yes, reading it. Thanks. – Ani Menon Oct 13 '18 at 19:57

3 Answers

id(i) == id(k) # False (as these are two separate objects)

Correct.

id(i[0]) == id(k[0]) # True (as they reference the same location, right?)

Correct.

i[0] = 100

id(i[0]) == id (k[0]) # False (why did that value in that loc change?)

It changed because you changed it in the previous line. i[0] was pointing to 10, but you rebound it to point to 100. Therefore, i[0] and k[0] no longer point to the same object.

References are one-way. 10 does not know what is pointing to it, and neither does 100; they are just objects in memory. So if you change what i's first element points to, k is unaffected (since k and i are not the same list). k's first element still points to what it always pointed to.
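
As a minimal sketch of the point above, reusing the lists from the question: `is` compares object identity directly, so it can stand in for the id() comparisons. The variable names same_before and same_after are only illustrative.

```python
import copy

i = [10, 2, 3]
k = copy.copy(i)

# The two lists are distinct objects, but slot 0 of each
# references the very same int object.
same_before = i[0] is k[0]

# Rebinding slot 0 of i changes only the reference stored in i.
i[0] = 100
same_after = i[0] is k[0]

print(same_before, same_after)  # True False
print(i, k)                     # [100, 2, 3] [10, 2, 3]
```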

id(i[:]) == id (k[:]) # True (why is this still true if an element just changed?)

This one's a bit more subtle, but note that:

>>> id([1,2,3,4,5]) == id([1,2,3])
True

whereas

>>> x = [1,2,3,4,5]
>>> y = [1,2,3]
>>> id(x) == id(y)
False

It has to do with some subtleties of garbage collection and id, and it's answered in depth here: Unnamed Python objects have the same id.

Long story short, when you say id([1,2,3,4,5]) == id([1,2,3]), the first thing that happens is we create [1,2,3,4,5]. Then we grab where it is in memory with the call to id. However, [1,2,3,4,5] is anonymous, so nothing references it once id returns, and CPython reclaims it immediately. Then, we create another anonymous object, [1,2,3], and CPython happens to put it in the spot that was just freed. [1,2,3] is also reclaimed immediately. This reuse is a CPython implementation detail, not guaranteed behavior. If you store the references, though, both objects are alive at the same time, so their ids must differ.
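
A short sketch of how to check identity without being fooled by memory reuse, assuming the same i and k values as in the question. The names i_copy and k_copy are hypothetical and exist only to keep both slices alive.

```python
i = [100, 2, 3]
k = [10, 2, 3]

# Hold a reference to each slice so both copies are alive at the same time.
i_copy = i[:]
k_copy = k[:]

# Two objects with overlapping lifetimes can never share an id.
distinct_ids = id(i_copy) != id(k_copy)

# `is` keeps both operands alive for the duration of the comparison,
# so it does not suffer from the reuse that id(i[:]) == id(k[:]) can hit.
distinct_objects = i[:] is not k[:]

print(distinct_ids, distinct_objects)  # True True
```

In general, prefer `is` over comparing id() values of temporaries.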

Mutables example

The same thing happens with mutable objects if you reassign them. Here's an example:

>>> import copy
>>> a = [ [1,2,3], [4,5,6], [7,8,9] ]
>>> b = copy.copy(a)
>>> a[0].append(123)
>>> b[0]
[1, 2, 3, 123]
>>> a
[[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]
>>> b
[[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]
>>> a[0] = [123]
>>> b[0]
[1, 2, 3, 123]
>>> a
[[123], [4, 5, 6], [7, 8, 9]]
>>> b
[[1, 2, 3, 123], [4, 5, 6], [7, 8, 9]]

The difference is when you say a[0].append(123), we're modifying whatever a[0] is pointing to. It happens to be the case that b[0] is pointing to the same object (a[0] and b[0] are references to the same object).

But if you point a[0] to a new object (through assignment, as in a[0] = [123]), then b[0] and a[0] no longer point to the same place.
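
The two cases can also be checked with `is`, as in this sketch. copy.deepcopy does not appear in the example above; it is included here only for contrast, since a deep copy does not share the inner lists at all.

```python
import copy

a = [[1, 2, 3], [4, 5, 6]]
shallow = copy.copy(a)
deep = copy.deepcopy(a)  # shown only for contrast

# In-place mutation goes through the inner list shared with the shallow copy.
a[0].append(123)
print(shallow[0])  # [1, 2, 3, 123]
print(deep[0])     # [1, 2, 3]

# Rebinding a[0] to a new list ends the sharing for that slot.
shared_before = a[0] is shallow[0]
a[0] = [123]
shared_after = a[0] is shallow[0]
print(shared_before, shared_after)  # True False
```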

Community
Matt Messersmith

In Python, everything is an object, including integers. Lists hold only references to objects. Replacing an element of a list rebinds that slot to another object; the object it previously referenced is not changed.

Consider a different example:

class MyInt:
    def __init__(self, v):
        self.v = v
    def __repr__(self):
        return str(self.v)

>>> i = [MyInt(1), MyInt(2), MyInt(3)]
>>> i
[1, 2, 3]
>>> j = i[:]  # This achieves the same as copy.copy(i)
>>> j
[1, 2, 3]
>>> j[0].v = 7
>>> j
[7, 2, 3]
>>> i
[7, 2, 3]

>>> i[0] = MyInt(1)
>>> i
[1, 2, 3]
>>> j
[7, 2, 3]

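Here I create a class MyInt which just holds an int. When I modify an instance of the class, both lists "change", because both reference that same instance. However, once I replace a list entry, the lists differ.

The same happens with integers, except that you can't modify an int in place, so the only way to change a list element that holds an int is to replace it.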
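
A small sketch of that difference using augmented assignment, which is not part of the example above. With an int element, += has to build a new object and store it in the slot; with a list element, += mutates the existing object in place. All names are illustrative.

```python
nums = [1, 2, 3]
nums_copy = nums[:]

# int is immutable: += creates a new int and rebinds slot 0 of nums only.
nums[0] += 10
print(nums, nums_copy)  # [11, 2, 3] [1, 2, 3]

nested = [[1], [2]]
nested_copy = nested[:]

# list is mutable: += extends the shared inner list in place,
# so the change is visible through both outer lists.
nested[0] += [10]
print(nested, nested_copy)  # [[1, 10], [2]] [[1, 10], [2]]
```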

StuxCrystal
  • In the first case, j = i is an assignment, so both j and i point to the same list object.
    When you change an element of that list and then print i and j, both print the same output, because it is an element of the shared list that changed, not the list object that each name refers to.
  • In the second case, k = copy.copy(i) is a shallow copy: a new list object is created and the references it holds are copied, but the objects those references point to (mutable or immutable) are not copied.
    A shallow copy doesn't create a copy of nested objects; it just copies the references to them. Please refer to https://www.programiz.com/python-programming/shallow-deep-copy
  • Thus i and k hold different sets of references pointing to the same objects. When you do i[0] = 100, the reference in list i points to a new int object with value 100, but the reference in k still points to the old int object with value 10.
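
The points above can be sketched as follows. shallow_copy_list is a hypothetical helper written only for illustration; it mimics the effect of copy.copy on a plain list and is not how the copy module is actually implemented.

```python
import copy

def shallow_copy_list(src):
    """Hypothetical helper: build a new list holding the same references."""
    new = []
    for item in src:
        new.append(item)  # copies the reference, not the object it points to
    return new

i = [10, 2, [3]]
k = shallow_copy_list(i)

new_container = k is not i                             # a separate list object
same_items = all(x is y for x, y in zip(i, k))         # same element objects
matches_copy = all(x is y for x, y in zip(k, copy.copy(i)))

print(new_container, same_items, matches_copy)  # True True True
```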
shirish