0

Slicing in python is supposed to make a shallow copy. However, when I run the following:

cur = [[0] * (2) for _ in xrange(2)]
cur2 = [row[:] for row in cur]
cur2[0][0] = "foo"
print(cur)
print(cur2)

I get:

[[0, 0], [0, 0]] # cur
[['foo', 0], [0, 0]] # cur2

which makes it seem like it's a deep copy.

I have two questions: 1) What's happening here? Is this a deep or shallow copy? 2) What about this syntax makes it so much faster than copy.deepcopy? For example, is it something with the way python manages memory?

user2009020
  • 287
  • 1
  • 3
  • 10

2 Answers2

1

You are making shallow copies of the inner lists (i.e. the rows), which is effectively the same as a deep copy of the outer list if the inner lists are just lists of int objects.

You've essentially implemented a deep copy function for the special case of a list of lists of integers.

Using copy.deepcopy will be slower because that function will have to investigate and cache all id's of the objects, including the int objects. Your snippet isn't doing that, but in this particular case, it doesn't matter (note, small int objects are cached at the interpreter level, they are essentially singletons, and anyway, int objects are immutable so they don't really have to be copied at all).

Here's a link to the copy module source code if you eant to see exactly what is involved in a generic deep-copy.

juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
  • Thanks. I'm not sure I follow about the part about the int objects - what does caching the IDs of the objects accomplish? And what do you mean by caching at the interpreter level (and at what level are normal objects cached)? – user2009020 Aug 05 '18 at 19:06
  • @user2009020 small `int` objects ranging from `-2` to `256` are cached by the CPython implementation (that is what I mean by the interpreter level). Read more about that [here](https://stackoverflow.com/questions/15171695/whats-with-the-integer-cache-inside-python). The `copy` module creates its own cache using a `dict` in Python. In any event you would never have to copy `int` objects anyway, because they are immutable. But again, `copy.deepcopy` doesn't know in advance anything about the `object` it is copying, so it checks for objects it's already seen, to avoid a cycle. – juanpa.arrivillaga Aug 05 '18 at 19:11
  • So am I right to say that copy.deepcopy is slower because it copies each int object, whereas the code in my question doesn't? – user2009020 Aug 05 '18 at 19:21
  • @user2009020 no, it doesn't copy each `int` object, for immutable objects, `__deepcopy_atomic` is simply `return x`. However, *it checks every object* and still caches the id. – juanpa.arrivillaga Aug 05 '18 at 19:25
1

I think you have a misunderstanding of what a shallow and a deep copy is.

A shallow copy copies the structure of a collection (essentially the memory locations), whereas a deep copy duplicates everything (as in the actual data) in memory.

What you are doing here is forming a new list of copies of each of the sub-lists from the original 2d list - thus resulting in an entirely new list with no ties to the original list - therefore you have done a deep copy.

If, instead, you were to do: cur[:], you would just be copying the outer list that contains the same references to the inner list - therefore a shallow copy.

In your case, as all the memory locations are different, changing one element does not effect the original, but if you were to just do cur[:], then the inner rows would reference the same locations in memory, so changing elements from either list would effect the other list.

Joe Iddon
  • 20,101
  • 7
  • 33
  • 54