
Consider the following minimal example. Can somebody explain the apparently inconsistent logic of numpy when it comes to copying list elements of varying nesting depths?

import numpy as np

L = [[[[1, 1], 2, 3]]]
A1 = np.array(L)

A2 = A1.copy()

A1[0][0][2] = 'xx'
A1[0][0][0][0] = 'yy'

print("\nA1 after changes:\n{}".format(A1))
print("\nA2 only partially changed:\n{}".format(A2))

Results:

A1 after changes:
[[[['yy', 1] 2 'xx']]]

A2 only partially changed:
[[[['yy', 1] 2 3]]]

Then:

>>> print(A1[0][0][2] == A2[0][0][2])
False
>>> print(A1[0][0][0][0] == A2[0][0][0][0])
True

I have a hard time explaining to myself why 3 is not replaced, but 1 in a deeper level is.

  1. A2 = np.array(A1, copy=True) and A2 = np.empty_like(A1); np.copyto(A2, A1) behave the same as the code above

  2. A2 = A1[:] behaves the same as A2 = A1: both arrays are identical after the changes

  3. import copy; A2 = copy.deepcopy(A1) is the only solution I found to create an independent copy.

icedwater
wolf
  • Given that you've figured out that only a deep copy stops this from happening, what's confusing you? In the other cases, you're copying *references to the same mutable object*. – jonrsharpe May 31 '16 at 22:50
  • It's because you have an array with `dtype=object` ... Basically, you've got an array that holds a reference to a python list and 2 python integers. If you copy the array, it just copies the references. – mgilson May 31 '16 at 22:56
  • I don't like the duplicate. Numpy arrays have special copy issues. And `dtype` object arrays further complicate the issue. This question should be reopened. – hpaulj Jun 01 '16 at 06:20

1 Answer

Look at your array, and understand its structure first:

In [139]: A1
Out[139]: array([[[[1, 1], 2, 3]]], dtype=object)

In [140]: A1.shape
Out[140]: (1, 1, 3)

It's a dtype=object array; that is, the elements are object pointers, not numbers. It is also 3d, with shape (1, 1, 3), so it holds 3 elements.

In [142]: A1[0,0]  
Out[142]: array([[1, 1], 2, 3], dtype=object)

Since it is an array, A1[0,0] is better than A1[0][0]. Functionally the same, but clearer. A1[0,0,:] is even better. Anyways, at this level we still have an array with shape (3,), i.e. 1d with 3 elements.

In [143]: A1[0,0,0]
Out[143]: [1, 1]

In [144]: A1[0,0,2]
Out[144]: 3

Now we get a list and numbers, the individual elements of A1. The list is mutable, the number is not.

We can change the 3rd element (a number) to a string:

In [148]: A1[0,0,2]='xy'

To change an item inside the 1st element, which is a list, I have to use mixed indexing (array indexing followed by list indexing), not 4-level array indexing.

In [149]: A1[0,0,0,0]
...
IndexError: too many indices for array

In [150]: A1[0,0,0][0]='yy'

In [151]: A1
Out[151]: array([[[['yy', 1], 2, 'xy']]], dtype=object)

A1 is still a 3d object array; we have just changed a couple of elements. The 'xy' change is different from the 'yy' change. One changed the array, the other changed a list that is an element of the array.

A2=A1.copy() makes a new array with a copy of A1's data buffer. For an object array that buffer holds pointers, not the objects themselves, so A2 has pointers to the same objects as A1.

The 'xy' change replaced a pointer in A1, but did not touch the A2 copy.

The 'yy' change modified the list pointed to by A1. A2 has a pointer to the same list, so it sees the change.
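A minimal sketch of the two kinds of change, checked with `is` and `np.shares_memory`. It passes `dtype=object` explicitly, which is an assumption on my part: recent NumPy versions refuse to build a ragged array like this one without it.

```python
import numpy as np

# dtype=object is passed explicitly (an assumption for recent NumPy
# versions, which do not create ragged arrays implicitly).
L = [[[[1, 1], 2, 3]]]
A1 = np.array(L, dtype=object)
A2 = A1.copy()

# The copy owns a separate buffer of pointers ...
shares_buffer = np.shares_memory(A1, A2)

# ... but slot [0, 0, 0] in both buffers points at the very same list.
same_list = A1[0, 0, 0] is A2[0, 0, 0]

A1[0, 0, 2] = 'xy'      # rebinds one slot in A1's buffer only
A1[0, 0, 0][0] = 'yy'   # mutates the shared list in place

print(shares_buffer, same_list)    # False True
print(A1[0, 0, 2], A2[0, 0, 2])    # xy 3
print(A1[0, 0, 0], A2[0, 0, 0])    # both show the mutated list
```

The assignment through array indexing only touches A1's own buffer, while the assignment through list indexing goes to an object that both arrays reference.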

Note that L, the original nested list, sees the same change:

In [152]: L
Out[152]: [[[['yy', 1], 2, 3]]]

A3 = A1[:] produces a view of A1. A3 has the same data buffer as A1, so it sees all the changes.

A4 = A1 would also see the same changes, but A4 is a new reference to A1, not a view or a copy.
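The three cases (view, plain reference, copy) can be told apart with `is` and the `.base` attribute. This is only a sketch; `dtype=object` is again passed explicitly as an assumption for recent NumPy versions.

```python
import numpy as np

# dtype=object given explicitly (assumption for recent NumPy versions).
A1 = np.array([[[[1, 1], 2, 3]]], dtype=object)

A3 = A1[:]       # view: new array object, same data buffer
A4 = A1          # reference: just another name for the same array object
A2 = A1.copy()   # copy: new array object with its own buffer of pointers

print(A3 is A1, A3.base is A1)   # False True
print(A4 is A1)                  # True
print(A2.base is None)           # True

A1[0, 0, 2] = 'xy'
# The view and the reference see the new value; the copy keeps the old one.
print(A3[0, 0, 2], A4[0, 0, 2], A2[0, 0, 2])   # xy xy 3
```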

The duplicate answer that was raised earlier dealt with references, copies and deep copies of lists. That is relevant here because L is a list, and A1 is an object array, which in many ways is an array wrapper around a list. But A1 is also a numpy array, which has the added distinction between view and copy.

This is not a good use of numpy arrays, not even the object dtype version. It's an instructive example, but too confusing to be practical. If you need to do a deepcopy on an array, you probably are using arrays wrong.
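If a fully independent copy is needed anyway, `copy.deepcopy` from the standard library (the approach the question already found) also copies the objects the array points to. A short sketch; the name `A5` is just for illustration, and `dtype=object` is passed explicitly as an assumption for recent NumPy versions.

```python
import copy

import numpy as np

# dtype=object given explicitly (assumption for recent NumPy versions).
A1 = np.array([[[[1, 1], 2, 3]]], dtype=object)

# deepcopy duplicates the array and the Python objects it references.
A5 = copy.deepcopy(A1)

A1[0, 0, 0][0] = 'yy'   # mutate the list held by A1

print(A1[0, 0, 0])      # ['yy', 1]
print(A5[0, 0, 0])      # [1, 1], because A5 holds its own list
```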

hpaulj