How to properly dereference if copying a part of a numpy array?

Question

In my analysis script I noticed some weird behaviour (I guess it's intended, though) when copying arrays in Python. If I have a 2D array A, create another array B from entries of A, and then normalize B by the length of A's first dimension, the entries in A change in a strange way. I can reproduce the problem with the following code:

import numpy as np

foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
startIndex = 1
print(foo)
for it, i in enumerate(foo):
    if not it:
        sum = i[startIndex:]
    else:
        sum += i[startIndex:]
print(foo)
sum /= foo.shape[0]
print(foo)

The output is:

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[ 1. 15. 18.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
[[1. 5. 6.]
 [4. 5. 6.]
 [7. 8. 9.]]

The shape of the array doesn't matter, but this 3x3 form shows it quite well. I guess that sum = i[startIndex:] somehow sets a reference to the last two entries of foo[0], so changes to sum also affect those entries. According to this question, though, I expected to get a copy instead of a reference. What is the proper way to get a copy of only a part of the array?

Your link is talking about `list` objects, which will create copies if you slice. Slicing for `numpy.ndarray` objects, however, creates *views* — juanpa.arrivillaga, Jan 15 '20 at 00:32
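
To illustrate the difference this comment describes, here is a minimal sketch (the variable names are made up for the example) that slices a list and an ndarray and then writes into each slice:

```python
import numpy as np

# Slicing a list creates a new list (a shallow copy).
py_list = [1.0, 2.0, 3.0]
list_slice = py_list[1:]
list_slice[0] = 99.0
print(py_list)        # the original list is unchanged

# Slicing an ndarray creates a view onto the same data buffer.
arr = np.array([1.0, 2.0, 3.0])
arr_slice = arr[1:]
arr_slice[0] = 99.0
print(arr)            # the original array now contains 99.0
```

This is why the pattern from the linked list question does not carry over to numpy arrays.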

score 0 · Answer 1 · answered Jan 15 '20 at 00:27

You can make a copy by value using the np.array constructor:

for it, i in enumerate(foo):
    if not it:
        s = np.array(i[startIndex:])
    else:
        s += i[startIndex:]

Note that I've changed the name of the variable to avoid shadowing the built-in sum function.
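
Put together with the data from the question, the loop might look like the sketch below. The `original` array is not part of the answer; it is only there to check that foo stays untouched.

```python
import numpy as np

foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
original = foo.copy()   # kept only to verify that foo is not modified
startIndex = 1

for it, i in enumerate(foo):
    if not it:
        s = np.array(i[startIndex:])   # independent copy of the slice
    else:
        s += i[startIndex:]            # in-place add, affects only the copy
s /= foo.shape[0]

print(s)
print(np.array_equal(foo, original))
```

The in-place operators (`+=`, `/=`) now write into the buffer owned by `s`, not into the first row of foo.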

kaya3


Great, thank you for your answer! I can't upvote it currently, maybe in the future. – Goodman Jan 15 '20 at 01:34

hpaulj · Accepted Answer · 2020-01-15T17:57:13.817

First a note on array construction. An array has basic information like shape, dtype and strides, and a 1d data buffer, where the actual values are stored.

A copy would be a new array object, with its own values.

A view is a new array object with its own shape, etc., but it shares the data buffer with the source array. As an efficiency measure, numpy makes a view where possible. Many operations do this, such as reshape and transpose. Indexing can also make a view: basic indexing with a scalar or slice makes a view, while advanced indexing with lists, masks or arrays makes a copy.
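
One way to check which case you are in is `np.shares_memory`. The following is a small sketch with a made-up array `x`, not something from the question:

```python
import numpy as np

x = np.arange(12.0).reshape(3, 4)

row_view = x[0]            # basic indexing with a scalar
slice_view = x[:, 1:3]     # basic indexing with slices
fancy_copy = x[[0, 2]]     # advanced indexing with a list
mask_copy = x[x > 5]       # advanced indexing with a boolean mask

for name, arr in [("x[0]", row_view), ("x[:, 1:3]", slice_view),
                  ("x[[0, 2]]", fancy_copy), ("x[x > 5]", mask_copy)]:
    print(name, "shares memory with x:", np.shares_memory(x, arr))
```

The first two report True (views), the last two False (copies).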

X.copy() is one way of forcing a copy. np.array(X) also makes a copy - if using its default copy parameter.
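
A short sketch of both ways of forcing a copy. The `np.asarray` line is added here only as a contrast: it hands back the same object when its input is already an ndarray.

```python
import numpy as np

X = np.array([[1., 2., 3.], [4., 5., 6.]])

a = X[0, 1:].copy()     # explicit copy of a slice
b = np.array(X[0, 1:])  # np.array copies by default
c = np.asarray(X)       # no copy when X is already an ndarray

a += 100
b += 100
print(X)                # unchanged by the in-place adds on a and b
print(c is X)
```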

In [113]: foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])

Iteration on a 2d array, as you do, selects successive 'rows', each a view:

In [114]: for row in foo: 
     ...:     row[1] += 10     # modify the 2nd element of the row
     ...:                                                                                        
In [115]: foo                                                                                    
Out[115]: 
array([[ 1., 12.,  3.],
       [ 4., 15.,  6.],
       [ 7., 18.,  9.]])

Selecting the row by scalar index does the same thing:

In [116]: foo[0]                                                                                 
Out[116]: array([ 1., 12.,  3.])
In [117]: foo[0][1:]-=10                                                                         
In [118]: foo                                                                                    
Out[118]: 
array([[ 1.,  2., -7.],
       [ 4., 15.,  6.],
       [ 7., 18.,  9.]])
In [119]: foo[0,1:]                                                                              
Out[119]: array([ 2., -7.])

So knowing when you get a view and when you get a copy is an important part of learning numpy.

So using copy to make a copy of the 1st row:

In [124]: foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])                                                                                             
In [126]: sum = foo[0,1:].copy() 
     ...: for row in foo[1:]:     # iterate on a slice (skips 1st row)
     ...:     sum += row[1:]      # select a slice of row (skips 1st element)                                            
In [127]: foo                # no change                                                                    
Out[127]: 
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
In [128]: sum                                                                                    
Out[128]: array([15., 18.])

But we could also take this sum without iteration:

In [129]: foo[:, 1:].sum(axis=0)   # select slice of columns, sum across rows                                                                 
Out[129]: array([15., 18.])

A better way to write the [126] iteration (and probably close to what sum implements in compiled code):

In [200]: res = np.zeros(2, float)                                                               
In [201]: for row in foo: 
     ...:     res += row[1:] 
     ...:                                                                                        
In [202]: res                                                                                    
Out[202]: array([15., 18.])
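
Since the question goes on to divide by `foo.shape[0]`, the whole calculation could also be written without a loop. This sketch assumes the goal is the mean over rows of the selected columns:

```python
import numpy as np

foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
startIndex = 1

# Sum over rows, then normalize by the number of rows.
normalized = foo[:, startIndex:].sum(axis=0) / foo.shape[0]

# The same thing expressed as a mean over rows.
mean = foo[:, startIndex:].mean(axis=0)

print(normalized, mean)
```

Neither expression writes into foo, so no explicit copy is needed here.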

I assume `In [129]` creates a copy because it is some kind of advanced indexing, correct? – Goodman, Jan 15 '20 at 14:49
`foo[:, 1:]` is a view (indexing with slices), but the `sum` method isn't modifying any values. It is compiled code. Internally it does use some sort of buffer to accumulate values, but that buffer is probably initialized as a new `zeros` array. I added an example of that. – hpaulj, Jan 15 '20 at 17:58
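
A quick check of this point, as a sketch. It only inspects the result and makes no claim about how `sum` is implemented internally:

```python
import numpy as np

foo = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
view = foo[:, 1:]
total = view.sum(axis=0)

print(np.shares_memory(foo, view))    # the slice is a view of foo
print(np.shares_memory(foo, total))   # the result has its own buffer
total /= foo.shape[0]                 # in-place divide touches only total
print(foo)
```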
