I am slowly trying to understand the difference between views and copies in numpy, as well as mutable vs. immutable types.

If I access part of an array with 'advanced indexing' it is supposed to return a copy. This seems to be true:

In [1]: import numpy as np
In [2]: a = np.zeros((3,3))
In [3]: b = np.array(np.identity(3), dtype=bool)

In [4]: c = a[b]

In [5]: c[:] = 9

In [6]: a
Out[6]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Since c is just a copy, it does not share data and changing it does not mutate a. However, this is what confuses me:

In [7]: a[b] = 1

In [8]: a
Out[8]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

So, it seems, even if I use advanced indexing, assignment still treats the thing on the left as a view. Clearly the a in line 2 is the same object/data as the a in line 6, since mutating c has no effect on it.

So my question: is the a in line 8 the same object/data as before (not counting the diagonal of course) or is it a copy? In other words, was a's data copied to the new a, or was its data mutated in place?

For example, is it like:

x = [1,2,3]
x += [4]

or like:

y = (1,2,3)
y += (4,)

I don't know how to check for this because in either case, a.flags.owndata is True. Please feel free to elaborate or answer a different question if I'm thinking about this in a confusing way.

askewchan
  • See a nice summary of views vs copies [here](http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html). – Yibo Yang May 11 '18 at 00:59

3 Answers

When you do c = a[b], a.__getitem__ is called with b as its only argument, and whatever gets returned is assigned to c.

When you do a[b] = c, a.__setitem__ is called with b and c as arguments, and whatever gets returned is silently discarded.

So despite sharing the a[b] syntax, the two expressions do different things. You could subclass ndarray, override these two methods, and have them behave differently. By default in numpy, the former returns a copy (if b is an array), while the latter modifies a in place.
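
To make the dispatch visible, here is a minimal sketch of the subclassing idea. LoggedArray and its calls list are hypothetical names made up for this illustration, not part of numpy; the subclass only records which special method ran and then defers to ndarray's default behaviour.

```python
import numpy as np


class LoggedArray(np.ndarray):
    """Hypothetical ndarray subclass that records which special method ran."""

    calls = []  # shared log, for illustration only

    def __getitem__(self, index):
        LoggedArray.calls.append("__getitem__")
        return super().__getitem__(index)

    def __setitem__(self, index, value):
        LoggedArray.calls.append("__setitem__")
        super().__setitem__(index, value)


a = np.zeros((3, 3)).view(LoggedArray)
b = np.identity(3, dtype=bool)

c = a[b]   # read: goes through a.__getitem__(b) and returns a new array
a[b] = 1   # write: goes through a.__setitem__(b, 1) and modifies a in place

print(LoggedArray.calls)
```

The read and the write never touch the same method, which is why one can hand back a copy while the other writes straight into a.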

Jaime
    I think it would be worth pointing out explicitly in the numpy documentation that even advanced indexing, when used as an lvalue, modifies the original array. – vehsakul Apr 18 '14 at 21:13

Yes, it is the same object. Here's how you check:

>>> a
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
>>> a2 = a
>>> a[b] = 1
>>> a2 is a
True
>>> a2
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

Assigning to some expression in Python is not the same as just reading the value of that expression. When you do c = a[b], with a[b] on the right of the equals sign, it returns a new object. When you do a[b] = 1, with a[b] on the left of the equals sign, it modifies the original object.

In fact, an expression like a[b] = 1 cannot change what name a is bound to. The code that handles obj[index] = value only gets to know the object obj, not what name was used to refer to that object, so it can't change what that name refers to.
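
Since a.flags.owndata cannot tell the two cases apart, one way to check (a sketch, not the only option) is to compare the object's identity and the address of its data buffer before and after the assignment. The names obj_id and data_ptr are just illustrative; np.shares_memory and the __array_interface__ attribute are existing numpy features.

```python
import numpy as np

a = np.zeros((3, 3))
b = np.identity(3, dtype=bool)

obj_id = id(a)                                 # identity of the Python object
data_ptr = a.__array_interface__["data"][0]    # address of the data buffer

c = a[b]                                       # advanced indexing on the right: a copy
print(np.shares_memory(a, c))                  # False, c has its own buffer

a[b] = 1                                       # advanced indexing on the left: in place

print(id(a) == obj_id)                               # True, same object
print(a.__array_interface__["data"][0] == data_ptr)  # True, same buffer
```

So the assignment behaves like the list case (x += [4]) rather than the tuple case: nothing is rebound and no new buffer is allocated.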

BrenBarn

This seems to be a common misunderstanding. Quoting from the SciPy Cookbook (https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html):

The rule of thumb here can be: in the context of lvalue indexing (i.e. the indices are placed in the left hand side value of an assignment), no view or copy of the array is created (because there is no need to). However, with regular values, the above rules for creating views does apply.

In other words, the notion of view or copy only refers to the situation of retrieving values from a numpy object.
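
A small sketch of that rule, reusing a and b from the question (row_view and diag_copy are names invented for this example): the view/copy distinction shows up only when values are read, while assignment through either kind of index writes into a itself.

```python
import numpy as np

a = np.zeros((3, 3))
b = np.identity(3, dtype=bool)

row_view = a[0]    # basic indexing on the right-hand side: a view
diag_copy = a[b]   # advanced (boolean) indexing on the right-hand side: a copy

print(np.shares_memory(a, row_view))    # True
print(np.shares_memory(a, diag_copy))   # False

row_view[:] = 5    # writes through to the first row of a
diag_copy[:] = 9   # changes only the copy, a is untouched

a[b] = 1           # on the left-hand side there is no view or copy, a is modified
print(a)
```

After these steps the first row of a holds the 5s written through the view (except the diagonal entry, overwritten by the last assignment), and none of the 9s ever reach a.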

galactica