I am baffled by how copying a NumPy array works in Python. I start with the following:

import numpy as np
p = np.array([1.0, 0.0, 1.0, 0.3])

Then I try to make "copies" of p using the following three methods:

q = p
q1 = p[:]
q2 = p.copy()

Now I execute q1[2] = 0.2 and then check the values of p, q, q1, and q2. I was surprised to find that p, q, and q1 all changed to array([1.0, 0.0, 0.2, 0.3]), while only q2 remained unchanged. I also used id() to check the address of all four variables (p, q, q1, q2), and confirmed that id(p) == id(q), but id(q1) != id(p).

My question is: if id(q1) != id(p), how can a modification of q1 alter p and q? Thanks!

Jacques Gaudin
Xiao
  • 2 objects with unique IDs can still access shared memory – byxor Mar 26 '20 at 13:26
  • @byxor Thanks for your quick reply! Then what does this ID refer to? I thought id(x) checks the memory location of x, no? – Xiao Mar 26 '20 at 13:27
  • `[...][:]` makes a (shallow) copy of the list because that's how `list.__getitem__` is defined. `np.array.__getitem__` is defined differently. – chepner Mar 26 '20 at 13:44
  • @Xiao Correct. Although there's nothing to stop 2 objects with separate memory addresses from modifying 1 piece of shared memory. E.g. if 2 instances of a class contain a reference to a list, the instances will have separate IDs but they will both manipulate the same underlying memory (of the list). – byxor Mar 26 '20 at 13:50
  • @byxor I see. That's very helpful. Thanks! – Xiao Mar 26 '20 at 16:09
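The situation described in the comments above can be sketched in plain Python, without NumPy. The `Holder` class and the variable names are hypothetical and exist only for this illustration: two instances have different ids, yet both hold a reference to the same list.

```python
class Holder:
    """Hypothetical class, used only for this illustration."""

    def __init__(self, items):
        # Stores a reference to the list that was passed in, not a copy.
        self.items = items


shared = [1, 2, 3]
a = Holder(shared)
b = Holder(shared)

print(id(a) == id(b))      # False: a and b are distinct objects
print(a.items is b.items)  # True: both refer to the same list

a.items.append(4)          # modify the list through a ...
print(b.items)             # ... and b sees it: [1, 2, 3, 4]
```

id() identifies the wrapper object, not the data that object refers to, which is why distinct ids do not rule out shared data.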

2 Answers

The NumPy documentation states:

All arrays generated by basic slicing are always views of the original array.

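Therefore q1 in your case is a view of p: it is a separate Python object (hence the different id), but it shares p's underlying data, so a change made through q1 is visible in p and q, and vice versa.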
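As a minimal sketch of the difference between the three assignments in the question, the `base` attribute of an array can be inspected: for a view it refers to the array that owns the data, and for an array that owns its own data it is None. The variable names follow the question.

```python
import numpy as np

p = np.array([1.0, 0.0, 1.0, 0.3])
q = p          # same object under a second name
q1 = p[:]      # new ndarray object that views p's data
q2 = p.copy()  # new ndarray object with its own data

print(q is p)           # True: q and p are the same object
print(q1 is p)          # False: q1 is a different object ...
print(q1.base is p)     # True: ... but its data belongs to p
print(q2.base is None)  # True: q2 owns its data

q1[2] = 0.2             # write through the view
print(p)                # [1.  0.  0.2 0.3]
print(q2)               # [1.  0.  1.  0.3]
```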

Jacques Gaudin

Because you are using a basic slicing operation, NumPy returns a view that shares memory with the original array instead of making a copy. In this case the slice covers the entire array. p and q1 are different Python objects, but the underlying data is the same: q1 is just a view into the array that p references.

You can check this using np.shares_memory.

import numpy as np
p = np.array([1.0, 0.0, 1.0, 0.3])

q1 = p[:]

np.shares_memory(p, q1)
# returns:
True

This is true even when the slice does not cover the entire array, for example:

p = np.array([1.0, 0.0, 1.0, 0.3])

q2 = p[1::2]
print(q2)
#prints:
[0.  0.3]

# setting a value of q2 changes p
q2[0] = 10.0
p
# returns:
array([ 1. , 10. ,  1. ,  0.3])
James
  • Thanks for your reply! So in order to create a truly independent copy of p, I should use p.copy() then? – Xiao Mar 26 '20 at 13:33
  • Yes. That is the best method. – James Mar 26 '20 at 13:35
  • Thanks for your reply! A follow-up question: In terms of making an independent copy of p (so that changes to q2 will not affect p), is q2 = p.copy() enough? Do I ever need q2 = p.deepcopy() to make sure q2 is independent of p? – Xiao Mar 26 '20 at 13:41
  • Copy should be enough unless it is a numpy array of nested python objects. – James Mar 27 '20 at 14:16
  • Thanks for your reply! It's very helpful for me:) – Xiao Mar 27 '20 at 16:20
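The caveat about arrays of nested Python objects from the comments above can be illustrated with a small sketch. It assumes an array with dtype=object holding lists, and it uses copy.deepcopy from the standard library; the variable names are chosen only for this example.

```python
import copy

import numpy as np

# An object array whose elements are ordinary Python lists.
p = np.empty(2, dtype=object)
p[0] = [1, 2]
p[1] = [3, 4]

shallow = p.copy()       # new array, but it holds the same list objects
deep = copy.deepcopy(p)  # new array holding copies of the lists

p[0].append(99)          # mutate a list that p refers to

print(shallow[0] is p[0])  # True: the shallow copy shares the list
print(shallow[0])          # [1, 2, 99]
print(deep[0])             # [1, 2]
```

For arrays of plain numeric dtypes such as float64, p.copy() already gives fully independent data, so the deep copy matters only for object arrays like this one.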