
Consider the following piece of code:

import numpy as np
a = np.zeros(10)
b = a
b = b + 1

If I print a and b, I get

>>> a
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

and

>>> b
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Why is this? According to this answer, the third line above binds the variable a to the new name b, so that both refer to the same data. So why doesn't b = b + 1 modify a also?
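
A quick way to see what is going on is to check object identity before and after the rebinding. This is a minimal sketch, not part of the original question, just to make the rebinding visible:

import numpy as np
a = np.zeros(10)
b = a
print(b is a)  # True: both names refer to the same array object
b = b + 1      # b + 1 builds a new array; the name b is rebound to it
print(b is a)  # False: b now refers to a different object
print(a)       # a is unchanged: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]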

  • `b+1` is a new array. You are binding `b` to that array. You aren't changing the original `a`. `b=...` is not the same as `b[:]=...` – hpaulj Dec 27 '22 at 20:39

1 Answer


The interpreter effectively sees the code this way:

import numpy as np
a = np.zeros(10)
b1 = a       # b1 is just another name for the array that a refers to
b2 = b1 + 1  # make a new array holding b1 + 1 and bind it to the name b2
print(b2)

The `+` operator in NumPy allocates new memory on the heap for the result, and this new memory is then bound to the name `b`. (Returning a new object from `+` rather than modifying an operand in place is also the convention outside of NumPy.)
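
As a hedged illustration of that point (this check is not part of the original answer), `np.shares_memory` shows that the result of `+` has its own data buffer:

import numpy as np
a = np.zeros(10)
b = a
print(np.shares_memory(a, b))  # True: b is just another name for a's array
b = b + 1                      # + returns a brand-new array with its own buffer
print(np.shares_memory(a, b))  # False: b no longer refers to a's data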

To avoid creating a new array, you can use the `np.add` function directly and pass the `out` parameter (most NumPy functions have an `out` parameter for this purpose):

import numpy as np
a = np.zeros(10)
b = a
np.add(b, 1, out=b)  # no new memory is allocated here
print(a)
# [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

Alternatively, the following achieves almost the same end result but is less efficient:

import numpy as np
a = np.zeros(10)
b = a
b[:] = b + 1
print(a)

This creates a new temporary array containing `b + 1` and then copies its elements into the preexisting array that `b` (and `a`) refer to.
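
To confirm that the slice assignment really writes into the existing buffer (so `a` changes as well), here is a small sketch along the same lines; the `np.shares_memory` check is only added for illustration:

import numpy as np
a = np.zeros(10)
b = a
b[:] = b + 1                   # the temporary array b + 1 is copied into b's buffer
print(np.shares_memory(a, b))  # True: a and b still share the same data
print(a)                       # [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]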

Using the `out` parameter is useful when working with large data, where even a single temporary array for the result can cause a memory error. It matters especially when working with GPUs through CuPy (which has the same interface as NumPy), where memory is far more restricted.
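
For the large-data case, the idea is to preallocate one result buffer and reuse it across operations instead of allocating a temporary each time. A minimal sketch, with the array size and the loop made up purely for illustration:

import numpy as np
a = np.zeros(1_000_000)
out = np.empty_like(a)  # one reusable result buffer

for _ in range(100):
    np.add(a, 1, out=out)         # result is written into out; no temporary per iteration
    np.multiply(out, 2, out=out)  # later steps can reuse the same buffer in place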

    "the + operator in numpy (or out of numpy) allocates new memory on the heap," No. That is not true, except as a *convention*. The behavior of the `+` operator *is defined by the type*. Whether a new object is created, or an existing object is modified, is up to the implementer of the type. – juanpa.arrivillaga Dec 27 '22 at 22:08
  • And talking about "the heap" is almost always the wrong level of abstraction in Python, **everything** happens "on the heap". – juanpa.arrivillaga Dec 27 '22 at 22:08
  • @juanpa.arrivillaga I am just separating Python's internal heap, which is used for Python objects, from the OS's raw heap that is used by NumPy's calls to malloc. They are the same heap, but I'd rather point out that it's not managed as part of Python's virtual machine. – Ahmed AEK Dec 27 '22 at 22:14
  • 1
  • `numpy` could very well use memory that was previously freed. The way I put it, `b+1` makes a new array. Where that array is 'located' isn't important. It's an array object with its own data-buffer (it isn't a `view` of the `a` array). The key here is that it isn't modifying `a`/`b`. – hpaulj Dec 28 '22 at 01:30