I use Python 2.7. Say I have an array X and a function that centers this array by subtracting the column means:
import numpy as np
X = np.array([[float(i+j) for i in range(3)] for j in range(3)])
If we print X at this point, we get:
X = array([[ 0., 1., 2.],
           [ 1., 2., 3.],
           [ 2., 3., 4.]])
Now let's compute the mean (according to each column) and write the function that centers each X[i,j]
according to the j-th column's mean.
means = X.mean(axis=0)  # mean of each column
We print means = array([ 1., 2., 3.])
(seems legit).
def center(arr, means):
    for i, mean in enumerate(means):
        arr[:, i] -= mean
    # Instructions
    # end of the function, without returning X or anything
Let's call this function on X: center(X, means), and print X again; one will have:
X = array([[-1., -1., -1.],
           [ 0.,  0.,  0.],
           [ 1.,  1.,  1.]])
So X is modified, whereas it shouldn't be since it is not returned.
My question is twofold:
1. How come X is modified outside the function even though I don't return it?
2. Is there something I can do to avoid that modification, which can lead to big confusion?
The solution I found is to np.copy the arr before the for loop (inside the center function), and work on the copy to keep X as it is.
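A minimal sketch of that workaround, for concreteness. The name `centered` and the choice to return the copy are my own assumptions for illustration, not something imposed by NumPy:

```python
import numpy as np

def centered(arr, means):
    """Return a centered copy of arr; the caller's array is left untouched."""
    out = np.copy(arr)  # work on a copy, not on the caller's array
    for i, mean in enumerate(means):
        out[:, i] -= mean  # in-place subtraction, but only on the copy
    return out

X = np.array([[float(i + j) for i in range(3)] for j in range(3)])
means = X.mean(axis=0)  # mean of each column
Y = centered(X, means)  # X keeps its original values, Y holds the centered ones
```

With this version the result has to be returned, since the caller no longer sees any change through X.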
Note that this problem doesn't arise when the argument is an int or a float.
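To make that contrast concrete, here is a small sketch (the function names `shift_number` and `shift_array` are made up for this example). As far as I understand, a float is immutable, so `-=` only rebinds the local name, while on a NumPy array `-=` writes into the same data the caller sees:

```python
import numpy as np

def shift_number(x, delta):
    x -= delta  # float is immutable: this rebinds the local name x only

def shift_array(arr, delta):
    arr -= delta  # ndarray -= operates in place on the caller's data

a = 5.0
shift_number(a, 1.0)

b = np.array([5.0, 6.0])
shift_array(b, 1.0)

print(a)           # 5.0 (unchanged)
print(b.tolist())  # [4.0, 5.0] (modified through the function)
```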
PS: I suppose that the problem lies in how arrays are stored and in the way broadcasting is done in Python, but I believe there is a way to get a better grip on it.
Thanks for your time