Applying a function across a numpy array gives different answers in different runs

Question

I've been experimenting with vectorizing a function of mine and have hit a strange bug in my code that I haven't been able to figure out.

Here's the numpy array in question (https://filebin.net/c14dcklwakrv1hw8)

import numpy as np
example = np.load("example_array.npy") #Shape (2, 5, 5)

The problem I was trying to solve was to normalize the values in each row such that they summed to 1, except of course rows that are entirely 0's. Since numpy divide has an option to skip 0's when dividing the function I used was

f = lambda x: np.divide(x, np.sum(x, axis=1)[:, np.newaxis], where=np.sum(x, axis=1)[:, np.newaxis]!=0)

What happens however is that the value of f(example[1]) changes depending if example[1] or example[0] is run in the python terminal before it. So if you run example[0] then do f(example[1]) the last row of example[0] replaces the first row of the answer.

The execution of it can seen here https://i.stack.imgur.com/mC51R.jpg

Python version - 3.6.6, numpy - 1.15.3

Edit: - Adding 1 to all elements of the matrix and repeating the same operation without the where condition in np.divide works without issue. I guess that's the source of the error but I don't know why it occurs

The lambda is used to normalize row values such that they sum to 1. This is done by finding the sum across rows and dividing the array with sum. The where option in the lambda is just to ensure that division wherever the sum is 0 should be ignored and the original value left as is. — Prateek, Jan 10 '19 at 22:03
That wasn't what I meant. I dont understand why this has to be done with a lambda function at all. Not only that, named lambdas are discouraged by PEP8. Presumably it's mutating its own values somehow. — roganjosh, Jan 10 '19 at 22:05
Aah it was originally a lambda because the matrix was 1500 X 5 X 5 and I wanted to do use a comprehension to do it quickly. If you replace the lambda with a proper function the bug still occurs though — Prateek, Jan 10 '19 at 22:09
Are you familiar with what vectorized operations actually mean in numpy? Each function call is mutating the entire array in one go. You're essentially constantly mutating the starting point. The function is not being applied element-wise, and you're feeding the same array through the function in different orders — roganjosh, Jan 10 '19 at 22:14
The essential operation I'm doing is a matrix/(sum of that matrix). It's just that the matrix is a slice of another one. I don't see why it should have different results? Would it not be a vectorized operation? — Prateek, Jan 10 '19 at 22:21

Ben · Accepted Answer · 2019-01-11T14:19:17.020

The problem in your function lies in the where call. numpy.divide() will execute the underlying ufunc (the actual function call being vectorized in the numpy.divide call) ONLY at places where your where statement is evaluated at true.

At the other places, it puts whatever it has in memory to fill-in the array it creates. In order to have a good output, you need to use the out argument in the np.divide function (see the doc here : https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.divide.html). An example implementation using a defined function is below (your initial function is also there for reference) :

import numpy as np
e = np.load("example_array.npy")

def normalize_badversion(x):
    idx = np.sum(x, axis=1)[:, np.newaxis]
    return np.divide(x, idx, where=idx!=0)

def normalize(x):
    idx = np.sum(x, axis=1)[:, np.newaxis]
    return np.divide(x, idx, where=idx!=0, out=np.zeros_like(x))

print e[0]
a = normalize(e[1])
print e[1]
b = normalize(e[1])
print np.allclose(a,b)


print e[0]
a = normalize_badversion(e[1])
print e[1]
b = normalize_badversion(e[1])
print np.allclose(a,b)

Final note : I agree the current doc of numpy divide is not really clear on that matter. A recent fix was pushed in the numpy doc to clarify this, see https://github.com/numpy/numpy/commit/9a82c53c8a2b9bd4798e515544de4a701cbfba3f

Applying a function across a numpy array gives different answers in different runs

1 Answers1

Linked