NumPy vectorize() or dot() appears buggy

Question

In the code below, y1 and y2 ought to be equal but they aren't. Could there be a bug in vectorize() or dot()?

import numpy as np
interval = np.arange(0, 30, 0.1)
y1 = [- 1.57 * max(0, x - 10) - 0.72 * max(0, 15 - x)
      - 1.09 * max(0, 20 - x) for x in interval]

def fun(x, pivot, truth):
    if truth: return max(0, x - pivot)
    else:     return max(0, pivot - x)

pivots = [10, 15, 20]
truths = [ 1,  0,  0]
coeffs = [-1.57, -0.72, -1.09]
y2 = [np.dot(np.vectorize(fun)(x, pivots, truths), coeffs) for x in interval]

import matplotlib.pyplot as plt
plt.plot(interval, y1, interval, y2)
plt.show()

Graphs of y1 and y2:

hpaulj · Answer 1 · 2016-12-18T06:36:39.647

I'm not sure this applies in your case, but vectorize has a few tricks.

If you do not specify the return dtype, it determines it with an test calculation - with your first case. If your function returns a scalar integer, like 0, then vectorize returns an integer array. So if you expect a floats, make sure you specify the return dtype.

Also - vectorize is not a speed tool. It's just a convenient way of applying broadcasting to your inputs. It isn't much faster than explicitly looping on your inputs.

np.vectorize(fun, otypes=[float])

removes the steps.

===========

Try this:

vfun = np.vectorize(fun, otypes=[float])
X = vfun(interval[:,None], pivots, truths)
print(X.shape)     # (300,3)
y2 = np.dot(X, coeffs)
print(y2.shape)    # (300,)

It makes fuller use of the vectorize's broadcasting.

I suspect your fun can be written so as to act on the whole x, without the iteration that vectorize does.

Changing fun to use the np.maximum, allows me to supply an array x:

def fun(x, pivot, truth):
    if truth: return np.maximum(0, x - pivot)
    else:     return np.maximum(0, pivot - x)

And I can then calculate X with only a loop over the 3 cases of pivots and truths, calculating all interval values at once:

X = np.stack([fun(interval, p, t) for p, t in zip(pivots, truths)], axis=-1)
y2 = np.dot(X, coeffs)

another way of applying the 3 'cases'

Xlist = [fun(interval, p, t)*c for p, t, c in zip(pivots, truths, coeffs)]
y2 = np.sum(Xlist, axis=0)

Since the np.dot(..., coeffs) is just a weighted sum. I'm not sure it's better.

score 4 · Accepted Answer · answered Dec 18 '16 at 05:40

In order to apply proper casting rules numpy uses your function occasionally with sentinel values (numpy.int64) to check what kind of data it outputs, if it outputs the integer 0 because thats what max returned then it assumes the result of the calculation should all be integers and rounds the other results off, however if you change the function to always return floats by using 0.0 in max:

def fun(x, pivot, truth):
    if truth: return max(0.0, x - pivot)
    else:     return max(0.0, pivot - x)

Then the checks that numpy applies will always result in float results and no rounding will be applied.

NumPy vectorize() or dot() appears buggy

2 Answers2

Linked