Numpy vectorized 2d array operation error

Question

I'm trying to apply a vectorized function over a 2-d array in numpy row-wise, and I'm encountering ValueError: setting an array element with a sequence.

import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
coeffs = np.array([1, 1], dtype=float)

np.apply_along_axis(
    np.vectorize(lambda row: 1.0 / (1.0 + np.exp(-coeffs.dot(row)))),
    0, X
)

I don't totally know how to interpret this error. How am I setting an array element with a sequence?

When I test the lambda function on a single row, it works and returns a single float. Somehow it's failing within the scope of this vectorized function, which leads me to believe that either the vectorized function is wrong or I'm not using apply_along_axis correctly.

Is it possible to use a vectorized function in this context? If so, how? Can a vectorized function take an array or am I misunderstanding the documentation?

Why are you calling `np.vectorize` on a function that's supposed to take rows? — user2357112, Jul 12 '17 at 20:51
I ended up using the solution suggested by Divakar, but I'm interested in understanding if it's possible to use vectorize because I thought the row-wise implementation was slightly easier to interpret. — Petergavinkin, Jul 12 '17 at 21:00
As ironic as this might sound, `np.vectorize` isn't a vectorized operation, as you seem to be asking about. From the [`docs`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html) - `"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."`. — Divakar, Jul 12 '17 at 21:03
Yeah, you're right. I think vectorize is just an all around bad idea. Thanks for the help! — Petergavinkin, Jul 12 '17 at 21:12

Divakar · Accepted Answer · 2017-07-12T21:08:01.040

You are sum-reducing the second axis of X against the only axis of coeffs. So, you could simply use np.dot(X,coeffs) for sum-reductions.

Thus, a vectorized solution would be -

1.0 / (1.0 + np.exp(-X.dot(coeffs)))

Sample run -

In [227]: X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
     ...: coeffs = np.array([1, 1], dtype=float)
     ...: 

# Using list comprehension    
In [228]: [1.0 / (1.0 + np.exp(-coeffs.dot(x))) for x in X]
Out[228]: [0.7310585786300049, 0.98201379003790845, 0.95257412682243336]

# Using proposed method
In [229]: 1.0 / (1.0 + np.exp(-X.dot(coeffs)))
Out[229]: array([ 0.73105858,  0.98201379,  0.95257413])
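
To make the "sum-reduction" wording concrete, here is a small sketch (not part of the original answer) that writes the same reduction three ways and checks that they agree. The names z_dot, z_sum and z_einsum are made up for illustration.

```python
import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
coeffs = np.array([1, 1], dtype=float)

# Three equivalent ways to multiply each row by coeffs and sum over axis 1.
z_dot = X.dot(coeffs)                       # matrix-vector product
z_sum = (X * coeffs).sum(axis=1)            # broadcast multiply, then reduce
z_einsum = np.einsum('ij,j->i', X, coeffs)  # explicit index notation

# The logistic function is then applied elementwise to the reduced values.
probs = 1.0 / (1.0 + np.exp(-z_dot))
print(z_dot)
print(probs)
```

All three produce one value per row of X, so no explicit Python-level loop over rows is needed.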

The correct way to use np.apply_along_axis would be to drop np.vectorize and apply it along the second axis of X, i.e. every row of X -

np.apply_along_axis(lambda row: 1.0 / (1.0 + np.exp(-coeffs.dot(row))), 1,X)

This is right, and I actually ended up doing something very similar rather than messing with vectorize. I guess I'm more interested in why the earlier code didn't work. I'm going to edit my question — Petergavinkin, Jul 12 '17 at 20:58

hpaulj · Answer 2 · 2017-07-12T23:56:36.583

The v1.12 docs for vectorize say:

By default, pyfunc is assumed to take scalars as input and output.

In your attempt:

np.apply_along_axis(
    np.vectorize(lambda row: 1.0 / (1.0 + np.exp(-coeffs.dot(row)))),
    0, X
)

With axis 0, apply_along_axis iterates over every axis except 0 and feeds each resulting 1d slice to its function. For a 2d array that means it iterates over axis 1 and passes the columns. Divakar's version uses axis 1 instead, so it iterates over axis 0 and passes the rows. That makes it basically the same as the list comprehension with an array wrapper.
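
One way to see which slices apply_along_axis hands to its function is to record them. This is only an illustrative sketch; record is a hypothetical helper, not a NumPy function.

```python
import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)

def record(store):
    """Hypothetical helper: build a func1d that remembers every slice it sees."""
    def func1d(slice_1d):
        store.append(slice_1d.copy())
        return slice_1d.sum()  # any scalar return value will do here
    return func1d

cols, rows = [], []
np.apply_along_axis(record(cols), 0, X)  # axis 0: slices run down the columns
np.apply_along_axis(record(rows), 1, X)  # axis 1: slices run across the rows

print([s.shape for s in cols])  # each slice has length 3 (a column)
print([s.shape for s in rows])  # each slice has length 2 (a row)
```

With axis 0 the function never sees a row of X, which is why the question's call could not work even before np.vectorize got involved.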

apply_along_axis makes more sense with 3d or higher inputs, where it's more fiddly to iterate on 2 axes and feed the third to your function.
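
As a rough illustration of the 3d case, the sketch below applies the same row function along the last axis of a made-up array X3 (an assumption, not data from the question) and compares it with the direct dot-based expression.

```python
import numpy as np

coeffs = np.array([1, 1], dtype=float)

def foo(row):
    return 1.0 / (1.0 + np.exp(-coeffs.dot(row)))

# Made-up 3d input: a 2 x 3 grid of length-2 "rows".
X3 = np.arange(12, dtype=float).reshape(2, 3, 2)

# apply_along_axis iterates over the first two axes and feeds the last one to foo.
out = np.apply_along_axis(foo, -1, X3)

# The fully vectorized expression still works, since dot reduces the last axis.
direct = 1.0 / (1.0 + np.exp(-X3.dot(coeffs)))

print(out.shape)
```

Without apply_along_axis the equivalent would be a nested loop over the first two axes, which is the fiddly part it saves.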

Writing your lambda as a function:

def foo(row):
    return 1.0/(1.0+np.exp(-coeffs.dot(row)))

Given an array (row) it returns a scalar:

In [768]: foo(X[0,:])
Out[768]: 0.7310585786300049

But given a scalar, it returns an array:

In [769]: foo(X[0,0])
Out[769]: array([ 0.5,  0.5])

That explains the sequence error message. vectorize feeds your function one scalar at a time and expects a scalar back, but it got an array.
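
The failure can be reproduced without apply_along_axis by handing a plain vectorize wrapper one column of X, which is what the axis 0 call ends up doing. This is a minimal sketch; the exact error text may differ between NumPy versions.

```python
import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
coeffs = np.array([1, 1], dtype=float)

def foo(row):
    return 1.0 / (1.0 + np.exp(-coeffs.dot(row)))

row_in = foo(X[0, :])     # 1d input: dot gives a scalar, so foo returns a scalar
scalar_in = foo(X[0, 0])  # scalar input: dot scales coeffs, so foo returns 2 values

error = None
try:
    # vectorize calls foo once per scalar in the column, then tries to store
    # each length-2 result in a single float slot of the output array.
    np.vectorize(foo)(X[:, 0])
except ValueError as exc:
    error = str(exc)

print(np.shape(row_in), np.shape(scalar_in))
print(error)
```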

signature

In v 1.12 vectorize adds a signature parameter, which lets us feed something bigger than a scalar to the function. I explored it in:

https://stackoverflow.com/a/44752552/901925

Using the signature I get vectorize to work with:

In [784]: f = np.vectorize(foo, signature='(n)->()')
In [785]: f(X)
Out[785]: array([ 0.73105858,  0.98201379,  0.95257413])

This gives the same result as:

In [787]: np.apply_along_axis(foo,1,X)
Out[787]: array([ 0.73105858,  0.98201379,  0.95257413])
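
For readers who want to run the comparison as a script rather than in an IPython session, here is a self-contained sketch of the two calls above. It assumes NumPy 1.12 or later, since that is where signature appears.

```python
import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
coeffs = np.array([1, 1], dtype=float)

def foo(row):
    return 1.0 / (1.0 + np.exp(-coeffs.dot(row)))

# '(n)->()' tells vectorize that foo consumes a 1d array of length n
# and produces a scalar, so it loops over rows instead of over scalars.
f = np.vectorize(foo, signature='(n)->()')

via_signature = f(X)
via_apply = np.apply_along_axis(foo, 1, X)

print(via_signature)
print(via_apply)
```

The signature follows the generalized ufunc convention: core dimensions come from the last axes of the input, and every leading axis is looped over.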

timings

In [788]: timeit np.apply_along_axis(foo,1,X)
10000 loops, best of 3: 80.8 µs per loop
In [789]: timeit f(X)
1000 loops, best of 3: 181 µs per loop
In [790]: np.array([foo(x) for x in X])
Out[790]: array([ 0.73105858,  0.98201379,  0.95257413])
In [791]: timeit np.array([foo(x) for x in X])
10000 loops, best of 3: 22.1 µs per loop

For this small array, the list comprehension is fastest and vectorize is slowest.
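
The timings above come from an IPython session. A rough way to repeat the comparison in a plain script is sketched below using the standard timeit module. The candidates dictionary and the repeat counts are arbitrary choices for illustration, and the numbers will vary by machine and NumPy version, so no expected output is shown. The direct X.dot expression from the accepted answer is included for reference.

```python
import timeit
import numpy as np

X = np.array([[0, 1], [2, 2], [3, 0]], dtype=float)
coeffs = np.array([1, 1], dtype=float)

def foo(row):
    return 1.0 / (1.0 + np.exp(-coeffs.dot(row)))

f = np.vectorize(foo, signature='(n)->()')

# Each candidate computes the same three values in a different way.
candidates = {
    'apply_along_axis': lambda: np.apply_along_axis(foo, 1, X),
    'vectorize with signature': lambda: f(X),
    'list comprehension': lambda: np.array([foo(x) for x in X]),
    'X.dot (no Python loop)': lambda: 1.0 / (1.0 + np.exp(-X.dot(coeffs))),
}

number = 200  # calls per measurement; kept small so the script runs quickly
timings = {
    name: min(timeit.repeat(fn, number=number, repeat=3)) / number
    for name, fn in candidates.items()
}

# Print the candidates from fastest to slowest, in microseconds per call.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds * 1e6:.1f} us per call')
```

With only three rows, per-call overhead dominates, so the ranking may change for larger inputs.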
