26

How best to write a function that can accept either scalar floats or numpy vectors (1-d array), and return a scalar, 1-d array, or 2-d array, depending on the input?

The function is expensive and is called often, and I don't want to place a burden on the caller to do special casts to the arguments or return values. It only needs to treat numbers (not lists or other iterable things).

np.vectorize might be slow (Broadcasting a python function on to numpy arrays) and other answers (Getting a Python function to cleanly return a scalar or list, depending on number of arguments) and np.asarray (A python function that accepts as an argument either a scalar or a numpy array) does not help with getting dimensions required for the output array.

This type of code would work in Matlab, Javascript, and other languages:

import numpy as np

def func( xa, ya ):
    # naively, I thought I could do:
    xy = np.zeros( ( len(xa), len(ya) ) )
    for j in range(len( ya )):
        for i in range(len( xa )):
            # do something complicated
            xy[i,j] = x[i]+y[j]            
    return xy

Works fine for arrays:

x = np.array([1., 2.])
y = np.array([2., 4.])
xy = func(x,y)
print xy

[[ 3.  5.]
 [ 4.  6.]]

But does not work for scalar floats:

x = 1.
y = 3.
xy = func(x,y)
print xy

<ipython-input-64-0f85ad330609> in func(xa, ya)
      4 def func( xa, ya ):
      5     # naively, I thought I could do:
----> 6     xy = np.zeros( ( len(xa), len(ya) ) )
      7     for j in range(len( ya )):
      8         for i in range(len( xa )):

TypeError: object of type 'float' has no len()

Using np.asarray in a similar function gives:

<ipython-input-63-9ae8e50196e1> in func(x, y)
      5     xa = np.asarray( x );
      6     ya = np.asarray( y );
----> 7     xy = np.zeros( ( len(xa), len(ya) ) )
      8     for j in range(len( ya )):
      9         for i in range(len( xa )):

TypeError: len() of unsized object

What is the fast, elegant, and pythonic approach?

Community
  • 1
  • 1
  • MATLAB does not have scalars. `size(1)` returns `1,1`. – hpaulj Mar 28 '15 at 17:25
  • But length(1) returns 1. Here is Matlab code that does what I want: `function xy=funk(x,y) nx = length(x); ny = length(y); xy = zeros(nx,ny); for j=1:ny for i=1:nx xy(i,j)=x(i)+y(j); end end return` – Christopher Sherwood Mar 29 '15 at 15:19
  • `length` has an odd definition. See my addition from the Octave help. What does your matlab code do with `[1,2;3,4]`? – hpaulj Mar 29 '15 at 16:41
  • Many numpy functions already take care of scalars and vectors. Mostly it matters, when you do yourself indexing stuff, because 0-dim arrays cannot be indexed. Check also [`np.atleast_1d`](https://numpy.org/doc/stable/reference/generated/numpy.atleast_1d.html) and [`np.take`](https://numpy.org/doc/stable/reference/generated/numpy.take.html) (this can index 0 dim arrays). – Friedrich -- Слава Україні Jun 28 '22 at 07:32

9 Answers9

25

All over the numpy code base you find things like:

def func_for_scalars_or_vectors(x):
    x = np.asarray(x)
    scalar_input = False
    if x.ndim == 0:
        x = x[None]  # Makes x 1D
        scalar_input = True

    # The magic happens here

    if scalar_input:
        return np.squeeze(ret)
    return ret
michael
  • 9,161
  • 2
  • 52
  • 49
Jaime
  • 65,696
  • 17
  • 124
  • 159
  • 1
    I would replace `None` by the equivalent `np.newaxis`. It is in my opinion clearer what happens. – DerWeh Sep 14 '18 at 07:29
  • Is `if scalar_input:` needed? I thought I understood that `squeeze` removes an axes of length one, and if it doesn't exist, it isn't removed? Or would this be an issue when the input array already has an axis of length one? – AstroFloyd Dec 22 '21 at 13:41
  • Perhaps another stupid question, but I see that `x = np.asarray(x)` turns a scalar (e.g. `float`) into `() ` while `x = x[None]` turns it into `(1,) `. After the first step, I already have a NumPy array, so why do I need the second step? The answer is at least in part that in the magic section, `y = x` will turn y into a ``, so no longer an array. But `y = np.asarray(x)` would fix that. Is that a viable solution, or am I committing sacrilege here, and if so, why? – AstroFloyd Dec 22 '21 at 14:25
5

As a matter of opinion, I would prefer to have a function be flexible on input types, but always return a consistent type; this is what will ultimately prevent callers from having to check return types (the stated goal).

For example, allow scalars and/or arrays as arguments, but always return the array.

def func(x, y):
    # allow (x=1,y=2) OR (x=[1,2], y=[3,4]) OR (!) (x=1,y=[2,3])
    xn = np.asarray([x]) if np.isscalar(x) else np.asarray(x)
    yn = np.asarray([y]) if np.isscalar(y) else np.asarray(y)

    # calculations with numpy arrays xn and xy
    res = xn + yn  # ..etc...
    return res

(Still, the example can easily be modified to return a scalar, by setting a flag "scalar=True", yada yada yada.. but you'd also have to handle one arg's a scalar, the other is an array, etc.; seems like a lot of YAGNI to me.)

michael
  • 9,161
  • 2
  • 52
  • 49
2

" function that can accept either scalar floats or numpy vectors (1-d array), and return a scalar, 1-d array, or 2-d array"

So

scalar => scalar

1d => 2d

what produces a 1-d array?

def func( xa, ya ):
    def something_complicated(x, y):
        return x + y
    try:
        xy = np.zeros( ( len(xa), len(ya) ) )
        for j in range(len( ya )):
            for i in range(len( xa )):
                xy[i,j] = something_complicated(xa[i], ya[i])
    except TypeError:
        xy = something_complicated(xa, ya)  
    return xy

Is this ' fast, elegant, and pythonic'?

It certainly is 'pythonic'. 'try/except' is very Pythonic. So is defining a function within another function.

Fast? Only time tests will tell. It may depend on the relative frequency of scalar v. array examples.

Elegant? That is in the eyes of the beholder.

Is this more elegant? It's limited recursion

def func( xa, ya ):
    try:
        shape = len(xa), len(ya)
    except TypeError:
        # do something complicated
        return xa+ya    
    xy = np.zeros(shape)
    for j in range(len( ya )):
        for i in range(len( xa )):
            xy[i,j] = func(xa[i], ya[i])           
    return xy

If you need to correctly handle 2d+ inputs, then vectorize is clearly the least effort solution:

def something_complicated(x,y):
    return x+y
vsomething=np.vectorize(something_complicated)

In [409]: vsomething([1,2],[4,4])
Out[409]: array([5, 6])
In [410]: vsomething(1,3)
Out[410]: array(4)   # not quite a scalar

If array(4) is not the scalar output that you want, then you'll have to add a test and extract the value with [()]. vectorize also handles a mix of scalar and array (scalar + 1d => 1d).

MATLAB does not have scalars. size(3) returns 1,1.

In Javascript, [1,2,3] has a .length attribute, but 3 does not.

from a nodejs session:

> x.length
undefined
> x=[1,2,3]
[ 1, 2, 3 ]
> x.length
3

Regarding MATAB code, Octave has this to say about the length function

-- Built-in Function: length (A) Return the length of the object A.

The length is 0 for empty objects, 1 for scalars, and the number of elements for vectors. For matrix objects, the length is the number of rows or columns, whichever is greater (this odd definition is used for compatibility with MATLAB).

MATLAB does not have true scalars. Everything is at least 2d. A 'vector' just has a '1' dimension. length is a poor choice for iteration control in MATLAB. I've always used size.

To add to the MATLAB convenience, but also potential confusion, x(i) works with both row 'vectors', and column 'vectors', [1,2,3] and [1;2;3]. x(i,j) also works with both, but with different index ranges.

len works fine when iterating for Python lists, but isn't the best choice when working with numpy arrays. x.size is better if you want to total number of items. x.shape[0] is better if you want the 1st dimension.

Part of why there isn't an elegant Pythonic solution to your problem is that you are starting with some that is idiomatic MATLAB, and expected Python to behave with all the same nuances.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
1

I've often used something like:

def f(x):
    retscalar = np.isscalar(x)
    x = np.atleast_1d(x)
    ...(calculate y using x)
    if retscalar:
        y = y[0]
    return y

My primary motivation is generally to be able to do things like:

low = (x < a)
med = ((x >= a) & (x < b))
hi = (x >= b)
y[low] = ...(some function of x[low])

etc. Otherwise you can't use the same code for arrays and scalars.

Jon Slavin
  • 11
  • 3
0

Maybe this is not the most pythonic (and not the fastest either), but it is the most numponic way:

import numpy as np

def func(xa, ya):
    xa, ya = map(np.atleast_1d, (xa, ya))
    # Naively, I thought I could do:
    xy = np.zeros((len(xa), len(ya)))
    for j in range(len(ya)):
        for i in range(len(xa)):
            # Do something complicated.
            xy[i,j] = xa[i] + ya[j]
    return xy.squeeze()

If you are look for speed check numba out.

ocefpaf
  • 569
  • 4
  • 15
0

Write your function to not care about the dimension in the first place:

def func(xa, ya):
    # use x.shape, not len(x)
    xy = np.zeros(xa.shape + ya.shape)

    # use ndindex, not range
    for jj in np.ndindex(ya.shape):
        for ii in np.ndindex(xa.shape):
            # do something complicated
            xy[ii + jj] = x[ii] + y[jj]            
    return xy
Eric
  • 95,302
  • 53
  • 242
  • 374
0

This seems like something a function decorator could do if you want to apply this behavior to a lot of functions you write, like I do. I wrote one. Turned out to be messier than I hoped, but here it is.

Of course, we should probably all be writing code that explicitly takes and returns only arrays or scalars. Explicit is better than implicit. Use with caution.

import inspect
import functools
import numpy as np

def scalar_or_array(*names):
    """
    Decorator to make a function take either scalar or array input and return
    either scalar or array accordingly.

    The decorator accepts as arguments the names of all the parameters that
    should  be turned into arrays if the user provides scalars. Names should
    be strings.

    The function must be able to handle array input for all of the named
    arguments.

    In  operation, if all the  named arguments are scalars, then the
    decorator will apply np.squeeze() to everything the function returns.


    Example:
    @mnp.scalar_or_array('x', 'y')
    def test(x, y):
    x[x > 10] = 0
        return x, y

    test(5, 0)
    # returns: (array(5), array(0))

    test(20, 0)
    # returns: (array(0), array(0))

    test(np.arange(20), 0)
    # returns: (array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  0,  0,  0,
                        0,  0,  0,  0,  0,  0]), array([0]))
    # notice that the second value returned gets turned into an array in
    # this case
    """
    def decorator(func):

        # get the decorated functions call signature
        signature = inspect.signature(func)

        # now the modified version of the decorated function
        @functools.wraps(func)
        def mod(*args, **kwargs):
            # map this modified function's arguments to that of the decorated
            # function through the "bind" method of the signature
            boundargs = signature.bind(*args, **kwargs)

            # now check if each of the listed arguments is a scalar. if so,
            # make it an array with ndim=1 in the bound arguments.
            scalar_input = []
            for name in names:
                if name in signature.parameters:
                    val = boundargs.arguments[name]
                    if np.isscalar(val):
                        scalar_input.append(True)
                        ary = np.reshape(val, 1)
                        boundargs.arguments[name] = ary
                    else:
                        scalar_input.append(False)

            # now apply the function
            result = func(**boundargs.arguments)

            # if all the user-named inputs were scalars, then try to return
            # all scalars, else, just return what the functon spit  out
            if all(scalar_input):
                if type(result) is tuple:
                    return tuple(map(np.squeeze, result))
                else:
                    return np.squeeze(result)
            else:
                return result

        return mod
    return decorator
parkus
  • 123
  • 7
0

Try this:

def func( xa, ya ):
    if not np.isscalar(xa):
        xa = np.array(xa)[:, None]   
    xy = xa + np.array(ya)
    return xy

Output:

> func([1, 2], [2, 4])
array([[3, 5],
       [4, 6]])

> func(3, [2, 4])
array([5, 7])

> func([2, 4], 3)
array([[5],
       [7]])
jChoi
  • 151
  • 6
-1

I would do the following:

def func( xa, ya ):
    xalen = xa if type(xa) is not list else len(xa)
    yalen = ya if type(ya) is not list else len(ya)
    xy = np.zeros( (xalen, yalen) )    
    for j in range(yalen): 
        for i in range(xalen):
            xy[i,j] = x[i]+y[j] 
    return xy
brunodea
  • 220
  • 1
  • 10