Of the two code examples below, one produces the expected result and the other raises an error. This seems confusing for a beginner (i.e. me). I assume arithmetic operations work element-wise but other operations don't. What's a "good" (i.e. efficient), general way to perform operations on the elements of a multi-dimensional array without needing underlying knowledge of how the array behaves?

import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(data)

my_function = lambda x: x*2+5

result = my_function(data)
print(result)

Output:

[[1 2 3 4]
 [5 6 7 8]]
[[ 7  9 11 13]
 [15 17 19 21]]

import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(data)

my_function = lambda x: x if x < 3 else 0

result = my_function(data)
print(result)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Edit: I am not looking for a particular solution. Yes, I can use np.where or some other mechanism for this exact example. I am asking about lambdas in particular and how their use seems ambiguous to the user. If it helps, the lambda / filter comes from the command line, i.e. from outside the module. So it can be anything the user wants to apply to the original array: as simple as squaring the elements, or calling an API and using its output to determine the replacement value. You get the idea.

Running python 3.9.13

Heems
  • What is the expected output for the second function? – Michael Szczesny Jun 16 '22 at 17:57
  • [[ 1 2 0 0] [0 0 0 0]] – Heems Jun 16 '22 at 17:58
  • Please clarify the question but it's a multi-dupe anyway. `np.where(data < 3, data, 0)`. – Michael Szczesny Jun 16 '22 at 17:59
  • Does this answer your question? [Replacing Numpy elements if condition is met](https://stackoverflow.com/questions/19766757/replacing-numpy-elements-if-condition-is-met) – Michael Szczesny Jun 16 '22 at 18:00
  • this has nothing to do with lambda expressions specifically, indeed, you shouldn't even be using a lambda expression in this case (according to Python style conventions) – juanpa.arrivillaga Jun 16 '22 at 18:06
  • There is no secret `numpy` syntax for fast auto-vectorization of custom python functions. `jax`'s `vmap` or `numba`'s `@guvectorize` is what's closest at the moment. – Michael Szczesny Jun 16 '22 at 18:08
  • I clarified my ask. I am looking for the ability to perform element wise operations on a n-dim array where the operation is not known to the author of the code. Maybe Python can't do what a functional programmer wants? – Heems Jun 16 '22 at 18:09
  • @Heems "I am asking about lambdas in particular and how their use seems ambiguous to the user." lambdas aren't a thing, in particular. There are *lambda expressions* which create *function objects*, the same exact type of objects that *function definition statements create*. So you are just asking about *functions*, not "lambdas". – juanpa.arrivillaga Jun 16 '22 at 18:14
  • In *any case*, this seems to have been addressed in the previous comment, there is no generic way to apply a function to every element in a `numpy.ndarray` in a way that is *efficient and vectorized*. You can use the `numpy.vectorize` factory function, but that is essentially a for-loop under the hood, and is meant for convenience not performance – juanpa.arrivillaga Jun 16 '22 at 18:15
  • `np.vectorize(my_function)(data)`? If you don't care about performance. – Michael Szczesny Jun 16 '22 at 18:15
  • BTW: The lack of auto-vectorization is not python's fault as it doesn't even support any multidimensional data structure that could use such a feature. – Michael Szczesny Jun 16 '22 at 18:24

1 Answer

This works because operators like * and + work element-wise for numpy arrays:

In [101]: data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
     ...: print(data)
     ...: 
     ...: my_function = lambda x: x*2+5
     ...: 
     ...: result = my_function(data)
[[1 2 3 4]
 [5 6 7 8]]

my_function = lambda x: x if x < 3 else 0 fails because if x < 3 is inherently a scalar operation. if/else does not iterate; it expects a single True/False value, but x < 3 on an array produces a whole array of booleans:

In [103]: data<3
Out[103]: 
array([[ True,  True, False, False],
       [False, False, False, False]])
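
To make the failure concrete, here is a minimal sketch of what `if` effectively does with that boolean array: it asks for a single truth value, which NumPy refuses to give for an array with more than one element. The variable names (`mask`, `raised`, `any_small`, `all_small`) are made up for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
mask = data < 3  # element-wise comparison: a boolean array, not one bool

# `if mask:` needs a single truth value, i.e. roughly bool(mask).
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True  # "The truth value of an array ... is ambiguous"

# Reducing the array first is what the error message is hinting at.
any_small = bool(mask.any())  # at least one element is < 3
all_small = bool(mask.all())  # every element is < 3
```

Neither `any()` nor `all()` gives the per-element result the question wants; they only show why the scalar `if` cannot decide on its own.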

np.vectorize is the most general tool for applying a scalar function element-wise to an array (or arrays):

In [104]: f = np.vectorize(my_function, otypes=[int])

In [105]: f(data)
Out[105]: 
array([[1, 2, 0, 0],
       [0, 0, 0, 0]])

I included the otypes parameter to avoid one of the more common vectorize faults that SO questions ask about: without it, the output dtype is inferred from the result of the first call.
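
As an illustration of that fault, here is a small sketch using a hypothetical function (`half_if_large`, not from the question) whose first result is an int while later results are floats. Without `otypes`, `np.vectorize` picks the output dtype from the first call, so the later values get cast to integers.

```python
import numpy as np

# Hypothetical function: returns an int for small inputs, a float otherwise.
half_if_large = lambda x: 1 if x < 3 else 0.5

values = np.array([1, 2, 3, 4])

# dtype inferred from the first call (which returns the int 1),
# so the later 0.5 results are cast to int
inferred = np.vectorize(half_if_large)(values)

# dtype stated explicitly, so the float results survive
explicit = np.vectorize(half_if_large, otypes=[float])(values)
```

Here `inferred` ends up with an integer dtype and the 0.5 values become 0, while `explicit` keeps them as 0.5.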

np.vectorize is slower than plain iteration for small cases, but becomes competitive with large ones. Its main advantage is that it's simpler to use with multidimensional arrays. It's even better when the function takes several inputs and you want to take advantage of broadcasting.
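
A short sketch of the several-inputs case, assuming a hypothetical two-argument scalar function `keep_below` and a made-up per-row `limits` array. The `(2, 1)` array of limits broadcasts against the `(2, 4)` data, so each row is compared against its own limit.

```python
import numpy as np

def keep_below(x, limit):
    # Plain scalar logic: both x and limit are single values here.
    return x if x < limit else 0

f = np.vectorize(keep_below, otypes=[int])

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
limits = np.array([[3], [7]])  # shape (2, 1): one limit per row

# Broadcasting pairs every element of a row with that row's limit.
result = f(data, limits)
```

The function itself stays scalar; `np.vectorize` handles the iteration and the broadcasting, which is the convenience being described, not a speed-up.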

hpaulj
  • Thank you for the explanation. Aside from the toy data example above, I am actually dealing with rather large 2D arrays, for example shape = (16800, 43200). The vectorize method takes a very long time, but simply doing np.where(data < 3, data, 0) is wicked quick on these large arrays. What is np.where doing that's so much more efficient for this non-arithmetic operation? – Heems Jun 16 '22 at 19:10
  • Are you familiar with masked indexing? `arr[mask]= a`, `arr[~mask]=b`? – hpaulj Jun 16 '22 at 19:36
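
The masked indexing mentioned in the last comment can be sketched as follows, next to the `np.where` call from the earlier comments. Both work on the whole boolean mask at once rather than calling a Python function per element; the names `via_where` and `via_mask` are only illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
mask = data < 3  # computed for the whole array in one comparison

# np.where picks from `data` where the mask is True, else uses 0.
via_where = np.where(mask, data, 0)

# Masked assignment: copy first so the original array is left untouched,
# then overwrite every element where the mask is False.
via_mask = data.copy()
via_mask[~mask] = 0
```

Both give the output the question asked for, `[[1 2 0 0] [0 0 0 0]]`.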