I am trying to apply a simple lambda function to rows on a 2D array, but I cannot quite get it to work.

Problem as a MWE:

# Data
import numpy as np
D = np.hstack((np.ones(5).reshape(-1,1),2*np.ones(5).reshape(-1,1)))

array([[1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.],
       [1., 2.]])

# Function
f = lambda x: x[0] + x[1]

For reasons, I can only specify one lambda argument (i.e. the row of the 2D array in this case), and cannot split the array up into the respective columns, i.e. by writing lambda x1,x2: .... Which is to say that I have to index the columns of the input argument rather than pass them as individual arguments.

Now when I do the following:

f(D)

I hope to receive back:

array([[3],
       [3],
       [3],
       [3],
       [3]])

Sadly this is not what is happening. Instead I get:

array([2., 4.])

I.e. the function seems to be applied to each column individually, and even then not to the whole input array, just to one row.

Help would be most appreciated.

Thank you

Astrid
  • The fact that your function is created with a lambda expression seems completely unrelated to your question. Note, if you are going to assign the result of a lambda expression to a name, e.g. `f = lambda ...`, then you might as well use a regular function definition statement. The *only* advantage of lambda expressions is that they are anonymous. Note, this is explicitly part of the official Python style guide, PEP8 – juanpa.arrivillaga Jan 11 '21 at 13:34
  • Not really. The lambda part is what I am wondering about. I can implement this trivially without lambda functions. – Astrid Jan 11 '21 at 13:39
  • How would you implement this trivially with a regular function (i.e. a function created with a regular function definition statement)? How does the lambda part become relevant at all? A lambda expression simply creates *a normal function object*, there is nothing special about it. – juanpa.arrivillaga Jan 11 '21 at 13:40

3 Answers

1

Numpy implements some operators element-wise, but not getitem (i.e. D[i]). This means that D[0] will return array([1., 2.]) rather than array([1., 1., 1., 1., 1.]).
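To make the indexing behaviour concrete, here is a small sketch using the D and f from the question. The names first_row, first_col and result are only for illustration.

```python
import numpy as np

# Same data as in the question: five rows of [1., 2.]
D = np.hstack((np.ones(5).reshape(-1, 1), 2 * np.ones(5).reshape(-1, 1)))

first_row = D[0]     # a single integer index selects along axis 0, i.e. a row
first_col = D[:, 0]  # a column needs an explicit slice over the rows

f = lambda x: x[0] + x[1]
result = f(D)        # equivalent to D[0] + D[1]: row 0 plus row 1, element-wise

print(first_row)  # [1. 2.]
print(first_col)  # [1. 1. 1. 1. 1.]
print(result)     # [2. 4.]
```

So f(D) is not applied per column; it simply adds the first two rows of D together.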

To do the operation you want, you can use map:

np.array(list(map(f, D)))

There are other ways to do this of course - you can see a performance comparison of them here.
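As a rough sketch of the map approach with the question's data: the result comes back as a flat array, so a reshape is needed if you want the column layout from the question. The names row_sums and column are made up for this example.

```python
import numpy as np

D = np.hstack((np.ones(5).reshape(-1, 1), 2 * np.ones(5).reshape(-1, 1)))
f = lambda x: x[0] + x[1]

# Iterating over a 2D array yields its rows, so map calls f once per row.
row_sums = np.array(list(map(f, D)))  # shape (5,)

# Turn the flat result into a single column, shape (5, 1).
column = row_sums.reshape(-1, 1)

print(row_sums.shape, column.shape)
```

Note that this still loops in Python, one call of f per row, so it will be slow on large arrays.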

Jack Smith
1

Applying f to a np array will add the first and second elements of that array: x[0] + x[1]. Here x[0] is [1., 2.] and x[1] is also [1., 2.], because indexing a 2D array with a single integer returns a row. This means that the two rows are added together, hence the answer [2., 4.].

To map a function over a np array you can use Python's builtin map function: np.array(list(map(f, D)))

EnderShadow8
1

There is a numpy helper function for this, np.apply_along_axis:

>>> import numpy as np
>>> D = np.hstack((np.ones(5).reshape(-1,1),2*np.ones(5).reshape(-1,1)))
>>> np.apply_along_axis(lambda x: x[0] + x[1], 1, D)
array([3., 3., 3., 3., 3.])

Although you may have to re-shape the results:

>>> np.apply_along_axis(lambda x: x[0] + x[1], 1, D)[...,None]
array([[3.],
       [3.],
       [3.],
       [3.],
       [3.]])

In any case, this will not be very efficient: it will be significantly slower than using numpy vectorized operations, so you should do something like this:

>>> D[..., [0]] + D[..., [1]]
array([[3.],
       [3.],
       [3.],
       [3.],
       [3.]])

If at all possible. Although, I wonder if this might be more efficient actually:

>>> (D[..., 0] + D[..., 1])[...,None]
array([[3.],
       [3.],
       [3.],
       [3.],
       [3.]])

EDIT:

Yea, the latter is definitely faster:

>>> arr = D.repeat(100_000, 0)
>>> from timeit import timeit
>>> timeit("arr[..., [0]] + arr[..., [1]]", setup="from __main__ import arr, np ", number=1000)
1.7564522250000323
>>> timeit("(arr[..., 0] + arr[..., 1])[..., None]", setup="from __main__ import arr, np ", number=1000)
0.5029663629999845

And of course, either of these is much faster than apply_along_axis:

>>> timeit("np.apply_along_axis(lambda x: x[0] + x[1], 1, arr)", setup="from __main__ import arr, np ", number=10)
8.547392725000009

Note, the above was repeated only ten times, versus one thousand times for the pure numpy versions, and it still took roughly 5 to 17 times longer in total. Per call, that makes the pure numpy versions roughly 500 to 1700 times faster, i.e. around 3 orders of magnitude (pretty typical for numpy vs python). Heck, I think just dropping into Python might be faster than apply_along_axis, so using a list comprehension:

>>> timeit("np.array([x[0] + x[1] for x in arr])", setup="from __main__ import arr, np ", number=10)
2.1825029110000287

Again, note that the above is repeated only 10 times. If I do the original timings only 10 times, just to show the stark contrast directly:

>>> timeit("arr[..., [0]] + arr[..., [1]]", setup="from __main__ import arr, np ", number=10)
0.023796129999936966
>>> timeit("(arr[..., 0] + arr[..., 1])[..., None]", setup="from __main__ import arr, np ", number=10)
0.007688513000061903

So stick to (arr[..., 0] + arr[..., 1])[..., None] if you want to go many hundreds to a thousand times faster.
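If the single-argument constraint from the question has to stay, one possible way to keep it while still being vectorized is to index the last axis inside the function. This is only a sketch: f_vec is a hypothetical replacement for the question's f, and it assumes you are free to change the indexing inside the lambda.

```python
import numpy as np

D = np.hstack((np.ones(5).reshape(-1, 1), 2 * np.ones(5).reshape(-1, 1)))

# Hypothetical variant of f: still one argument, but it indexes the last
# axis, so it works on a single row and on the whole 2D array alike.
f_vec = lambda x: x[..., 0] + x[..., 1]

single = f_vec(D[0])            # one row in, one scalar out
all_rows = f_vec(D)[..., None]  # whole array in, column of shape (5, 1) out

print(single)
print(all_rows.shape)
```

With this form, f_vec(D) does the addition in a single numpy operation instead of one Python call per row.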

juanpa.arrivillaga