
Is it possible to do the following with a lambda? (I just need something that can do this really fast.)

Asking about lambda because of this answer: https://stackoverflow.com/a/35216364/3776738

import numpy as np

def one_to_more(i):
    # do some calculations etc...
    return [i*3,i*9]

x = np.array([1, 2, 3])
f = lambda x: one_to_more(x)
more = f(x)
print(more)

---> desired output: [3, 9, 6, 18, 9, 27]

EDIT: It doesn't have to be a lambda. I'm just looking for the fastest method to expand a large list or numpy array this way, where each input element produces two (or more) output elements, so the result is at least twice as long as the input.

EDIT 2: This is the actual function used:

MAX_NUM = 100000

def num_to_arr(num):
    num = int(num)
    if num < 0 or num >= MAX_NUM:  # out-of-range inputs collapse to 0
        num = 0

    # split num as num3 * 1600 + num2 * 40 + num1 (1600 == 40 ** 2),
    # then divide each part by 40
    num3 = num // 1600
    num2 = (num - num3 * 1600) // 40
    num1 = num - num3 * 1600 - num2 * 40
    arr = [num1 / 40, num2 / 40, num3 / 40]
    return arr

used like this:

result = list(map(num_to_arr, large_array))

The large array consists of about 10k integers and the execution time is about 17 ms, which is much too high. (CPU: AMD Ryzen 7950X.)

user3776738
  • I just thought a lambda could do this the fastest way: https://stackoverflow.com/a/35216364/3776738 – user3776738 May 01 '23 at 20:46
  • A lambda is not going to be "faster". What is limiting here? Are you going to call this one time or millions of times? What is the shape of your input array? Only 3 elements, or larger? – mozway May 01 '23 at 20:47
  • @user3776738 a lambda expression doesn't affect speed at all. It is syntactic sugar for creating a function object using an expression, instead of requiring a statement. It has *no bearing on performance* – juanpa.arrivillaga May 01 '23 at 20:48
  • It will be used on very large arrays. I'm just looking for the fastest method to extend arrays this way. – user3776738 May 01 '23 at 20:48
  • @juanpa.arrivillaga I have edited the question to make it more clear. I don't think that the function can be vectorized. – user3776738 May 01 '23 at 21:35
  • "but the concatenation process of list or np.array is very slow," for a `list` it is not. For arrays it is. Again, you really need to clarify **what it is you are actually asking** – juanpa.arrivillaga May 01 '23 at 21:46
  • It can be numpy or list, that doesn't matter. All that matters is that it gets done faster, by a lot. – user3776738 May 01 '23 at 21:56
  • You misunderstood that link about `lambda`. That is contrasting a direct `numpy` calculation with `np.vectorize` or a list comprehension; the lambda wrapper is just a quick way of writing a function for testing and timing. – hpaulj May 01 '23 at 21:58
  • OK, I have added the actual function. – user3776738 May 01 '23 at 22:03
  • So, for example, if `result = []` then `for _ in range(1_000_000): result += [None, None]` takes `106 ms` on my relatively old machine. That is not your bottleneck – juanpa.arrivillaga May 01 '23 at 22:04
  • I have added the use of the function and the time. – user3776738 May 01 '23 at 22:09
  • Can you please clean up your question? There is a bunch of irrelevant stuff there. – juanpa.arrivillaga May 01 '23 at 22:20
  • It's cleaned now. – user3776738 May 01 '23 at 22:26

2 Answers


You can use broadcasting and ravel:

def one_to_more(i):
    # broadcast [3, 9] against a column view of i, then flatten row by row
    return (np.array([3, 9]) * i[:, None]).ravel()

out = one_to_more(x)

Output: array([ 3, 9, 6, 18, 9, 27])
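
For illustration, here is a minimal sketch of the intermediate shapes involved (added for clarity, not part of the original answer; the `factors` and `pairs` names are just for this sketch):

import numpy as np

x = np.array([1, 2, 3])
factors = np.array([3, 9])

# x[:, None] has shape (3, 1); multiplying by the shape-(2,) factors
# broadcasts to shape (3, 2): one row [i*3, i*9] per element of x
pairs = factors * x[:, None]
# pairs == [[ 3  9]
#           [ 6 18]
#           [ 9 27]]

# ravel() flattens row by row, so each pair stays adjacent in the output
out = pairs.ravel()
# out == [ 3  9  6 18  9 27]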

mozway
  • I have edited the question to make the problem more clear. (There will be done some more stuff with i as a simple multiplication.) – user3776738 May 01 '23 at 21:37
  • Unfortunately, it will be difficult to help you without knowing exactly what you are doing in this function. You have two main options: 1) the code can be vectorized and you can remove any Python loop, 2) the code can't be vectorized easily and you might want to parallelize (a sketch follows these comments). – mozway May 01 '23 at 21:49
  • @mozway eh, I would reach for `numba` first – juanpa.arrivillaga May 01 '23 at 22:06
  • @juanpa.arrivillaga yes true – mozway May 01 '23 at 22:25
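
To illustrate option 2 above (parallelizing when vectorization is hard), here is a minimal sketch, not from the original thread, that applies numba's `prange` to the question's function; the `num_to_arr_parallel` name is my own, and it assumes numba is installed:

import numba
import numpy as np

MAX_NUM = 100000

# parallel=True lets numba distribute the prange loop across CPU cores;
# each iteration writes only its own row, so there are no data races
@numba.njit(parallel=True)
def num_to_arr_parallel(arr):
    result = np.empty((len(arr), 3))
    for i in numba.prange(len(arr)):
        num = arr[i]
        if num < 0 or num >= MAX_NUM:
            num = 0
        num3 = num // 1600
        num2 = (num - num3 * 1600) // 40
        result[i, 0] = num - num3 * 1600 - num2 * 40
        result[i, 1] = num2
        result[i, 2] = num3
    return result / 40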

OK, thank you for providing your actual function. I created a version of your function which works on the entire array and uses some vectorization (although it isn't fully vectorized). This provides a pretty good improvement, though:

In [5]: MAX_NUM = 5000

In [6]: def num_to_arr (num):
    ...:     num = int(num)
    ...:     if (num < 0 or num >= MAX_NUM):
    ...:         num = 0
    ...:
    ...:     num3 = (num // 1600)
    ...:     num2 = ((num - num3 * 1600) // 40)
    ...:     num1 = int((num - num3 * 1600 - num2 * 40))
    ...:     arr = [num1 / 40, num2 / 40, num3 / 40]
    ...:     return arr
    ...:

In [7]: def num_to_arr_vec(arr):
    ...:     # note: this zeroes out-of-range entries in the caller's array in place
    ...:     arr[(arr < 0) | (arr >= MAX_NUM)] = 0
    ...:     result = np.zeros((len(arr), 3))
    ...:     result[:, 2] = arr // 1600
    ...:     result[:, 1] = (arr - result[:, 2] * 1600) // 40
    ...:     result[:, 0] = arr - result[:, 2] * 1600 - result[:, 1] * 40
    ...:     return result / 40
    ...:

In [8]: arr = np.random.randint(-10_000, 10_000, 10_000)

In [9]: %timeit list(map(num_to_arr, arr))
6.72 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit num_to_arr_vec(arr)
333 µs ± 6.46 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

So it's about 20x faster. I suspect that to do better, you would want to use numba here and write a very imperative loop.

EDIT:

Added a numba example:

In [51]: import numba

In [52]: @numba.jit(numba.float64[:,:](numba.int64[:]), nopython=True)
    ...: def num_to_arr_numba(arr):
    ...:     result = np.empty((len(arr), 3))
    ...:     for i in range(len(arr)):
    ...:         num = arr[i]
    ...:         if num < 0 or num >= MAX_NUM:
    ...:             num = 0
    ...:         num3 = num // 1600
    ...:         num2 = ((num - num3 * 1600) // 40)
    ...:         num1 = (num - num3 * 1600 - num2 * 40)
    ...:         result[i, 0] = num1
    ...:         result[i, 1] = num2
    ...:         result[i, 2] = num3
    ...:     result /= 40
    ...:     return result
    ...:

In [53]: %timeit num_to_arr_numba(arr)
85.7 µs ± 2.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

So this seems to bring us to about 80x faster.
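
A usage note that is not from the original answer: because an explicit signature is passed to `numba.jit`, the function above is compiled eagerly at definition time, so the `%timeit` numbers already reflect steady-state speed. With the lazy-compiling form sketched below (the `num_to_arr_numba_lazy` name is my own), compilation happens on the first call, so warm the function up before benchmarking:

import numba
import numpy as np

MAX_NUM = 5000  # as in the session above

# @numba.njit is shorthand for numba.jit(nopython=True); with no explicit
# signature, compilation is deferred until the first call
@numba.njit
def num_to_arr_numba_lazy(arr):
    result = np.empty((len(arr), 3))
    for i in range(len(arr)):
        num = arr[i]
        if num < 0 or num >= MAX_NUM:
            num = 0
        num3 = num // 1600
        num2 = (num - num3 * 1600) // 40
        result[i, 0] = num - num3 * 1600 - num2 * 40
        result[i, 1] = num2
        result[i, 2] = num3
    return result / 40

arr = np.random.randint(-10_000, 10_000, 10_000)
num_to_arr_numba_lazy(arr)  # warm-up call pays the one-time compile cost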

juanpa.arrivillaga
  • That's a real nice improvement. Thank you. 20x is good. But I will look into numba as it is not enough for my needs. – user3776738 May 01 '23 at 22:37
  • What is this technique called, that you used, so I can do it myself for other functions? – user3776738 May 01 '23 at 22:41
  • @user3776738 it's just vectorization. Note, you could probably immediately improve it using [`numexpr`](https://github.com/pydata/numexpr) since this creates a lot of unnecessary intermediate arrays, and the code would be trivial to convert (see the sketch after these comments) – juanpa.arrivillaga May 01 '23 at 23:03
  • @user3776738 btw, I added a `numba` example – juanpa.arrivillaga May 02 '23 at 09:00
  • That is amazing. Can this even be combined with the vectorization method from above and get us to 100x+? – user3776738 May 02 '23 at 10:18
  • @user3776738 no, not at all actually. "Vectorization" is just using built-in numpy methods. Those are already running at C-speed; using numba to invoke them doesn't make them faster, and probably won't materially affect the runtime, although you can try – juanpa.arrivillaga May 02 '23 at 16:46
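
Following up on the intermediate-arrays comment above, here is a minimal sketch, not from the original thread, that trims the temporaries with plain numpy's `np.divmod`; the `num_to_arr_divmod` name is my own:

import numpy as np

MAX_NUM = 5000

def num_to_arr_divmod(arr):
    # np.where builds a cleaned copy instead of mutating the caller's array
    arr = np.where((arr < 0) | (arr >= MAX_NUM), 0, arr)
    # each divmod yields quotient and remainder in a single pass
    num3, rem = np.divmod(arr, 1600)
    num2, num1 = np.divmod(rem, 40)
    return np.stack([num1, num2, num3], axis=1) / 40

arr = np.random.randint(-10_000, 10_000, 10_000)
out = num_to_arr_divmod(arr)  # shape (10000, 3), same layout as num_to_arr_vec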