
For loops and list comprehensions are slow in NumPy, so they are usually avoided. However, I have a tuple of 2D NumPy arrays with varying numbers of rows, so the tuple cannot be converted into a 3D array for manipulation by slicing. Is it possible to access the elements of the tuple without a for loop or list comprehension?

Example problem showing my list comprehension approach (note that in my real code myfunc may not be as trivial as the one shown here):

import numpy as np

def myfunc(in_array):
    # built-in sum adds the rows together, i.e. returns the column sums
    return sum(in_array)

a1 = np.array(
    [[1, 2],
     [3, 4]]
)
a2 = np.array(
    [[5, 6],
     [7, 8],
     [9, 10]]
)

data = (a1, a2)
ans = [myfunc(a) for a in data]
print(ans)

Output:

[array([4, 6]), array([21, 24])]
Daniel Poh
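
For this particular `sum` example, the sketch below shows one possible way to avoid calling a Python function once per array. It rests on assumptions not stated in the question:

- Every array has the same number of columns and at least one row.
- The per-array work is a reduction along axis 0.

It swaps in NumPy's `np.add.reduceat` in place of `myfunc`, so it does not generalise to an arbitrary `myfunc`. The arrays are concatenated into one 2D array, and each contiguous block of rows is then summed in a single call:

```python
import numpy as np

a1 = np.array([[1, 2],
               [3, 4]])
a2 = np.array([[5, 6],
               [7, 8],
               [9, 10]])
data = (a1, a2)

# Stack all arrays into one (total_rows, n_cols) array; this assumes
# every array in the tuple has the same number of columns.
stacked = np.concatenate(data, axis=0)

# Start row of each original array inside `stacked` ([0, 2] here).
# This cheap loop only reads shapes; it does no per-element work.
lengths = np.array([a.shape[0] for a in data])
starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))

# Sum each block of rows in one vectorized call; row i of the
# result corresponds to data[i].
ans = np.add.reduceat(stacked, starts, axis=0)
print(ans)
```

The rows of `ans` hold the same values as the list-comprehension output, collected in one 2D array. If any array had zero rows, `reduceat` would not return an empty sum for it, which is why the sketch assumes every array is non-empty.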
  • This may help: [most-efficient-way-to-map-function-over-numpy-array](https://stackoverflow.com/questions/35215161/most-efficient-way-to-map-function-over-numpy-array) – MrNobody33 Jul 18 '20 at 09:22
  • try this `ans = list(map(lambda x: myfunc(x), data))` – badhusha muhammed Jul 18 '20 at 09:23
  • @badhusha muhammed Thanks! That's what I'm looking for! – Daniel Poh Jul 18 '20 at 09:26
  • @DanielPoh I don't think you are actually looking for that. The difference in performance between the list comprehension and `map` should be negligible. – Georgy Jul 18 '20 at 09:34
  • @Georgy that is true, but as of now I do not know of any faster alternatives – Daniel Poh Jul 18 '20 at 09:37
  • See [How do I stack vectors of different lengths in NumPy?](https://stackoverflow.com/q/14916407/7851470) Maybe you'll find something useful there. – Georgy Jul 18 '20 at 09:38
  • Why are you using `sum` when you can use `np.sum`? – Mateen Ulhaq Jul 18 '20 at 10:02
  • Your premise is "for loops and list comprehensions are slow in numpy so they are avoided", but that is only true if you *can* avoid them. If your arrays are reasonably sized, I doubt that the CPython interpreter is the bottleneck. Most of these suggestions look a bit pointless, and are likely *slower* than what you're currently doing. – Mateen Ulhaq Jul 18 '20 at 10:05
  • If you show a more realistic MCVE, it will be easier for us to suggest an optimization, especially if you make your example data and algorithm **as close as possible in complexity to your actual problem.** – Mateen Ulhaq Jul 18 '20 at 10:06
  • "For loops and list comprehensions are slow in Numpy so they are avoided." --- this seems to be misunderstanding the crux of the issue here. Looping in Python is slow, period, it has nothing to do with numpy. OTOH if you can express your numpy structure as a single array, then the looping happens in C/Fortran code in the backend and not inside your Python program. If your example is truly just a sum, make your array 3D with the maximum dimensions and fill the extra entries with zeros. – alkasm Jul 18 '20 at 10:07
  • @MateenUlhaq this is a minimal example, and I may work with larger arrays than shown. `sum` is only used as a simple stand-in function, so the example function does not need to be very optimised – Daniel Poh Jul 18 '20 at 10:07
  • The function I am working with now is relatively complex (no doubt more complex ones will come), and the main stumbling block I identified is that the elements in the tuple cannot be accessed in a faster way than with loops and list comprehensions. Ideally I want a solution that is almost as fast as NumPy array slice manipulation – Daniel Poh Jul 18 '20 at 10:09
  • That is not possible in general. However, for specific problems there are generally clever ways to avoid the for loops in Python by structuring your problem a certain way. This is called *vectorizing* your code. Vectorizing code is a skill you learn, not a function you call. As I mentioned for the sum case, padding with zeros until the sizes fit is *a* way to vectorize your example code. It may not always be appropriate, e.g. if 1000 vectors are (2, 2) but one is (2, 2000000000), padding all 1000 to that huge size isn't going to be fast. Vectorization is problem specific. – alkasm Jul 18 '20 at 10:18
  • A few loops on a complex function often is optimal. Creating an array first might take more time. It's unnecessary iteration through an array that we try to avoid. – hpaulj Jul 18 '20 at 12:58
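
The comments claim that `map` is not meaningfully faster than the list comprehension. The sketch below is one quick way to check this with the standard-library `timeit` module. The test data (1000 arrays of 1 to 50 rows) and the repetition count are arbitrary assumptions. Timings vary by machine, so no numbers are shown here. The `lambda` wrapper from the `map` suggestion is also unnecessary: `map(myfunc, data)` passes each array straight to `myfunc`.

```python
import timeit

import numpy as np


def myfunc(in_array):
    # Built-in sum adds the rows together, giving the column sums.
    return sum(in_array)


# Hypothetical test data: 1000 arrays with 2 columns and 1-50 rows each.
rng = np.random.default_rng(0)
data = tuple(rng.random((int(n), 2)) for n in rng.integers(1, 51, size=1000))

by_comprehension = [myfunc(a) for a in data]
by_map = list(map(myfunc, data))  # no lambda needed

# Both approaches call myfunc once per array and give identical results.
assert all(np.array_equal(x, y) for x, y in zip(by_comprehension, by_map))

t_comp = timeit.timeit(lambda: [myfunc(a) for a in data], number=20)
t_map = timeit.timeit(lambda: list(map(myfunc, data)), number=20)
print(f"list comprehension: {t_comp:.4f} s, map: {t_map:.4f} s")
```

Both versions still make one Python-level call per array. Any difference between them is therefore interpreter overhead, not a change in how the work is done.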
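
On the question of `sum` versus `np.sum`: for a 2D array, Python's built-in `sum` iterates over the rows and adds them one at a time at the Python level. `np.sum(in_array, axis=0)` performs the same column-wise addition in a single NumPy call. A minimal sketch showing that both give the same result for the example arrays:

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8], [9, 10]])
data = (a1, a2)

# Built-in sum: adds row 0, row 1, ... one Python-level step per row.
builtin_result = [sum(a) for a in data]

# np.sum with axis=0: column sums computed inside NumPy.
numpy_result = [np.sum(a, axis=0) for a in data]

for b, n in zip(builtin_result, numpy_result):
    assert np.array_equal(b, n)
print(numpy_result)  # [array([4, 6]), array([21, 24])]
```

This speeds up the work done inside each call. It does not remove the loop over the tuple itself.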
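
The zero-padding idea from the comments can be sketched as follows for the `sum` case. It assumes two things:

- All arrays share the same number of columns.
- Padding with zeros does not change the result. This holds for a sum, but not, for example, for a mean or a product.

Building the padded array still touches each array once from Python. The point is that the actual reduction then runs as one vectorized NumPy call over the 3D array.

```python
import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[5, 6], [7, 8], [9, 10]])
data = (a1, a2)

n_cols = data[0].shape[1]  # assumed identical for every array
max_rows = max(a.shape[0] for a in data)

# 3D array of shape (n_arrays, max_rows, n_cols), zero-filled.
padded = np.zeros((len(data), max_rows, n_cols), dtype=data[0].dtype)
for i, a in enumerate(data):
    padded[i, : a.shape[0]] = a  # copy each array; leftover rows stay 0

# One vectorized reduction over the row axis for all arrays at once.
ans = padded.sum(axis=1)
print(ans)
```

As noted in the comments, this trades memory for speed. If one array is far longer than the rest, every array gets padded to that length.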

0 Answers