
I'm trying to speed up a process, and I think this might be possible using numpy's apply_along_axis. The problem is that not all my rows have the same length.

When I do:

a = np.array([[1, 2, 3], 
              [2, 3, 4], 
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)

This works fine. But I would like to do something similar to (please note that the first row has 4 elements and the rest have 3):

a = np.array([[1, 2, 3, 4], 
              [2, 3, 4], 
              [4, 5, 6]])
b = np.apply_along_axis(sum, 1, a)
print(b)

But this fails because:

numpy.AxisError: axis 1 is out of bounds for array of dimension 1

I've looked around and the only 'solution' I've found is to add zeros to make all the arrays the same length, which would probably defeat the purpose of the performance improvement.

Is there any way to use numpy.apply_along_axis on a non-regular shaped numpy array?

Nathan
  • I'm creating this numpy array from a pandas groupby object, using `.apply(np.array)`, if there's any way to do so but adding 0s to make all numpy arrays the same length, that may also work to solve my problem – Nathan Sep 26 '19 at 09:15
  • Behind the scenes `np.apply_along_axis()` is only a for-loop. It is merely a convenience function for stuff that really can't be vectorized. So don't expect a speed-up from this. What function are you planning to use? Maybe there is a better solution. – Joe Sep 26 '19 at 10:41
  • @Joe I want to use a numpy.diff on each of the rows and then a mode on a moving window – Nathan Sep 26 '19 at 13:07
  • Do you know the size of the vectors before or are you appending to a list? see e.g. https://stackoverflow.com/a/58085045/7919597 – Joe Sep 26 '19 at 16:15

2 Answers


You can transform your initial array of iterable objects into a regular 2-d ndarray by padding them with zeros in a vectorized manner:

import numpy as np

# ragged rows must be stored as a 1-d object array
a = np.array([[1, 2, 3, 4], 
              [2, 3, 4], 
              [4, 5, 6]], dtype=object)

max_len = max(len(x) for x in a)  # length of the longest row
cust_func = np.vectorize(pyfunc=lambda x: np.pad(array=x,
                                                 pad_width=(0, max_len),
                                                 mode='constant',
                                                 constant_values=(0, 0))[:max_len],
                         otypes=[list])
a_pad = np.stack(cust_func(a))

output:

array([[1, 2, 3, 4],
       [2, 3, 4, 0],
       [4, 5, 6, 0]])
Eduard Ilyasov
  • Hello @Nathan! It's ok, no problem) I found on SO a more efficient solution than mine for padding different-sized iterable objects in an array: https://stackoverflow.com/a/32043366/5107488. I would be glad if this helps someone in the future. – Eduard Ilyasov Sep 26 '19 at 16:20
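The answer linked in the comment above is based on a boolean-mask trick. A minimal sketch of that idea, assuming the ragged rows are held as a list of 1-d arrays:

```python
import numpy as np

# ragged rows as a list of 1-d arrays
a = [np.array([1, 2, 3, 4]),
     np.array([2, 3, 4]),
     np.array([4, 5, 6])]

lens = np.array([len(x) for x in a])
# mask[i, j] is True where row i actually has a j-th element
mask = lens[:, None] > np.arange(lens.max())
out = np.zeros(mask.shape, dtype=int)
# fill the valid positions row by row from the concatenated data
out[mask] = np.concatenate(a)
print(out)
# [[1 2 3 4]
#  [2 3 4 0]
#  [4 5 6 0]]
```

This avoids a Python-level loop over np.pad calls, which is why it tends to be faster than the np.vectorize approach for many rows.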

It depends. Do you know the size of the vectors before or are you appending to a list?

see e.g. http://stackoverflow.com/a/58085045/7919597

You could, for example, pad the arrays:

import numpy as np

a1 = [1, 2, 3, 4]
a2 = [2, 3, 4, np.nan] # pad with nan
a3 = [4, 5, 6, np.nan] # pad with nan

b = np.stack([a1, a2, a3], axis=0)

print(b)

# you can apply the normal numpy operations on 
# arrays with nan, they usually just result in a nan
# in a resulting array
c = np.diff(b, axis=-1)

print(c)

Afterwards you can apply a moving window on each row over the columns.

Have a look at https://stackoverflow.com/a/22621523/7919597 which is only 1d, but can give you an idea of how it could work.
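On newer NumPy versions (1.20+) there is also a built-in helper for this, np.lib.stride_tricks.sliding_window_view, which generalizes the linked 1d trick to each row of a 2d array. A sketch:

```python
import numpy as np

c = np.array([[1, 1, 2, 2, 2],
              [3, 3, 3, 4, 4]])

# split each row into overlapping windows of length 3
windows = np.lib.stride_tricks.sliding_window_view(c, window_shape=3, axis=-1)
print(windows.shape)  # (2, 3, 3): 2 rows, 3 windows per row, window length 3
print(windows[0, 0])  # [1 1 2]
```

A mode (e.g. scipy.stats.mode) could then be applied along the last axis to get the moving-window mode per row.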

It is possible to use a 2d array with only one row as kernel (shape e.g. (1, 3)) with scipy.signal.convolve2d and use the idea above. This is a workaround to get a "row-wise 1D convolution":

from scipy import signal

krnl = np.array([[0, 1, 0]])

d = signal.convolve2d(c, krnl, mode='same')
print(d)
Joe
  • Thanks for the answer. I ended up using list comprehension which was fast enough. I greatly appreciate your effort and I have upvoted your answer because it is useful. However, I won't accept it as correct because it's not the answer to the question posed. – Nathan Sep 27 '19 at 07:59
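The list-comprehension approach mentioned in the comment was not posted, but a guess at its shape, applying np.diff row by row over the ragged data, might look like:

```python
import numpy as np

# ragged rows kept as a plain list of 1-d arrays
a = [np.array([1, 2, 3, 4]),
     np.array([2, 3, 4]),
     np.array([4, 5, 6])]

# apply np.diff to each row independently; rows may have different lengths
diffs = [np.diff(row) for row in a]
print(diffs)  # [array([1, 1, 1]), array([1, 1]), array([1, 1])]
```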