Splitting the array as such won't be an efficient solution. Instead, we could reshape it, which effectively creates the subarrays as rows of a 2D array (this assumes the length of the input is a multiple of the subarray length). These rows would be views into the input array, so there is no additional memory requirement. Then, we would get the argsort indices, select the first five indices per row, index into the rows with them, and finally sum along each row for the desired output.
Thus, we would have an implementation like so -
import numpy as np
N = 512 # Number of elements in each split array
M = 5 # Number of smallest elements to sum in each split array
b = a.reshape(-1,N) # Each row is one split array (a view, not a copy)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
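As a minimal self-contained sketch of the steps above, the one-liner can be wrapped in a function. The name `sum_smallest_per_chunk` and its parameter names are hypothetical, chosen only for illustration, and the sketch assumes `a.size` is an exact multiple of `n`.

```python
import numpy as np

def sum_smallest_per_chunk(a, n, m):
    """Sum the m smallest values in each consecutive chunk of n elements.

    Hypothetical helper; assumes a.size is an exact multiple of n.
    """
    b = a.reshape(-1, n)                    # each row is one chunk
    idx = b.argsort(axis=1)[:, :m]          # column indices of the m smallest per row
    rows = np.arange(b.shape[0])[:, None]   # row indices, broadcast against idx
    return b[rows, idx].sum(axis=1)

a = np.array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
print(sum_smallest_per_chunk(a, n=7, m=5))  # [148 142]
```

The column vector `rows` is what pairs each row with its own set of column indices; without it, the indexing would not select per row.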
Step-by-step sample run -
In [217]: a # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])
In [218]: N = 7 # 512 for original case, 7 for sample
In [219]: M = 5
# Reshape into a 2D array with N elements per row
In [220]: b = a.reshape(-1,N)
In [224]: b
Out[224]:
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])
# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]:
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])
# Select the first M indices per row
In [226]: b.argsort(1)[:,:M]
Out[226]:
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])
# Use fancy-indexing to select those M elements per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]:
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])
# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])
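Performance boost with np.argpartition: since we only need the M smallest elements per row and not their sorted order, a partial sort is enough -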
out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
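To check that the partial sort gives the same sums as a full sort, here is a small sketch. The function name `sum_smallest_argpartition` is hypothetical, and the sketch assumes `m < n`, because `np.argpartition` needs `kth` to be a valid index along the axis.

```python
import numpy as np

def sum_smallest_argpartition(a, n, m):
    """Sum the m smallest values per chunk of n elements using a partial sort.

    Hypothetical helper; assumes a.size % n == 0 and m < n.
    """
    b = a.reshape(-1, n)
    # After partitioning at kth=m, the first m positions of each row hold the
    # indices of the m smallest values, in no particular order.
    idx = np.argpartition(b, m, axis=1)[:, :m]
    rows = np.arange(b.shape[0])[:, None]
    return b[rows, idx].sum(axis=1)

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)

# Reference result from a full sort of each row
expected = np.sort(a.reshape(-1, 512), axis=1)[:, :5].sum(axis=1)
result = sum_smallest_argpartition(a, n=512, m=5)
print(np.array_equal(result, expected))  # True
```

The order of the selected elements differs between the two approaches, but the sum does not depend on it, so the outputs match even when a row contains ties.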
Runtime test -
In [236]: a = np.random.randint(11,99,(512*512))
In [237]: N = 512
In [238]: M = 5
In [239]: b = a.reshape(-1,N)
In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop
In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
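For anyone wanting to repeat the comparison outside IPython, the following is a rough sketch using the standard `timeit` module. The function names `with_argsort` and `with_argpartition` are hypothetical, and the timings will vary with hardware and NumPy version, so no numbers are shown here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)
N, M = 512, 5
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

def with_argsort():
    # Full sort of every row, then keep the first M indices
    return b[rows, b.argsort(1)[:, :M]].sum(1)

def with_argpartition():
    # Partial sort: only guarantees the M smallest come first
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(1)

# Both approaches should produce identical sums
assert np.array_equal(with_argsort(), with_argpartition())

# Best of 3 repeats, 10 calls each, reported per call
t_sort = min(timeit.repeat(with_argsort, number=10, repeat=3)) / 10
t_part = min(timeit.repeat(with_argpartition, number=10, repeat=3)) / 10
print(f"argsort:      {t_sort * 1e3:.2f} ms per call")
print(f"argpartition: {t_part * 1e3:.2f} ms per call")
```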