How to split a NumPy array and perform certain actions on the split arrays [Python]

Question

Only part of this question has been asked before ([1][2]), and those answers explained how to split NumPy arrays. I am quite new to Python. I have an array containing 262144 items and want to split it into small arrays of length 512, sort them individually, and sum up their first five values, but I am unsure how to proceed beyond this line:

np.array_split(vector, 512)

How do I access and analyse each array? Would it be a good idea to keep using NumPy arrays, or should I go back to using a dictionary instead?

Divakar · Accepted Answer · 2017-01-29T11:44:16.263

Splitting as such won't be an efficient solution. Instead, we can reshape, which effectively creates the subarrays as rows of a 2D array. The reshaped array is a view into the input array, so it needs no additional memory. Then we get the argsort indices, select the first five indices per row, use them to pick the corresponding values, and finally sum those up for the desired output.

Thus, we would have an implementation like so -

N = 512 # Number of elements in each split array
M = 5   # Number of smallest elements to sum in each split array

b = a.reshape(-1,N)
out = b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
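
For reference, here is a self-contained sketch of the same idea. The helper name `sum_smallest_per_chunk` and the random test input are assumptions made for illustration and are not part of the original answer. It also cross-checks the result against a plain `np.sort` of the reshaped rows.

```python
import numpy as np

def sum_smallest_per_chunk(a, n, m):
    """Sum the m smallest values in each consecutive chunk of n elements.

    Hypothetical helper; assumes a is 1D and len(a) is a multiple of n.
    """
    b = a.reshape(-1, n)                    # each row is one chunk
    idx = b.argsort(axis=1)[:, :m]          # column indices of the m smallest per row
    rows = np.arange(b.shape[0])[:, None]   # row indices, broadcast against idx
    return b[rows, idx].sum(axis=1)

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)
out = sum_smallest_per_chunk(a, 512, 5)

# Cross-check: sorting the values directly should give the same sums.
check = np.sort(a.reshape(-1, 512), axis=1)[:, :5].sum(axis=1)
print(out.shape, np.array_equal(out, check))
```

If the array length is not a multiple of `n`, `reshape` raises a `ValueError`, so any remainder would have to be handled separately.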

Step-by-step sample run -

In [217]: a   # Input array
Out[217]: array([45, 19, 71, 53, 20, 33, 31, 20, 41, 19, 38, 31, 86, 34])

In [218]: N = 7 # 512 for original case, 7 for sample

In [219]: M = 5

# Reshape into a 2D array with N columns (one row per subarray)
In [220]: b = a.reshape(-1,N)

In [224]: b
Out[224]: 
array([[45, 19, 71, 53, 20, 33, 31],
       [20, 41, 19, 38, 31, 86, 34]])

# Get argsort indices per row
In [225]: b.argsort(1)
Out[225]: 
array([[1, 4, 6, 5, 0, 3, 2],
       [2, 0, 4, 6, 3, 1, 5]])

# Select first M ones
In [226]: b.argsort(1)[:,:M]
Out[226]: 
array([[1, 4, 6, 5, 0],
       [2, 0, 4, 6, 3]])

# Use fancy-indexing to select those M ones per row
In [227]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]]
Out[227]: 
array([[19, 20, 31, 33, 45],
       [19, 20, 31, 34, 38]])

# Finally sum along each row
In [228]: b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
Out[228]: array([148, 142])

Performance boost with np.argpartition -

out = b[np.arange(b.shape[0])[:,None], np.argpartition(b,M,axis=1)[:,:M]].sum(1)
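
`np.argpartition` only guarantees that the first M positions of each row hold the M smallest entries, in no particular order. That is enough here because the values are only summed. The sketch below checks this against the full argsort on random data (the seed and array shape are arbitrary choices for illustration). The last line, using `np.partition` on the values themselves, is a variation that is not in the original answer.

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.integers(11, 99, size=(512, 512))
M = 5
rows = np.arange(b.shape[0])[:, None]

# Full sort of the indices, then keep the first M per row.
via_argsort = b[rows, b.argsort(axis=1)[:, :M]].sum(axis=1)

# Partial sort: the first M indices point at the M smallest values (unordered).
via_argpartition = b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(axis=1)

# Variation: partition the values directly and skip the fancy-indexing step.
via_partition = np.partition(b, M, axis=1)[:, :M].sum(axis=1)

print(np.array_equal(via_argsort, via_argpartition),
      np.array_equal(via_argsort, via_partition))
```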

Runtime test -

In [236]: a = np.random.randint(11,99,(512*512))

In [237]: N = 512

In [238]: M = 5

In [239]: b = a.reshape(-1,N)

In [240]: %timeit b[np.arange(b.shape[0])[:,None], b.argsort(1)[:,:M]].sum(1)
100 loops, best of 3: 14.2 ms per loop

In [241]: %timeit b[np.arange(b.shape[0])[:,None], \
                np.argpartition(b,M,axis=1)[:,:M]].sum(1)
100 loops, best of 3: 3.57 ms per loop
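
Outside IPython, a comparable measurement can be sketched with the standard `timeit` module. The function names and the repeat counts below are arbitrary choices for illustration, and the absolute numbers will depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(11, 99, size=512 * 512)
N, M = 512, 5
b = a.reshape(-1, N)
rows = np.arange(b.shape[0])[:, None]

def with_argsort():
    return b[rows, b.argsort(axis=1)[:, :M]].sum(axis=1)

def with_argpartition():
    return b[rows, np.argpartition(b, M, axis=1)[:, :M]].sum(axis=1)

# Best of 3 repeats, 20 calls each, reported per call.
t_sort = min(timeit.repeat(with_argsort, number=20, repeat=3)) / 20
t_part = min(timeit.repeat(with_argpartition, number=20, repeat=3)) / 20
print(f"argsort:      {t_sort * 1e3:.2f} ms per call")
print(f"argpartition: {t_part * 1e3:.2f} ms per call")
```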

Was thinking about something like this but using index slices, this is much better! — meow, Jul 05 '19 at 15:30

ppasler · Answer 2 · 2017-01-29T12:32:01.293

A more detailed, step-by-step version of what you want:

import numpy as np

vector = np.random.rand(262144)

# 512 sections; each has 512 items because 262144 == 512 * 512
splits = np.array_split(vector, 512)

sums = []
for split in splits:
    # sort it (in place; the splits are views, so this also reorders vector)
    split.sort()
    # sublist
    subSplit = split[:5]
    # build sum
    splitSum = sum(subSplit)
    # add to new list
    sums.append(splitSum)

print(np.array(sums).shape)

This gives the same output as @Divakar's solution.
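
A minimal sketch to check that claim on random data follows. The seed and variable names are assumptions for illustration. Two points to keep in mind: `np.array_split(vector, 512)` produces 512 sections, which are 512 items long only because 262144 = 512 * 512, and this sketch uses `np.sort`, which returns a sorted copy, so `vector` is left unmodified.

```python
import numpy as np

rng = np.random.default_rng(0)
vector = rng.random(262144)

# Loop version: sort a copy of each section and sum its five smallest values.
loop_sums = np.array([np.sort(chunk)[:5].sum()
                      for chunk in np.array_split(vector, 512)])

# Vectorized version, following the reshape + argsort approach.
b = vector.reshape(-1, 512)
vec_sums = b[np.arange(b.shape[0])[:, None], b.argsort(axis=1)[:, :5]].sum(axis=1)

# allclose rather than exact equality, since these are floating-point sums.
print(np.allclose(loop_sums, vec_sums))
```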

edited Jan 29 '17 at 12:32

answered Jan 29 '17 at 11:34

Python is a language of one-liners, so @Divakar's answer is the better one :) – ppasler Jan 29 '17 at 12:00

Why turn the split list into an array? Just loop on the list. – hpaulj Jan 29 '17 at 12:28
