
I have two numpy arrays, each with shape (10000, 10000). One is a value array and the other is an index array.

import numpy as np

Value = np.random.rand(10000, 10000)
Index = np.random.randint(0, 1000, (10000, 10000))

I want to make a list (or 1D numpy array) by summing the entries of the value array grouped by the corresponding entries of the index array. That is, for each index i, find the positions where Index equals i and sum the matching entries of Value:

NewArray = np.zeros(1000)
for i in range(1000):
    NewArray[i] = np.sum(Value[np.where(Index == i)])

However, this is too slow, since I have to run this loop over 300,000 arrays. I tried to come up with a logical-indexing approach like

NewArray[Index] += Value[Index]

But it didn't work. The next thing I tried was a dictionary:

from collections import defaultdict

NewDict = defaultdict(list)
for k, v in zip(Index.flatten(), Value.flatten()):
    NewDict[k].append(v)

and

for k in NewDict:
    NewDict[k] = np.sum(NewDict[k])

But it was slow, too.

Is there any smart way to speed this up?

1 Answer

I had two thoughts. First, try masking; it speeds this up by about 4x:

NewArray = np.zeros(1000)
for i in range(1000):
    NewArray[i] = np.sum(Value[Index == i])
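(A quick sanity check, with the array sizes shrunk here just for illustration, confirms that boolean masking selects exactly the same elements as the question's np.where version:)

```python
import numpy as np

rng = np.random.default_rng(0)
value = rng.random((50, 50))
index = rng.integers(0, 10, (50, 50))

# boolean-mask selection, as in the answer
masked = np.array([value[index == i].sum() for i in range(10)])
# np.where-based selection, as in the question
where_ = np.array([value[np.where(index == i)].sum() for i in range(10)])

assert np.allclose(masked, where_)
```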

Alternatively, you can sort your arrays so that the values you're adding together sit in contiguous memory. Masking or using where() has to gather the matching values each time you call sum on the slice. By front-loading this gathering, you might be able to speed things up considerably:

# flatten your arrays
vals = Value.ravel()
inds = Index.ravel()

s = np.argsort(inds)  # the indices that will sort your Index array

v_sorted = vals[s].copy()  # the copy lays the values out in order in memory instead of just providing a view
i_sorted = inds[s].copy()

# go one past the max index value so the last bucket gets an end position too
searches = np.searchsorted(i_sorted, np.arange(0, i_sorted[-1] + 2))

NewArray = np.zeros(len(searches) - 1)
for i in range(len(searches) - 1):
    st = searches[i]
    nd = searches[i + 1]
    NewArray[i] = v_sorted[st:nd].sum()
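(The remaining Python loop over `searches` can itself be vectorized with `np.add.reduceat`, which sums each contiguous run in one call. One caveat, which shouldn't bite with the question's dense 0–999 indices: `reduceat` does not return 0 for an empty segment, so this sketch assumes every index value occurs at least once:)

```python
import numpy as np

rng = np.random.default_rng(1)
vals = rng.random(10_000)
# ensure every index value 0..99 occurs at least once (reduceat's caveat)
inds = np.concatenate([np.arange(100), rng.integers(0, 100, 9_900)])

s = np.argsort(inds)
v_sorted = vals[s]
i_sorted = inds[s]
searches = np.searchsorted(i_sorted, np.arange(i_sorted[-1] + 2))

# sum each contiguous run [searches[i], searches[i+1]) in one vectorized call
sums = np.add.reduceat(v_sorted, searches[:-1])
```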

This method takes 26 seconds on my computer vs. 400 the old way. Good luck. If you want to read more about contiguous memory and performance, check this discussion out.
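(For completeness, NumPy also has built-ins for exactly this grouped sum that avoid both the Python loop and the explicit sort: `np.bincount` with a `weights` argument, and `np.add.at`, which is the unbuffered form of the in-place `+=` the question tried. A sketch, again with sizes shrunk for illustration:)

```python
import numpy as np

rng = np.random.default_rng(2)
value = rng.random((100, 100))
index = rng.integers(0, 50, (100, 100))

# one call: bucket i receives the sum of value entries where index == i
new_array = np.bincount(index.ravel(), weights=value.ravel(), minlength=50)

# equivalent, usually slower: unbuffered in-place add (a plain fancy-indexed
# += silently drops repeated indices because of buffering)
out = np.zeros(50)
np.add.at(out, index.ravel(), value.ravel())

assert np.allclose(new_array, out)
```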

c-wilson
  • Great! One curious thing: removing `np.where` is not 4x faster in my case; it is almost the same speed (in fact, keeping `np.where` gives slightly better performance). Anyway, the first code in the answer takes about 90 seconds and the second about 13. Thank you for the great improvement. But it's still too slow, so I have to consider parallelization. – Ji woong Yu Feb 08 '18 at 07:22
  • Hey, you might want to check out the numba package to parallelize easily. – c-wilson Feb 08 '18 at 11:27