1

In Python 3, using numpy, I have an array of values and an array of indexes. I need to find the sum (or mean) of the value array at each index in the index array, for each value in the index array. For speed reasons I am looking to do this using numpy slicing instead of for loops.

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],)
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])
# Something like desired_sum = numpy.sum(value_array[index_array])
# Where the output will be
desired_sum  = numpy.array([5.1, 10.7, 3, 0, 20])

This will be running on larger arrays {~shape = (2000, 2000)} with a few hundred indexes.

wjandrea
  • 28,235
  • 9
  • 60
  • 81

2 Answers2

0

This is sort of a duplicate of this question. The code for Python 3 is:

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
# index_array must be integers an have no negative values
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]]).astype(int)
sum_array     = numpy.bincount(index_array.flatten(), weights = value_array.flatten())
average_array = sum_array / numpy.bincount(index_array.flatten())

print('sum_array = ' + repr(sum_array))
print('average_array = ' + repr(average_array))

The output is

sum_array = array([ 5.1, 10.7, 3. , 0. , 20. ])
average_array = array([ 2.55, 3.56666667, 1.5, nan, 20. ])
0

You could broadcast the indexes and value matrix over the range of resulting positions, adding one more dimension to form multiple masks of the value matrix.

For example, values would give 5 copies of the 2d matrix:

array([[[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]],

       [[ 0.1,  0.2,  0.5,  1. ],
        [ 2. ,  5. , 10. , 20. ]]])

For each of these copies, keep only the values that correspond to the index in the corresponding index_array (using broadcasting over the 0..4 indexes of the output). Given that numpy only processes fixes sized dimensions, selecting/excluding values will be done by multiplying by 1 or 0 (boolean of index comparisons):

import numpy
value_array = numpy.array([[0.1, 0.2, 0.5, 1 ],
                           [2  ,   5,  10, 20]])
index_array = numpy.array([[  0,   1,   1, 2 ],
                           [  2,   0,   1, 4 ]])

result_index = numpy.arange(5)[:,None,None]

eligible_values = numpy.tile(value_array,(5,1)).reshape(-1,*value_array.shape)
eligible_values *= result_index==index_array
result = numpy.sum(eligible_values,axis=(1,2))

print(result)
[ 5.1 10.7  3.   0.  20. ]

To get averages, you can compute the counts in a similar fashion (you only need to use the indexes for those):

counts = numpy.sum(result_index==index_array,axis=(1,2))

print(counts)
[2 3 2 0 1]

print(result/counts)  # averages
[ 2.55        3.56666667  1.5                nan 20.        ]
Alain T.
  • 40,517
  • 4
  • 31
  • 51