Assume I have the following arrays:

N = 8
M = 4

a = np.zeros(M)
b = np.random.randint(M, size=N) # contains indices for a
c = np.random.rand(N) # contains random values

I want to sum the values of c grouped by the indices in b and store the sums in a, so that a[k] is the sum of all c[i] with b[i] == k. Writing a loop for this is trivial:

for i, v in enumerate(b):
    a[v] += c[i]

Since N can get quite big in my real-world problem, I'd like to avoid Python loops, but I can't figure out how to write this as a single NumPy expression. Can anyone help me out?

OK, here are some example values:

In [27]: b
Out[27]: array([0, 1, 2, 0, 2, 3, 1, 1])

In [28]: c
Out[28]: 
array([ 0.15517108,  0.84717734,  0.86019899,  0.62413489,  0.24357903,
        0.86015187,  0.85813481,  0.7071174 ])

In [30]: a
Out[30]: array([ 0.77930596,  2.41242955,  1.10377802,  0.86015187])
fdlm

1 Answer

import numpy as np

N = 8
M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([ 0.15517108,  0.84717734,  0.86019899,  0.62413489,  0.24357903, 0.86015187,  0.85813481,  0.7071174 ])

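# Build an (M, N) boolean mask that is True where b[j] == i,
# use it to zero out the non-matching values of c, then sum each row.
a = ((np.mgrid[:M,:N] == b)[0] * c).sum(axis=1)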

returns

array([ 0.77930597,  2.41242955,  1.10377802,  0.86015187])
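
To make the one-liner easier to follow, here is a step-by-step sketch of the same masking idea. It swaps np.mgrid for an np.arange column, which should broadcast to the same (M, N) comparison; the names rows, mask and a_mask are only illustrative and not part of the original answer.

```python
import numpy as np

M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([0.15517108, 0.84717734, 0.86019899, 0.62413489,
              0.24357903, 0.86015187, 0.85813481, 0.7071174])

# Column vector of the candidate indices 0..M-1, shape (M, 1)
rows = np.arange(M)[:, None]

# Broadcasting (M, 1) against (N,) gives an (M, N) boolean mask:
# mask[i, j] is True where element j of c belongs to slot i of a
mask = rows == b

# Multiplying by c zeroes the non-matching entries; summing each row
# then gives one total per slot
a_mask = (mask * c).sum(axis=1)
print(a_mask)
```

Note that the intermediate mask has M * N entries, so memory grows with the product of the two sizes.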
eumiro
    This is a nice answer, but it has a relatively high memory requirement for my purposes (7,000,000 x 30,000). If anyone knows of a less memory-intensive way, I would love to know. For now, a simple loop seems like the answer when memory is the constraint. – Brian Jul 20 '16 at 02:18
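
Regarding the memory concern in the comment above: a possible lower-memory alternative, not taken from the original answer, is np.bincount with its weights argument, or the unbuffered np.add.at. Both avoid building the (M, N) mask, so the extra memory should be on the order of N + M. The sketch below assumes b holds non-negative integers smaller than M; the names a_bincount and a_add_at are only illustrative.

```python
import numpy as np

M = 4
b = np.array([0, 1, 2, 0, 2, 3, 1, 1])
c = np.array([0.15517108, 0.84717734, 0.86019899, 0.62413489,
              0.24357903, 0.86015187, 0.85813481, 0.7071174])

# Option 1: weighted bincount. Each c[i] is added to slot b[i];
# minlength=M keeps the result at length M even if the largest
# indices never occur in b.
a_bincount = np.bincount(b, weights=c, minlength=M)

# Option 2: unbuffered in-place add. Unlike a[b] += c, np.add.at
# accumulates repeated indices instead of keeping only the last write.
a_add_at = np.zeros(M)
np.add.at(a_add_at, b, c)

print(a_bincount)
print(a_add_at)
```

Which of the two is faster may depend on the NumPy version and the array sizes, so it is worth timing both on realistic data.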