
I need to perform a simple operation a few hundred times on very large arrays (thousands of elements), so I need the most efficient solution; plain for-loops are too slow.

I have two arrays, e.g.:

import numpy as np
a = np.array([1,23,25,100,100,101])
b = np.array([1,2,2,3,4,4])

I would now like to get the sum of the elements of a for each distinct value in b, i.e.:

[1,48,100,201]

I could do:

#first index of each unique entry in b
u = np.unique(b,return_index=True)[1]
#split array and sum
list(map(sum, np.split(a,u[1:])))

But that's a bit slow, and it only works if the entries in b are sorted. Is there any other way of doing this?
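The sortedness restriction mentioned above could probably be lifted by sorting both arrays by b first and then summing each run with np.add.reduceat instead of Python's sum. This is only a sketch: the helper name group_sums_sorted is made up for illustration, and it assumes a and b are 1-D arrays of equal length.

```python
import numpy as np

def group_sums_sorted(a, b):
    """Sum the elements of `a` per distinct value of `b` (hypothetical helper)."""
    # Stable sort so that equal keys in b become contiguous runs.
    order = np.argsort(b, kind="stable")
    a_sorted, b_sorted = a[order], b[order]
    # In a sorted array, the first index of each unique key is the start of its run.
    keys, starts = np.unique(b_sorted, return_index=True)
    # reduceat sums a_sorted[starts[i]:starts[i+1]]; the last run extends to the end.
    return keys, np.add.reduceat(a_sorted, starts)

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])
keys, sums = group_sums_sorted(a, b)
```

For the example arrays, sums holds the four group totals in the order of keys, and the result does not depend on the order in which the elements of b appear.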

TheFaultInOurStars

1 Answer


Try:

>>> [a[b==n].sum() for n in np.unique(b)]
[1, 48, 100, 201]

If you're open to using pandas:

>>> pd.DataFrame({"a": a, "b": b}).groupby("b").sum()["a"].tolist()
[1, 48, 100, 201]
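If the per-value masking above turns out to be slow when b has many distinct values, a fully vectorized variant may be worth timing. The sketch below is an assumption about what could help, not a measured result: it maps each entry of b to a group index with np.unique(..., return_inverse=True) and lets np.bincount accumulate a as weights. The function name group_sums_bincount is hypothetical, and both arrays are assumed to be 1-D.

```python
import numpy as np

def group_sums_bincount(a, b):
    """Sum the elements of `a` per distinct value of `b` (hypothetical helper)."""
    # inverse[i] is the position of b[i] within the sorted unique keys.
    keys, inverse = np.unique(b, return_inverse=True)
    # bincount adds a[i] into bucket inverse[i]; with weights the result is float64.
    sums = np.bincount(inverse, weights=a)
    return keys, sums

a = np.array([1, 23, 25, 100, 100, 101])
b = np.array([1, 2, 2, 3, 4, 4])
keys, sums = group_sums_bincount(a, b)
```

This does not require b to be sorted. Because np.bincount returns floats when weights are given, the sums can be cast back with sums.astype(a.dtype) if integers are needed, keeping in mind that very large integers may lose precision in float64.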
not_speshal