Suppose I have two numpy arrays, A and B, which are potentially very large. I'd like to find a way to efficiently add values to certain entries in A by specifying the indices at which each entry of B should be added.
Normally, one could use the following syntax:
A[indices] += B
The problem is that this doesn't behave as I would expect in cases where indices contains duplicate values. The only solution I've found is to use a manual for-loop, but I was hoping there might be a more efficient way. For example:
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]
for i, index in enumerate(indices):
    A[index] += B[i]
This yields A = [103, 212, 306, 400], as desired. In contrast, A[indices] += B yields A = [103, 206, 304, 400], which suggests that the operations A[1] += 1, A[1] += 5, and A[2] += 2 are being omitted -- presumably because the fancy-indexed in-place operation is buffered, so for each duplicate index only the last addition survives.
Note: I view the desired behavior as being somewhat similar to a "group by" operation in SQL -- for each value of k, I want to group all entries of B where indices == k and add them into the k-th position of A.
My question is: is there a more efficient way to perform this operation? I'm hoping there's some built-in numpy functionality which would be better-optimized for performance than my for-loop above.
For reference, I'm using numpy version 1.13.3.
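For what it's worth, one built-in candidate I've come across is np.add.at (the ufunc.at method), which is documented to perform unbuffered in-place addition, so repeated indices accumulate rather than overwrite each other. A minimal sketch on the example above:

```python
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]

# Unbuffered in-place add: each duplicate index contributes,
# unlike the buffered A[indices] += B.
np.add.at(A, indices, B)
# A is now [103, 212, 306, 400]
```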
Higher-dimensional case
If it's possible to generalize this to higher-dimensional arrays, I'd be interested to hear that, too. For example, is there a more efficient way to perform the following?
import numpy as np

A = (1 + np.arange(12).reshape(3, 4)) * 100
B = (1 + np.arange(18)).reshape(3, 6)
row_indices = [2, 0, 2]
col_indices = [1, 2, 0, 2, 1, 1]
for i, row_index in enumerate(row_indices):
    for j, col_index in enumerate(col_indices):
        A[row_index, col_index] += B[i, j]
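If np.add.at is a viable route, it appears to extend to this case as well: broadcasting the row indices (shape (3, 1)) against the column indices (shape (6,)) produces a (3, 6) grid of target positions matching B's shape. A sketch, assuming the same setup as above:

```python
import numpy as np

A = (1 + np.arange(12).reshape(3, 4)) * 100
B = (1 + np.arange(18)).reshape(3, 6)
row_indices = np.array([2, 0, 2])
col_indices = np.array([1, 2, 0, 2, 1, 1])

# rows[:, None] has shape (3, 1); broadcast against cols (6,) it
# forms a (3, 6) index grid matching B, and np.add.at accumulates
# every (row, col) pair, including duplicates.
np.add.at(A, (row_indices[:, None], col_indices), B)
```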