Suppose I have two numpy arrays, A and B, which are potentially very large. I'd like to find a way to efficiently add values to certain entries in A by specifying the indices at which each entry of B should be added.
Normally, one could use the following syntax:
A[indices] += B
The problem is that this doesn't behave as I would expect in cases where indices contains duplicate values. The only solution I've found is to use a manual for-loop, but I was hoping there might be a more efficient way. For example:
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]
for i, index in enumerate(indices):
    A[index] += B[i]
This yields A = [103, 212, 306, 400], as desired. In contrast, A[indices] += B yields A = [103, 206, 304, 400], which suggests that the operations A[1] += 1, A[1] += 5, and A[2] += 2 are being omitted -- presumably because the fancy-indexed in-place operation is buffered, so for each duplicate index only the last addition survives.
Note: I view the desired behavior as being somewhat similar to a "group by" operation in SQL -- for each value of k, I want to group all entries of B where indices == k and add them into the k-th position of A.
My question is: is there a more efficient way to perform this operation? I'm hoping there's some built-in numpy functionality which would be better-optimized for performance than my for-loop above.
For reference, I'm using numpy version 1.13.3.
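For what it's worth, one built-in candidate I've come across is np.add.at (the ufunc.at method), which is documented to perform unbuffered in-place addition, so repeated indices accumulate rather than overwrite each other. A minimal sketch on the example above:

```python
import numpy as np

A = np.array([100, 200, 300, 400])
B = np.array([1, 2, 3, 4, 5, 6])
indices = [1, 2, 0, 2, 1, 1]

# Unbuffered in-place add: each duplicate index contributes,
# unlike the buffered A[indices] += B.
np.add.at(A, indices, B)
# A is now [103, 212, 306, 400]
```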
Higher-dimensional case
If it's possible to generalize this to higher-dimensional arrays, I'd be interested to hear that, too. For example, is there a more efficient way to perform the following?
import numpy as np

A = (1 + np.arange(12).reshape(3, 4)) * 100
B = (1 + np.arange(18)).reshape(3, 6)
row_indices = [2, 0, 2]
col_indices = [1, 2, 0, 2, 1, 1]
for i, row_index in enumerate(row_indices):
    for j, col_index in enumerate(col_indices):
        A[row_index, col_index] += B[i, j]
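If np.add.at is a viable route, it appears to extend to this case as well: broadcasting the row indices (shape (3, 1)) against the column indices (shape (6,)) produces a (3, 6) grid of target positions matching B's shape. A sketch, assuming the same setup as above:

```python
import numpy as np

A = (1 + np.arange(12).reshape(3, 4)) * 100
B = (1 + np.arange(18)).reshape(3, 6)
row_indices = np.array([2, 0, 2])
col_indices = np.array([1, 2, 0, 2, 1, 1])

# rows[:, None] has shape (3, 1); broadcast against cols (6,) it
# forms a (3, 6) index grid matching B, and np.add.at accumulates
# every (row, col) pair, including duplicates.
np.add.at(A, (row_indices[:, None], col_indices), B)
```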