numpy: how to avoid looping over array to sum over one dimension while indexing from another array?

Question

Let's say I have 2 arrays A and B where the values in A correspond to the index in the first dimension of B

A = np.array([[ 0,  3,  4], 
              [ 5,  6,  0],
              [ 0,  1,  9],
              [11, 12, 13]])

B = np.array([[0, 1, 1],
              [1, 2, 0],
              [0, 2, 3],
              [3, 3, 3],
              [0, 3, 1],
              [3, 1, 1],
              [1, 1, 0]])

For each row j of B, I want to sum the elements in A that correspond to A[B[j, i], i], to get a new array of shape 7. I can do it with loops like this:

output_array = np.zeros(B.shape[0])
for i in range(A.shape[1]):
    for j in range(B.shape[0]):
        output_array[j] += A[B[j, i], i]

which would give me as output_array

array([ 6., 10., 14., 36., 12., 17., 15.])

Is there any fast and efficient way to do the same for large arrays without having to loop?

The closest answer I could find here was this one, but I need to actually do a sum inside one of the loop and I wasn't able to extend to my case.

How large are the actual arrays? – ken Aug 17 '23 at 08:03 — ken, Aug 17 '23 at 08:03
A would be in the range of 2000 x 500 and B 20000 x 500 – SadCactus Aug 17 '23 at 08:53 — SadCactus, Aug 17 '23 at 08:53

mozway · Answer 1 · 2023-08-17T08:07:18.090

Using fancy indexing:

out = A[B,
  np.repeat(np.arange(A.shape[1])[None],
            B.shape[0], axis=0)
 ].sum(axis=1)

Output:

array([ 6, 10, 14, 36, 12, 17, 15])

This works by using arrays indices of the shape of B to index A, thus maintaining the shape of B as output.

Intermediates:

np.repeat(np.arange(A.shape[1])[None], B.shape[0], axis=0)

array([[0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2],
       [0, 1, 2]])


out = A[B,
  np.repeat(np.arange(A.shape[1])[None],
            B.shape[0], axis=0)
 ]

array([[ 0,  6,  0],
       [ 5,  1,  4],
       [ 0,  1, 13],
       [11, 12, 13],
       [ 0, 12,  0],
       [11,  6,  0],
       [ 5,  6,  4]])

score 1 · Accepted Answer · answered Aug 17 '23 at 11:12

You can make it simple as follows:

def numpy_solution(A, B):
    return A[B, np.arange(A.shape[1])].sum(axis=1)

However, this creates a temporary array of the same size as B (20000, 500), which degrades performance.

To get around this, and this may seem counterintuitive, you can actually make it faster by keeping one loop.

def keep_i_loop(A, B):
    output_array = np.zeros(B.shape[0], dtype=A.dtype)
    for i in range(A.shape[1]):
        output_array += A[B[:, i], i]
    return output_array

Furthermore, in terms of speed, this is a calculation that Numba is better at.

import numba


@numba.njit("int32[:](int32[:,:],int32[:,:])")
def numba_solution(A, B):
    output_array = np.zeros(B.shape[0], dtype=A.dtype)
    # Note that I have swapped i and j loops.
    for j in range(B.shape[0]):
        for i in range(A.shape[1]):
            output_array[j] += A[B[j, i], i]
    return output_array

Here is the benchmark code and results.

import timeit
import numpy as np


def baseline(A, B):
    output_array = np.zeros(B.shape[0])
    for i in range(A.shape[1]):
        for j in range(B.shape[0]):
            output_array[j] += A[B[j, i], i]
    return output_array


def benchmark():
    rng = np.random.default_rng(0)
    A = rng.integers(0, 1000, size=(2000, 500), dtype=np.int32)
    B = rng.integers(0, 2000, size=(20000, 500), dtype=np.int32)

    expected = baseline(A, B)
    assert np.all(expected == numpy_solution(A, B))
    assert np.all(expected == keep_i_loop(A, B))
    assert np.all(expected == numba_solution(A, B))

    n = 100
    r = timeit.timeit(lambda: numpy_solution(A, B), number=n) / n
    print(f"numpy_solution: {r * 1000:.0f} ms")

    r = timeit.timeit(lambda: keep_i_loop(A, B), number=n) / n
    print(f"keep_i_loop: {r * 1000:.0f} ms")

    r = timeit.timeit(lambda: numba_solution(A, B), number=n) / n
    print(f"numba_solution: {r * 1000:.0f} ms")


if __name__ == "__main__":
    benchmark()

numpy_solution: 66 ms
keep_i_loop: 46 ms
numba_solution: 15 ms

score 0 · Answer 3 · answered Aug 17 '23 at 07:55

The use of vectorized operations and NumPy's enhanced indexing will enable you to accomplish this without using explicit loops. The way to do this is to establish a set of indices for each of the two dimensions, use those indices to access array A, and then compute the sum along that specific dimension. Here is how to go about it:

import numpy as np

A = np.array([[ 0,  3,  4],
              [ 5,  6,  0],
              [ 0,  1,  9],
              [11, 12, 13]])

B = np.array([[0, 1, 1],
              [1, 2, 0],
              [0, 2, 3],
              [3, 3, 3],
              [0, 3, 1],
              [3, 1, 1],
              [1, 1, 0]])

# Create indices for both dimensions using broadcasting
row_indices = np.arange(B.shape[0])[:, np.newaxis]
column_indices = B

# Index into A using the indices arrays
result = A[column_indices, row_indices]

# Sum along the specified dimension (axis=1)
output_array = np.sum(result, axis=1)

print(output_array)

Output:

array([ 6, 10, 14, 36, 12, 17, 15])

score 0 · Answer 4 · answered Aug 17 '23 at 07:59

# Create a column index array for A using np.arange and np.tile
column_index_array = np.tile(np.arange(A.shape[1]), B.shape[0]).reshape(B.shape)

# Create a row index array for A using np.arange and broadcasting
row_index_array = np.arange(B.shape[0])[:, None]

# Iterate B's indices using row and column indices to fetch corresponding values from A
output_array = A[B, column_index_array].sum(axis=1)

score 0 · Answer 5 · answered Aug 17 '23 at 08:07

0

With short np.trace routine to sum along diagonals of the "index" array:

out_arr = np.trace(A[B], axis1=1, axis2=2)

array([ 6, 10, 14, 36, 12, 17, 15])

answered Aug 17 '23 at 08:07

RomanPerekhrest

88,541
4
65
105

Smart idea but won't this create a humongous intermediate array? – mozway Aug 17 '23 at 08:15

numpy: how to avoid looping over array to sum over one dimension while indexing from another array?

5 Answers5