How to sum elements of a 2D array based on labels using NumPy

Asked Oct 18 '22 at 18:48

Active Oct 18 '22 at 19:29

Viewed 49 times

I'm trying to find a Pythonic/NumPy way to sum 2D data points based on the labels they were given.

Given the following data matrix X:

import numpy as np

X = np.array(
    [
        [6, 1],  # row_0
        [4, 4],  # row_1
        [8, 4],  # row_2
        [6, 3],  # row_3
        [5, 8],  # row_4
        [7, 9],  # row_5
    ]
)

And the labels assigned to it:

labels = np.array([1, 0, 2, 1, 2, 0])

That is, row_0 is assigned label 1, row_1 label 0, row_2 label 2, and so on.

Right now I'm summing the data points per label using the following loop:

cum_sum = np.zeros((3, 2))
for i, label in enumerate(labels):
    cum_sum[label] += X[i]

This results in the following matrix:

[[11. 13.]
 [12.  4.]
 [13. 12.]]
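
For reference, the loop above can probably be replaced by `np.add.at`, which does unbuffered in-place addition, so rows that share a label all accumulate (a plain `sums[labels] += X` would not, because repeated indices are only applied once). This is a minimal sketch, assuming the labels are non-negative integers starting at 0 as in the example; the names `n_labels` and `sums` are only illustrative.

```python
import numpy as np

X = np.array([[6, 1], [4, 4], [8, 4], [6, 3], [5, 8], [7, 9]])
labels = np.array([1, 0, 2, 1, 2, 0])

# Assumption: labels are 0..max, so max + 1 output rows are needed.
n_labels = labels.max() + 1
sums = np.zeros((n_labels, X.shape[1]))

# Unbuffered equivalent of: for i, label in enumerate(labels): sums[label] += X[i]
np.add.at(sums, labels, X)

print(sums)
```

Whether this is actually faster than the loop depends on the array sizes, so it is worth timing on real data.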

However, I was wondering whether there is a more Pythonic/efficient way to do this. It has been done for 1D arrays, as shown in this SO post.
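
One way the 1D idea might carry over, assuming the 1D approach meant here is a weighted `np.bincount`, is to run it once per column and stack the results. This is only a sketch under that assumption; `n_labels` and `per_label` are illustrative names.

```python
import numpy as np

X = np.array([[6, 1], [4, 4], [8, 4], [6, 3], [5, 8], [7, 9]])
labels = np.array([1, 0, 2, 1, 2, 0])

n_labels = labels.max() + 1

# One weighted bincount per column: bin k receives the sum of X[:, j]
# over all rows whose label is k. minlength keeps every column the same length.
per_label = np.column_stack(
    [
        np.bincount(labels, weights=X[:, j], minlength=n_labels)
        for j in range(X.shape[1])
    ]
)

print(per_label)
```

This still loops in Python, but over columns rather than rows, which should matter mostly when there are many rows and few columns.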

How would one solve this?

Thanks in advance!

If the question was unclear, please comment.

edited Oct 18 '22 at 19:29


Chiel


`np.bincount((labels[:,None] + [0, np.max(labels)+1]).ravel(), X.ravel()).reshape(-1, 2, order='F')`. I doubt this is faster for large arrays, though. – Michael Szczesny Oct 18 '22 at 19:20
Only ~100x faster for the example data *1000. I suspect `numba` to be much faster. – Michael Szczesny Oct 18 '22 at 19:36
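
For readers unpacking the one-liner in the first comment: it appears to shift the labels of the second column by `max(labels) + 1`, so that a single flat `np.bincount` keeps the two columns in separate bins, and then reshapes in column-major order. The sketch below spells that out step by step and generalizes the offsets to any number of columns; the intermediate names (`offsets`, `bins`, `flat`, `result`) are mine, not the commenter's.

```python
import numpy as np

X = np.array([[6, 1], [4, 4], [8, 4], [6, 3], [5, 8], [7, 9]])
labels = np.array([1, 0, 2, 1, 2, 0])

n_cols = X.shape[1]
n_labels = labels.max() + 1

# Column j gets its own block of bins: [j * n_labels, (j + 1) * n_labels).
offsets = np.arange(n_cols) * n_labels

# Shape (n_rows, n_cols); ravel() flattens row by row, matching X.ravel().
bins = (labels[:, None] + offsets).ravel()

# One pass over all elements; each value is added to the bin for its (column, label) pair.
flat = np.bincount(bins, weights=X.ravel(), minlength=n_labels * n_cols)

# flat holds all sums for column 0 first, then column 1, hence order="F".
result = flat.reshape(n_labels, n_cols, order="F")

print(result)
```

No timing claim is made here; the figures in the comment above are the commenter's own.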


0 Answers