I'm trying to find a Pythonic/NumPy way to sum 2D data points based on the labels they were given.
Given the following data matrix X:
import numpy as np

X = np.array(
    [
        [6, 1],  # row_0
        [4, 4],  # row_1
        [8, 4],  # row_2
        [6, 3],  # row_3
        [5, 8],  # row_4
        [7, 9],  # row_5
    ]
)
And the labels assigned to it:
labels = np.array([1, 0, 2, 1, 2, 0])
This means that row_0 is assigned label 1, row_1 label 0, row_2 label 2, and so on.
Right now I'm summing the data points per label using the following loop:
cum_sum = np.zeros((3, 2))  # one row per label, one column per feature
for i, label in enumerate(labels):
    cum_sum[label] += X[i]
Which results in the following matrix:
[[11. 13.]
 [12.  4.]
 [13. 12.]]
However, I was wondering if there is a more Pythonic/efficient way to do this. It has been done for 1D arrays, as shown in this SO post.
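For context, here is a minimal sketch of what the 1D case can look like, assuming the approach meant is np.bincount with its weights argument (the linked post is not reproduced here, so that is an assumption). It is followed by one possible 2D analogue using np.add.at. The names col0_sums, n_labels, and label_sums are made up for illustration.

```python
import numpy as np

X = np.array([[6, 1], [4, 4], [8, 4], [6, 3], [5, 8], [7, 9]])
labels = np.array([1, 0, 2, 1, 2, 0])

# 1D case: np.bincount adds up the weights that share the same label.
# Here the weights are the first column of X only.
col0_sums = np.bincount(labels, weights=X[:, 0])

# Possible 2D analogue: np.add.at does unbuffered in-place addition,
# so rows with a repeated label accumulate instead of overwriting.
n_labels = labels.max() + 1  # assumes labels are 0..k-1
label_sums = np.zeros((n_labels, X.shape[1]))
np.add.at(label_sums, labels, X)

print(label_sums)
```

On the example data this gives the same matrix as the loop. Whether it is actually faster probably depends on the array sizes, so it would need to be timed.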
How would one solve this?
Thanks in advance!
If the question was unclear, please comment.