Suppose I have two arrays: `x`, which contains `m` points, and `c`, which contains `m` cluster ids, one for each corresponding point in `x`.
I want to calculate the mean of the points that share the same id, i.e. the points that belong to the same cluster. I know that `c` contains integers from the range [0, k) and that every value in that range is present in `c`.
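Written out, the quantity wanted for each cluster j is the ordinary average over its members. The count is nonzero because every id in [0, k) appears in `c`. The one-hot matrix M below is an illustrative device, not something from the code. It shows that all k means are a single matrix product, which is why a loop is not strictly needed:

$$
n_j = \bigl|\{\, i : c_i = j \,\}\bigr|, \qquad
\mu_j = \frac{1}{n_j} \sum_{i \,:\, c_i = j} x_i, \qquad j = 0, \dots, k-1
$$

$$
M \in \{0,1\}^{m \times k},\; M_{ij} = [\, c_i = j \,], \qquad
n = M^\top \mathbf{1}_m, \qquad
\mu = \operatorname{diag}(n)^{-1} M^\top x
$$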
My current solution looks like the following:
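```python
import numpy as np

np.random.seed(42)
k = 3                                         # number of clusters
x = np.random.rand(100, 2)                    # m = 100 points in 2-D
c = np.random.randint(0, k, size=x.shape[0])  # cluster id of each point

mu = np.zeros((k, 2))                         # one mean per cluster
for i in range(k):
    mu[i] = x[c == i].mean(axis=0)            # average the points in cluster i
```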
This works, but is there a more efficient way to compute these means in NumPy without an explicit `for` loop?
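Below is a loop-free sketch. It assumes, as stated above, that every id in [0, k) occurs in `c`, so no cluster count is zero. The helper names `cluster_means` and `cluster_means_onehot` are hypothetical. The first helper gets counts from `np.bincount` and per-cluster sums from `np.add.at`. The second uses the one-hot matrix product, which gives the same result but needs O(m·k) extra memory. Both are checked against the loop from the question:

```python
import numpy as np


def cluster_means(x, c, k):
    """Per-cluster means of the rows of x, without a Python-level loop.

    Assumes c holds integer ids in [0, k) and every id occurs at least once,
    so no count is zero.
    """
    # Number of points in each cluster, shape (k,).
    counts = np.bincount(c, minlength=k)
    # Sum the rows of x that share an id. np.add.at is unbuffered, so repeated
    # indices in c accumulate correctly (plain `sums[c] += x` would not).
    sums = np.zeros((k, x.shape[1]), dtype=float)
    np.add.at(sums, c, x)
    # Broadcast the (k, 1) counts over the feature axis.
    return sums / counts[:, None]


def cluster_means_onehot(x, c, k):
    """Same result via a one-hot matrix product; uses O(m * k) extra memory."""
    onehot = np.eye(k)[c]  # shape (m, k); row i has a 1 in column c[i]
    return (onehot.T @ x) / onehot.sum(axis=0)[:, None]


np.random.seed(42)
k = 3
x = np.random.rand(100, 2)
c = np.random.randint(0, k, size=x.shape[0])

# Reference result from the loop in the question.
mu_loop = np.zeros((k, 2))
for i in range(k):
    mu_loop[i] = x[c == i].mean(axis=0)

mu = cluster_means(x, c, k)
print(np.allclose(mu, mu_loop))                             # True
print(np.allclose(cluster_means_onehot(x, c, k), mu_loop))  # True
```

Another option is `np.bincount(c, weights=x[:, j], minlength=k)`, which sums column `j` per cluster. Dividing by the counts gives the means, but that approach still loops over the (usually few) feature columns rather than the clusters.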