6

I have a data set whose points are already assigned to labeled clusters. I'm trying to find the centroid of each cluster (the vector whose distance to all data points in the cluster is smallest).

I found many solutions that perform clustering first and only then find the centroids, but I haven't found one that works with existing clusters.

Python scikit-learn is preferred. Thanks.

galmeriol
sheldonzy
  • Do you have any code showing what you have tried? Generally, to find a cluster centroid you just take the average of the feature vectors of all examples in the cluster. Pandas-esque example: `df.groupby("cluster").mean()` – Ken Syme May 14 '18 at 14:30
  • Check [this](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html). One of the attributes of `KMeans` is `cluster_centers_` – ninesalt May 14 '18 at 14:35
  • @KenSyme That is what I did at first, but my supervisor said he didn't want to do it this way. – sheldonzy May 14 '18 at 14:36
  • Please show what you have tried and where you are facing difficulties. If you are unsure about where to start, SO is not the place. [Start here](http://scikit-learn.org/stable/modules/neighbors.html#) – Vivek Kumar May 14 '18 at 14:36
  • @ninesalt I saw it, but my data is already labeled and I'm not looking to perform kmeans – sheldonzy May 14 '18 at 14:36
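
The averaging approach suggested in the comments above can be sketched as follows. This is only an illustration: the column names `x1`, `x2`, and `cluster` are made up for the example, and the data is a small toy set rather than anything from the question.

```python
import pandas as pd

# Toy data: two features plus an existing cluster label per row.
# The column names are hypothetical, chosen only for this sketch.
df = pd.DataFrame({
    "x1": [-1.0, -2.0, -3.0, 1.0, 2.0, 3.0],
    "x2": [-1.0, -1.0, -2.0, 1.0, 1.0, 2.0],
    "cluster": [1, 1, 1, 2, 2, 2],
})

# One row per cluster label, one column per feature:
# each entry is the mean of that feature within the cluster.
centroids = df.groupby("cluster").mean()
print(centroids)
```

The result is indexed by cluster label, so `centroids.loc[1]` gives the centroid of cluster 1.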

1 Answer

9

Straight from the docs:

from sklearn.neighbors import NearestCentroid  # the old sklearn.neighbors.nearest_centroid path no longer exists in recent releases
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])
clf = NearestCentroid()
clf.fit(X, y)

print(clf.centroids_)
# [[-2.         -1.33333333]
#  [ 2.          1.33333333]]
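
As a cross-check, the same centroids can be reproduced with plain NumPy by averaging the rows that share a label. This is a minimal sketch that assumes the Euclidean case, where the centroid is the per-feature mean (the point that minimizes the sum of squared Euclidean distances to the cluster's members). The variable names `labels` and `centroids` are mine.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])

# Sorted unique labels; row i of `centroids` belongs to labels[i].
labels = np.unique(y)

# Boolean-mask the rows of each label and average them feature by feature.
centroids = np.vstack([X[y == label].mean(axis=0) for label in labels])

for label, centroid in zip(labels, centroids):
    print(label, centroid)
```

Keeping `labels` next to `centroids` makes it explicit which row corresponds to which cluster.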
sascha