
I am a bit confused about clustering, e.g. K-means clustering. I have already created clusters from the training data, and in the testing part I want to know whether new points belong to one of the existing clusters or not. My idea is to find the center of each cluster and the farthest point from that center in the training data; then, in the testing part, if the distance from a new point to the center is greater than a threshold (e.g. 1.5x the distance of the farthest point), the point cannot be in the cluster!

Is this idea efficient and correct, and is there a Python function that does this?

One more question: could someone help me understand the difference between kmeans.fit() and kmeans.predict()? I get the same result from both functions!

I appreciate any help

sws
  • Are you using the scikit-learn library? – Farseer Nov 17 '15 at 09:09
  • @Farseer Yes. Basically, I have plenty of raw data in txt format. I converted it into NumPy arrays and used "from sklearn.cluster import KMeans", and so on. – sws Nov 17 '15 at 09:15

1 Answer


In general, when you fit the K-means algorithm, you get the cluster centers as the result.

So, if you want to test which cluster a new point belongs to, you must calculate the distance from the point to each cluster center and give the point the label of the closest center.
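The nearest-center rule described above can be sketched by hand with NumPy. This is only an illustration: the function name `assign_to_nearest_center` and the example centers are made up, and Euclidean distance is assumed.

```python
import numpy as np

def assign_to_nearest_center(points, centers):
    """Return the index of the closest center for each point, and that distance."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_centers)
    diffs = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    # Label each point with the index of its closest center
    labels = distances.argmin(axis=1)
    return labels, distances[np.arange(len(points)), labels]

# Hypothetical centers, standing in for the result of fitting K-means
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
new_points = np.array([[1.0, 1.0], [9.0, 8.0]])

labels, dists = assign_to_nearest_center(new_points, centers)
print(labels)  # index of the closest center for each new point
print(dists)   # distance from each new point to that center
```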

If you are using the scikit-learn library:

predict(X) predicts the closest cluster for each sample in X, using the cluster centers that were already computed.

fit(X) fits the model to the data, or in other words calculates the cluster centers.
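A small sketch of the difference, assuming scikit-learn's `KMeans` and some made-up 2-D data: `fit` learns the centers from the training data, while `predict` only assigns points to the centers that were already learned. Predicting on the training data itself reproduces `labels_`, which is probably why the two look the same.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up training data: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                    [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
# Made-up new points that were not seen during fitting
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit() computes the cluster centers from the training data
kmeans.fit(X_train)
print(kmeans.cluster_centers_)

# labels_ holds the cluster index of each training point
train_labels = kmeans.labels_

# predict() does not change the centers; it only assigns each
# new point to the closest existing center
new_labels = kmeans.predict(X_new)
print(new_labels)
```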

Here is a nice example of how to use K-means in scikit-learn.
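As for the distance-threshold idea from the question, `predict` always returns the closest cluster and never rejects a point, so the check has to be done separately. A rough sketch, assuming `KMeans.transform` (which returns the distance from each sample to every center); the helper `predict_with_threshold`, the factor 1.5, and the label -1 for "not in any cluster" are illustrative choices, not part of scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up training data: two well-separated groups
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                    [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# Distance of every training point to its own cluster center
rows = np.arange(len(X_train))
train_dist = kmeans.transform(X_train)[rows, kmeans.labels_]

# Distance of the farthest training point in each cluster
radius = np.array([train_dist[kmeans.labels_ == k].max()
                   for k in range(kmeans.n_clusters)])

def predict_with_threshold(model, radius, X, factor=1.5):
    """Closest cluster per point, or -1 if the point is farther from that
    center than factor * (farthest training point of that cluster)."""
    dist = model.transform(X)            # shape (n_samples, n_clusters)
    labels = dist.argmin(axis=1)         # closest center per point
    nearest = dist[np.arange(len(X)), labels]
    return np.where(nearest <= factor * radius[labels], labels, -1)

X_new = np.array([[1.2, 1.0], [50.0, 50.0]])
result = predict_with_threshold(kmeans, radius, X_new)
print(result)  # cluster index per point, -1 where the point was rejected
```

Whether 1.5 is a sensible factor depends on the data; a single outlier in the training set inflates the radius of its cluster.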

Farseer
  • Those who want to measure the distance can use [euclidean distance](http://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy) – sws Nov 17 '15 at 10:26
  • I was trying to make a project where I made clusters using K-means. Now I want to use those clusters as training data, and when I give it test data, the output should show which cluster the new data belongs to. How do I do it? P.S. I did the clustering part. I just want to know how to do the test and train part. –  Feb 08 '20 at 14:17
  • In general, a point can belong to a cluster and still be farther from its center than from the center of a different cluster. – shabunc Apr 04 '22 at 10:20