I am doing k-means clustering with sklearn in Python, and I am wondering how to change the generated label names for the clusters. For example:

data          Cluster
0.2344         1
1.4537         2
2.4428         2
5.7757         3

And I want to get:

data          Cluster
0.2344         black
1.4537         red
2.4428         red
5.7757         blue

I don't mean simply translating 1 -> black, 2 -> red when printing. I am wondering whether it is possible to set different cluster names in the k-means model itself, so that it produces them by default.

desertnaut
夏思阳

2 Answers

1

No.
There isn't any way to change the default labels: none of the methods or attributes of KMeans lets you rename them (you can check the full list in the documentation).
You have to map them separately, for example with a dictionary.

Solution using dictionary:

# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]

# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
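
The same dictionary can be applied to the labels of a fitted model. The following is a minimal sketch, assuming the four values from the question as 1-D data and an arbitrary choice of which integer gets which name (the variable names `values` and `named_labels` are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data from the question, reshaped to (n_samples, n_features).
values = np.array([0.2344, 1.4537, 2.4428, 5.7757]).reshape(-1, 1)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values)

# KMeans labels are integers starting at 0; the names are an assumption.
mapping = {0: 'black', 1: 'red', 2: 'blue'}

# Translate every integer label into its name.
named_labels = [mapping[int(label)] for label in kmeans.labels_]
print(named_labels)
```

Which point ends up "black" depends on how KMeans happened to number the clusters, so the dictionary only fixes the names, not their meaning.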

If you change your data or the number of clusters, the mapping may have to change as well. To see why, first visualize the clusters.
Importing the libraries and generating random data:

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

x = np.random.uniform(0, 100, size=(10, 2))  # 10 random 2-D points in [0, 100)

Applying Kmeans algorithm

kmeans = KMeans(n_clusters=3, random_state=0).fit(x)

Getting cluster centers

arr = kmeans.cluster_centers_

Your cluster centroids look like this:

array([[23.81072765, 77.21281171],
       [ 8.6140551 , 23.15597377],
       [93.37177176, 32.21581703]])

Here, the 1st row is the centroid of cluster 0, the 2nd row is the centroid of cluster 1, and so on.
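
To make the link between row index and label concrete, here is a small sketch (assuming freshly generated random data, so the centroid values will differ from the ones printed above) that checks that each point's label is the row index of its nearest centroid:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=(10, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
centers = kmeans.cluster_centers_
labels = kmeans.labels_

# Distance from every point to every centroid, shape (n_samples, n_clusters).
distances = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)

# Row index of the closest centroid for each point.
nearest = distances.argmin(axis=1)

print(np.array_equal(nearest, labels))
```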

Visualizing centroids and data:

plt.scatter(x[:,0],x[:,1])
plt.scatter(arr[:,0], arr[:,1])

You get a graph that looks like this: [figure: scatter plot of the data points with the cluster centroids overlaid]

As you can see, you have access to the centroids as well as the training data. If your training data, the number of clusters and random_state stay the same, these centroids don't really change.

But if you add more training data or increase the number of clusters, you will have to create a new mapping according to the centroids that are generated.
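
One way to rebuild the mapping automatically is to derive it from the centroids themselves. This is only a sketch of one possible convention, not something sklearn provides: the helper `name_clusters_by_center` is hypothetical and assumes that sorting clusters by their first centroid coordinate is a meaningful order for your data.

```python
import numpy as np
from sklearn.cluster import KMeans

def name_clusters_by_center(kmeans, names):
    """Hypothetical helper: names[0] goes to the cluster whose centroid has
    the smallest first coordinate, names[1] to the next one, and so on."""
    order = np.argsort(kmeans.cluster_centers_[:, 0])
    return {int(cluster_id): name for cluster_id, name in zip(order, names)}

# Toy data from the question, reshaped to (n_samples, n_features).
values = np.array([0.2344, 1.4537, 2.4428, 5.7757]).reshape(-1, 1)
names = ['black', 'red', 'blue']

named_by_seed = {}
for seed in (0, 1, 2):
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(values)
    mapping = name_clusters_by_center(kmeans, names)
    named_by_seed[seed] = [mapping[int(label)] for label in kmeans.labels_]
    print(seed, named_by_seed[seed])
```

Because the names follow the centroid positions rather than the integer ids, the smallest value is always named "black" and the largest "blue" here, whatever numbering KMeans picks on a given run. If the number of clusters changes, the list of names still has to be extended by hand.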

Aniket Bote
  • Voted. So how should I deal with the situation where a new cluster is generated after feeding more data to the k-means model, since the cluster labels will change? For example, with a = [1,2,2,3] the output clusters may become [4,2,2,3,1], so the mapping may not work in this case. – 夏思阳 Sep 02 '20 at 03:01
  • I have updated my answer according to your comment. Remember that these labels are only for representation and are not used in any way by the actual algorithm, so even if you interchange the labels in your mapping dictionary, it won't affect the algorithm itself. If you have data that already comes with labels, you should look at supervised learning algorithms instead. – Aniket Bote Sep 02 '20 at 03:35
  • If this answered the question to your satisfaction consider accepting and upvoting it. See [How does accepting work?](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) for more information. – Aniket Bote Sep 03 '20 at 10:18
-1

Check out the top response on this related post.

sklearn doesn't include this functionality but you can map the values to your dataframe in a fairly straightforward manner.

current_labels = [1, 2, 3]
desired_labels = ['black', 'red', 'blue']
# create a dictionary for your corresponding values
map_dict = dict(zip(current_labels, desired_labels))
print(map_dict)
# {1: 'black', 2: 'red', 3: 'blue'}

# `data` is assumed to be a pandas DataFrame with a 'Cluster' column
# option 1: map the desired values back to the dataframe
# note this will replace the original values
data['Cluster'] = data['Cluster'].map(map_dict)

# option 2 (instead of option 1, not after it): map to a new column to preserve the old values
data['NewNames'] = data['Cluster'].map(map_dict)
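
As a self-contained illustration of the new-column variant, the sketch below rebuilds the table from the question by hand (the DataFrame contents are taken from the question, not produced by a model). It also shows that `Series.map` yields NaN for any label missing from the dictionary, which is worth checking if the number of clusters changes. Note that labels coming straight from sklearn's `KMeans.labels_` start at 0, so the keys would be 0, 1, 2 in that case.

```python
import pandas as pd

# Table from the question, with 1-based cluster ids as shown there.
data = pd.DataFrame({
    'data': [0.2344, 1.4537, 2.4428, 5.7757],
    'Cluster': [1, 2, 2, 3],
})

map_dict = {1: 'black', 2: 'red', 3: 'blue'}

# Keep the integer labels and add the names in a separate column.
data['NewNames'] = data['Cluster'].map(map_dict)
print(data)

# A label without an entry in the dictionary becomes NaN.
incomplete = data['Cluster'].map({1: 'black', 2: 'red'})
print(incomplete.isna().tolist())
```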
gojandrooo
  • You are assuming that the current labels are correct; this should be an answer about the mapping of KNN labels. – Harshdeep Singh Jul 13 '23 at 13:44