
I have scaled my original data X1:

scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)
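For reference, StandardScaler maps each column x to z = (x - mean) / std. It uses the population standard deviation (ddof=0) and stores the per-column statistics in its mean_ and scale_ attributes. The quick check below runs on hypothetical random data standing in for X1, since the real data is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for X1; the real data is not shown in the question
rng = np.random.default_rng(0)
X1 = pd.DataFrame({'Wn': rng.random(30) * 12, 'LL': rng.random(30) * 6})

scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0)
manual = (X1 - X1.mean()) / X1.std(ddof=0)

print(np.allclose(X1_scaled.to_numpy(), manual.to_numpy()))  # True
```

Anything fitted on X1_scaled, including the k-means centroids, therefore lives in these z units and not in the units of X1.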

and then performed k-means clustering:

kmeans = KMeans(
        init="random",
        n_clusters=3,
        n_init=10,
        max_iter=300,
        random_state=123)   
X1['label'] = kmeans.fit_predict(X1_scaled[['Wn', 'LL']])

# get centroids
centroids = kmeans.cluster_centers_
cen_x = [i[0] for i in centroids] 
cen_y = [i[1] for i in centroids]  


Now I would like to plot the original data (X1) together with the centroids, but the centroids are in the scaled space, so when I plot the results:

g = sns.scatterplot(x=X1.Wn, y=X1.LL, hue=X1.label,
                    data=X1, palette='colorblind',
                    legend='full')
g = sns.scatterplot(x=cen_x, y=cen_y, s=80, color='black')

the centroids are outside the clusters. How can I plot the original data with the groups and the centroids?

This is the image I got:

[Image: plot produced by the code above]

And this is what I would like to have, but with the original data rather than the scaled data:

[Image: desired plot]

JCV
  • Does this answer your question? [scikit-learn: how to scale back the 'y' predicted result](https://stackoverflow.com/questions/38058774/scikit-learn-how-to-scale-back-the-y-predicted-result) – JohanC Nov 08 '21 at 17:33
  • @JohanC No, because I need the centroids in the original scale, not my original X (I already have that, since I keep it and create a new one, X_scaled). – JCV Nov 08 '21 at 17:44

1 Answer


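You can call scaler.inverse_transform() on the centroids to map them back to the original units. (Note that sns.scatterplot is an axes-level function and returns a matplotlib Axes, not a FacetGrid. The code below passes that Axes back via ax= to draw the centroids on the same plot.)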
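To see why this works, note that inverse_transform simply undoes the scaling: x = z * scale_ + mean_. Because that map is affine, the mean of each cluster's original points equals the inverse-transformed mean of its scaled points. As a result, the back-transformed centroids sit in the middle of the unscaled clusters. The sketch below checks both facts on hypothetical random data. It reuses the KMeans settings from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical random data standing in for X1
rng = np.random.default_rng(123)
X1 = pd.DataFrame({'Wn': rng.random(30) * 12, 'LL': rng.random(30) * 6})

scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)

kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=123)
labels = kmeans.fit_predict(X1_scaled[['Wn', 'LL']])

# inverse_transform undoes z = (x - mean_) / scale_, i.e. computes z * scale_ + mean_
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
manual = kmeans.cluster_centers_ * scaler.scale_ + scaler.mean_
print(np.allclose(centroids, manual))  # True

# Affine maps commute with averaging: per-cluster means of the scaled data,
# mapped back, equal the per-cluster means of the original data
scaled_means = X1_scaled.groupby(labels).mean()
orig_means = X1.groupby(labels).mean()
print(np.allclose(scaler.inverse_transform(scaled_means), orig_means.to_numpy()))  # True
```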

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

X1 = pd.DataFrame({'Wn': np.random.rand(30) * 12, 'LL': np.random.rand(30) * 6})

scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)

kmeans = KMeans(init="random",
                n_clusters=3,
                n_init=10,
                max_iter=300,
                random_state=123)
X1['label'] = kmeans.fit_predict(X1_scaled[['Wn', 'LL']])

# get centroids, mapped back to the original (unscaled) units
centroids = scaler.inverse_transform(kmeans.cluster_centers_)
cen_x = [i[0] for i in centroids]
cen_y = [i[1] for i in centroids]

ax = sns.scatterplot(x='Wn', y='LL', hue='label',
                     data=X1, palette='colorblind',
                     legend='full')
sns.scatterplot(x=cen_x, y=cen_y, s=80, color='black', ax=ax)

plt.tight_layout()
plt.show()

[Image: resulting plot with the inverse-transformed centroids]
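One caveat applies to the question's code. The scaler is fitted on all of X1, but KMeans only sees ['Wn', 'LL']. If X1 has any other columns, scaler.inverse_transform(kmeans.cluster_centers_) will raise an error, because the centers have fewer columns than the scaler expects. The sketch below shows one way around this. It inverts only the clustered columns using the matching entries of scaler.mean_ and scaler.scale_. The extra column 'Other' and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical X1 with an extra column that is scaled but not used for clustering
rng = np.random.default_rng(1)
X1 = pd.DataFrame({'Wn': rng.random(30) * 12,
                   'LL': rng.random(30) * 6,
                   'Other': rng.random(30) * 100})

scaler = StandardScaler()
X1_scaled = pd.DataFrame(scaler.fit_transform(X1), columns=X1.columns)

features = ['Wn', 'LL']
kmeans = KMeans(init="random", n_clusters=3, n_init=10, random_state=123)
kmeans.fit(X1_scaled[features])

# scaler expects 3 columns but cluster_centers_ has 2, so undo the scaling
# only for the clustered columns, picking their statistics by position
idx = [X1.columns.get_loc(c) for c in features]
centroids = kmeans.cluster_centers_ * scaler.scale_[idx] + scaler.mean_[idx]
print(centroids.shape)  # (3, 2)
```

An equally valid alternative is to fit the scaler on X1[features] only. Each column is standardized independently, so that gives the same centroids.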

JohanC