
I have k-means clustered data with 15 clusters. I have found the 5 clusters with the highest density and put their indexes in a list. Now I want to select these clusters and remove them from my clustered data to visualize the result. My new visualized k-means object should have n_clusters = 15 - 5 = 10 clusters in the end. Here is my clustered K-Means object:

[image: visualization of the clustered K-Means object]

I used the code below to create that:

from sklearn.cluster import KMeans

kmeans2 = KMeans(n_clusters=15, random_state=10)
kmeans2.fit(X_train)

And here is how I found the clusters I should remove:

import numpy as np
from sklearn.neighbors import NearestNeighbors

centroids = kmeans2.cluster_centers_

nn = NearestNeighbors(n_neighbors=6) # 6 is not a typo. Explanation below.
nn.fit(centroids)
neigh_dist, neigh_ind = nn.kneighbors(centroids, return_distance=True)
# density of a centroid = 5 / (sum of the distances to its neighbours)
densities = [5 / np.sum(neigh_dist[i, :]) for i in range(centroids.shape[0])]
densities2 = densities.copy()  # keep a copy: Nmaxelements mutates its argument

def Nmaxelements(list1, N):
    """Return the N largest values of list1, removing them from list1."""
    final_list = []
    for i in range(0, N):
        max1 = 0
        for j in range(len(list1)):
            if list1[j] > max1:
                max1 = list1[j]
        list1.remove(max1)
        final_list.append(max1)
    return final_list

final_list2 = Nmaxelements(densities, 5)

# map each of the 5 largest densities back to its cluster index
clusters_to_remove = []
for i in densities2:
    for j in final_list2:
        if i == j:
            clusters_to_remove.append(densities2.index(i))

print(clusters_to_remove)

The output is:

[2, 5, 12, 13, 14]
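
For reference, the same kind of index list can probably be obtained more compactly with `np.argsort`. This is only a sketch: `top_n_indices` and `example_densities` are hypothetical names with made-up values, not my real densities, and ties may be resolved differently than in the loop above.

```python
import numpy as np

def top_n_indices(values, n):
    """Return the indices of the n largest values, in ascending index order."""
    values = np.asarray(values)
    # argsort sorts ascending, so the last n positions hold the n largest values
    return sorted(np.argsort(values)[-n:].tolist())

# Made-up densities, for illustration only
example_densities = [0.2, 0.9, 0.1, 0.7, 0.4, 0.8]
top3 = top_n_indices(example_densities, 3)
print(top3)
```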

How can I remove these clusters so that I can finally visualize my k-means object with 10 clusters?

Berke Atalay
  • What exactly do you mean by removing the clusters? Removing their data points as well? And what is `centroids`? – desertnaut Dec 27 '20 at 14:28
  • I am trying to implement **Single-Stage Sampling** on the data; finding the clusters that have the highest density and removing them is one of the steps. So yes, the final visualization should show a k-means object with 10 clusters. The points of these five clusters should be removed too. Also, by centroids I mean `centroids = kmeans2.cluster_centers_` – Berke Atalay Dec 27 '20 at 14:34
  • So, essentially you ask [How to get the samples in each cluster](https://stackoverflow.com/questions/36195457/how-to-get-the-samples-in-each-cluster), so that you can subsequently remove these samples from your data (the rest of your details are actually irrelevant). – desertnaut Dec 27 '20 at 15:08
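
Following the suggestion in the last comment, one possible way to drop the samples of the selected clusters is to build a boolean mask from `labels_`. The sketch below is not a confirmed solution: `X_demo` is synthetic data from `make_blobs` standing in for `X_train`, and the names `kmeans_demo`, `keep_mask`, `X_kept`, `labels_kept` and `centroids_kept` are made up for illustration. Note that the fitted estimator itself still holds 15 centroids; only the arrays used for plotting are filtered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X_train (assumption: a 2-D feature array)
X_demo, _ = make_blobs(n_samples=600, centers=15, random_state=10)

kmeans_demo = KMeans(n_clusters=15, random_state=10, n_init=10).fit(X_demo)

# Indices taken from the question's output
clusters_to_remove = [2, 5, 12, 13, 14]

# labels_[i] is the cluster index assigned to sample X_demo[i]
keep_mask = ~np.isin(kmeans_demo.labels_, clusters_to_remove)

X_kept = X_demo[keep_mask]                    # samples of the remaining clusters
labels_kept = kmeans_demo.labels_[keep_mask]  # their original cluster indices
# Drop the matching rows from the centroid array as well
centroids_kept = np.delete(kmeans_demo.cluster_centers_, clusters_to_remove, axis=0)

print(X_kept.shape, centroids_kept.shape)
```

The filtered arrays could then be plotted, for example with matplotlib's `plt.scatter(X_kept[:, 0], X_kept[:, 1], c=labels_kept)`, assuming the data is two-dimensional.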
