I have applied KMeans to a dataset. top_feat holds the features, and cluster is a list of lists, one per cluster.

I want to find out which feature belongs to which cluster, but with this code every cluster ends up with the same values. Ideally I should get something like:

         len(cluster[0])=249 #(I don't know the exact number)
         len(cluster[1])=1
         len(cluster[2])=1
         #..
         len(cluster[5])=1.

I have 2500 features in total, but when I run this code the length of every cluster is 2500. It is as if every cluster receives all the features.

I loop over range(0, 2500) and append top_feat[i] to cluster[w[i]], where w[i] is the label of feature i and w = kmeans.labels_.

So if w[i] == 1, the loop calls cluster[1].append(top_feat[i]). Here, max(w) = 6.


        cluster = [[]]*((max(w)+1))
        for i in range(0,2500):
            cluster[w[i]].append(top_feat[i])
Patrick W

1 Answer

The sublists in [[]]*((max(w)+1)) all refer to the same list object, so appending to one appends to all of them. Instead of multiplying, create max(w) + 1 distinct lists with a list comprehension:

cluster = [[] for _ in range(max(w) + 1)]
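
To see the difference, here is a minimal sketch. The small `w` and `top_feat` lists are made-up stand-ins for `kmeans.labels_` and the real features, and `shared` and `all_same_object` are names introduced only for this illustration:

```python
# Hypothetical stand-ins for kmeans.labels_ and the real feature list.
w = [0, 1, 0, 2, 1, 0]
top_feat = ["f0", "f1", "f2", "f3", "f4", "f5"]

# Buggy version: multiplying the outer list repeats a reference to ONE inner list.
shared = [[]] * (max(w) + 1)
for label, feat in zip(w, top_feat):
    shared[label].append(feat)

# Every slot points at the same list object, so every "cluster" holds all features.
all_same_object = all(sub is shared[0] for sub in shared)

# Fixed version: the comprehension builds a new empty list on each iteration.
cluster = [[] for _ in range(max(w) + 1)]
for label, feat in zip(w, top_feat):
    cluster[label].append(feat)

print(all_same_object)            # True
print([len(c) for c in shared])   # [6, 6, 6]
print([len(c) for c in cluster])  # [3, 2, 1]
```

Iterating with zip(w, top_feat) also avoids hard-coding 2500, assuming both sequences have the same length.
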
DjaouadNM