1

I'm running the k-means algorithm on my data, and the output labels look like this:

[0 5 8 6 1 3 3 2 2 5 5 6 1 1 3 3 1 8 8 3 3 1 1 1 1 5 2 5 1 1 7 3 6 4 3 3 8
 1 3 3 5 1 8 8 1 8 7 1 1 8 6]

This vector contains the cluster number for each point index. For example, the first value means that point index 0 belongs to cluster no. 0, and the second value means that point index 1 belongs to cluster no. 5.

I would like to have the subsets of the clusters, like this:

cluster no 0 = { its index numbers}
cluster no 1 = { its index numbers}
..
cluster no 8 = { its index numbers}

For example, the second value of the vector is 5; I need to list all the indexes of this vector that have the value 5, and likewise for every other value. I would like each value to have its own list of indexes.

So the list for value 5 should be:

cluster 5 = [ 1,9,10,25,27....

and the same for all the other values; in the end the output should be 9 lists (clusters 0 through 8).

genuis

3 Answers

1

If you are willing to use numpy, this is easily done with numpy.where:

import numpy
cluster5, = numpy.where(numpy.asarray(array) == 5)  # indexes whose label is 5

In 'pure' Python you could do this:

cluster5 = [i for i in range(len(array)) if array[i]==5]
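The same comprehension can be applied to every label at once instead of only 5. This is a minimal sketch, assuming the labels are available as a plain Python list; the names `labels` and `clusters` are illustrative and not from the question, and the list is a shortened sample of the label vector.

```python
# Shortened sample of the label vector from the question (first 11 values).
labels = [0, 5, 8, 6, 1, 3, 3, 2, 2, 5, 5]

# Build one list of indexes per distinct label found in the data.
# Note: this rescans the list once per label, which is fine for small inputs.
clusters = {
    k: [i for i, x in enumerate(labels) if x == k]
    for k in sorted(set(labels))
}

print(clusters[5])  # [1, 9, 10]
```

Deriving the keys from `set(labels)` avoids hard-coding the number of clusters.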
fetteelke
0

This will do the trick, using enumerate:

array = [0,5,8,6,1,3,3,2,2,5,5,6,1,1,3,3,1,8,8,3,3,1,1,1,1,5,2,5,1,1,7,3,6,4,3,3,8,1,3,3,5,1,8,8,1,8,7,1,1,8,6]

for j in range(9):
    print("%i: %s"%(j,[i for i,x in enumerate(array) if x == j]))
Wouter
0

A simple solution based on enumerate and the EAFP approach:

def cluster(seq):
    out = {}
    for index, value in enumerate(seq):
        try:
            out[value].append(index)
        except KeyError:
            out[value] = [index]
    return out

data = [2, 3, 4, 4, 3, 1]
result = cluster(data)
assert result[2] == [0]
assert result[3] == [1, 4]
assert result[4] == [2, 3]
assert result[1] == [5]
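As a possible alternative to the try/except, the same single pass can be written with `collections.defaultdict` from the standard library. This is only a sketch of that variant, not the answer's own code; the function name `cluster_indexes` is hypothetical.

```python
from collections import defaultdict

def cluster_indexes(seq):
    """Map each distinct value in seq to the list of indexes where it occurs."""
    out = defaultdict(list)  # missing keys start out as an empty list
    for index, value in enumerate(seq):
        out[value].append(index)
    return dict(out)  # plain dict, so later lookups of unknown keys raise KeyError

data = [2, 3, 4, 4, 3, 1]
result = cluster_indexes(data)
print(result)  # {2: [0], 3: [1, 4], 4: [2, 3], 1: [5]}
```

Both versions walk the sequence once, so the cost does not grow with the number of clusters.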
Community
Łukasz Rogalski