
Once I have collected and organized data in a SOM, how do I identify clusters?

(Items are aggregated and clustered using many traits, upwards of 10.)

Specifically, I want to find the 'center' of the cluster, thereby giving me the 'center' node(s).

Tyler Wall

3 Answers


You could use a relatively small map and consider each node a cluster, but this is far from optimal. If you want to apply an automated cluster detection method, you should definitely read

Clustering of the Self-Organizing Map

and search for similar literature.
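A minimal sketch of the "each node is a cluster" baseline, assuming plain NumPy and a hand-rolled online SOM. The function names `train_som`, `best_matching_unit` and `assign_nodes`, as well as the learning-rate and neighbourhood schedules, are hypothetical choices for illustration, not the API of any particular SOM library. Each sample is labelled with the index of its best-matching unit (BMU), so a small map directly yields a small number of clusters.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the prototype nearest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with online updates and a Gaussian neighbourhood.

    Returns prototype vectors shaped (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise prototypes from randomly chosen samples.
    picks = rng.integers(0, len(data), size=rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, data.shape[1])
    # Grid coordinates of every node, used for neighbourhood distances on the map.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly towards 0
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighbourhood radius shrinks towards 0.5
        x = data[rng.integers(0, len(data))]
        bmu = np.array(best_matching_unit(weights, x))
        grid_d2 = np.sum((grid - bmu) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))[..., None]
        # Pull the BMU and its map neighbours towards the sample.
        weights += lr * h * (x - weights)
    return weights


def assign_nodes(weights, data):
    """Label every sample with the flat index of its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.argmin(d, axis=1)
```

With a 1x2 map, two well-separated groups of items end up on two different nodes. With ten or more traits on different scales, it is usually worth standardizing the features first, since the map relies on Euclidean distances.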

You could also use more sophisticated versions of the SOM algorithm (multi-level, self-growing, etc.).

In any case, keep in mind that the problem of finding the "correct" number of clusters has no single definitive solution.

pater
  • There are clustering algorithms that do not need to know the number of clusters. You just need to use something more modern than k-means and hierarchical clustering. – Has QUIT--Anony-Mousse Oct 27 '12 at 07:27
  • What you can basically do is two steps: build and train the SOM, and then cluster the SOM, either with a fixed number of clusters or with a dynamically determined number of clusters. – khan Feb 22 '15 at 05:27
  • @khan, what do you mean by "cluster the SOM"? I thought that by training a SOM, you have clustered/reduced the dimensions of the data. Thanks – Gathide May 18 '16 at 14:21

As far as I can tell, SOM is primarily a data-driven dimensionality reduction and data compression method. So it won't cluster the data for you; it may actually tend to spread clusters in the projection (i.e. split them into multiple cells).

However, it may work well for some data sets to either:

  • Instead of processing the full data set, work only on the SOM nodes (weighted by the number of elements assigned to them), which should be significantly smaller
  • Instead of working in the original space, work in the lower-dimensional space that the SOM represents

And then run a regular clustering algorithm on the transformed data.
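The first option above (cluster the SOM nodes rather than the raw items, weighting each node by how many items it attracted) might look like this in NumPy. This is a sketch under assumptions: a hand-written weighted k-means stands in for the "regular clustering algorithm", `k` has to be chosen by you, and the helper names are hypothetical. It also shows one possible reading of the question's 'center' node: the member node whose prototype lies nearest to its cluster's hit-weighted centroid.

```python
import numpy as np


def nearest(points, centers):
    """Index of the closest center for every point (Euclidean distance)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return np.argmin(d, axis=1)


def weighted_kmeans(prototypes, hits, k, n_iter=100, seed=0):
    """Lloyd's k-means on SOM prototypes, where node i counts hits[i] times."""
    rng = np.random.default_rng(seed)
    used = np.flatnonzero(hits > 0)  # start only from nodes that attracted items
    centers = prototypes[rng.choice(used, size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest(prototypes, centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = (labels == j) & (hits > 0)
            if mask.any():
                # Hit-weighted mean of the member prototypes.
                new_centers[j] = np.average(prototypes[mask], axis=0, weights=hits[mask])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return nearest(prototypes, centers), centers


def center_nodes(prototypes, labels, centers):
    """For each cluster, the member node closest to the cluster's centroid."""
    result = {}
    for j, c in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size:
            d = np.linalg.norm(prototypes[members] - c, axis=1)
            result[j] = int(members[np.argmin(d)])
    return result
```

For a trained map stored as a `(rows, cols, n_features)` array, the flat prototypes would be `weights.reshape(-1, n_features)`, and the hit counts could come from `np.bincount(bmu_indices, minlength=rows * cols)`.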

Has QUIT--Anony-Mousse

Though this is an old question, I've encountered the same issue and have had some success implementing Estimating the Number of Clusters in Multivariate Data by Self-Organizing Maps, so I thought I'd share.

The linked algorithm uses the U-matrix to highlight the boundaries of the individual clusters and then uses an image processing algorithm called watershedding to identify the components. For this to work correctly, the regions in the U-matrix are required to be concave within the resolution of your quantization (which, when converted to a binary image, simply amounts to using a flood fill to identify the regions).
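A simplified sketch of the binary-image/flood-fill variant mentioned in the parenthetical, not the full watershed procedure from the paper. It computes a U-matrix using one common definition (the mean Euclidean distance between a node's prototype and those of its 4-connected grid neighbours), treats nodes below a `threshold` as cluster interiors, and flood-fills the connected interior regions. The threshold, the rectangular grid, 4-connectivity and the function names are all assumptions for illustration.

```python
import numpy as np
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity on a rectangular grid


def u_matrix(weights):
    """Mean prototype distance from each node to its 4-connected neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [np.linalg.norm(weights[r, c] - weights[r + dr, c + dc])
                     for dr, dc in NEIGHBOURS
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r, c] = np.mean(dists) if dists else 0.0
    return u


def label_regions(u, threshold):
    """Flood-fill connected low-U regions; boundary nodes (U >= threshold) get -1."""
    rows, cols = u.shape
    labels = np.full((rows, cols), -1, dtype=int)
    inside = u < threshold  # the "binary image" of cluster interiors
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if not inside[r, c] or labels[r, c] != -1:
                continue
            # Breadth-first flood fill starting from an unlabelled interior node.
            labels[r, c] = n_regions
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in NEIGHBOURS:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and inside[nr, nc] and labels[nr, nc] == -1):
                        labels[nr, nc] = n_regions
                        queue.append((nr, nc))
            n_regions += 1
    return labels, n_regions
```

Nodes labelled -1 sit on the high-U ridges between clusters. Each region's 'center' node could then be taken as, for example, its lowest-U node, though that is again a judgment call rather than part of the linked method.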

Lanting