
Suppose that we train a self-organising map (SOM) on a given dataset. Would it make sense to cluster the neurons of the SOM instead of the original data points? This question came up after reading this paper, which states the following:

The most important benefit of this procedure is that computational load decreases considerably, making it possible to cluster large data sets and to consider several different preprocessing strategies in a limited time. Naturally, the approach is valid only if the clusters found using the SOM are similar to those of the original data.

This answer states clearly that SOMs don't perform clustering themselves, but that a clustering procedure can be applied to the SOM after it has been trained. I took this to mean that the clustering is done on the neurons of the SOM, which are in some sense a mapping of the original data, but I'm not sure about this. So, what I want to know is:

  • Is it correct to cluster the data by running the clustering algorithm on the trained neuron weights, treating them as data points? If not, how is clustering done using a SOM?
  • What characteristics should a dataset have, in general, for this approach to be useful?
Tendero

1 Answer


Yes. The usual approach seems to be to run either hierarchical clustering or k-means on the neurons. (You'll need to dig up how it was originally done; as the paper you linked shows, many variants, including two-level approaches, have been explored since.) If you consider SOMs to be a quantization and projection technique, all of these approaches are valid to use.

It's cheaper because there are far fewer neurons than data points and, on the map grid, they are just 2-dimensional and Euclidean. So that is well in line with the source you quoted.

Note that a SOM neuron may be empty (no data point maps to it) if it lies in between two extremely well-separated clusters.
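A minimal sketch of clustering the neurons instead of the data, assuming NumPy only. Everything here is an illustrative assumption rather than the paper's procedure or any SOM library's API: the tiny online SOM trainer, the farthest-first k-means, the function names, the 6x6 map and the synthetic two-blob data. Neurons are weighted by their hit counts, empty neurons are left out, and each data point inherits the cluster of its best-matching neuron.

```python
import numpy as np

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM online; returns weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    # 2D grid position of every neuron, used only by the neighbourhood function
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the weight vectors from randomly chosen data points
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5      # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        grid_d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def best_matching_units(data, weights):
    """Index of the closest neuron (in the original feature space) per data point."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, weights=None, n_iter=50):
    """Weighted Lloyd's k-means with deterministic farthest-first initialisation."""
    w = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)
    centers = [points[np.argmax(w)]]             # start from the heaviest point
    for _ in range(1, k):
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])    # then the farthest remaining one
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if w[mask].sum() > 0:                # keep the old centre if a cluster is empty
                centers[j] = np.average(points[mask], axis=0, weights=w[mask])
    return labels, centers

# Synthetic data (assumption): two well-separated Gaussian blobs in 5 dimensions
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(10.0, 1.0, (100, 5))])

som_weights = train_som(data)                    # 36 neurons, each a 5-dim vector
bmus = best_matching_units(data, som_weights)
hits = np.bincount(bmus, minlength=len(som_weights))  # points per neuron, may be 0

active = hits > 0                                # drop empty neurons before clustering
active_labels, _ = kmeans(som_weights[active], k=2, weights=hits[active])

neuron_labels = np.full(len(som_weights), -1)    # -1 marks an empty neuron
neuron_labels[active] = active_labels
point_labels = neuron_labels[bmus]               # each point inherits its BMU's cluster

print("empty neurons:", int((~active).sum()))
print("cluster sizes:", np.bincount(point_labels))
```

The clustering step here sees at most 36 weighted points instead of 200, which is where the saving comes from; on a real dataset the gap between the number of neurons and the number of data points is usually much larger.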

Has QUIT--Anony-Mousse
  • Thanks for your response. I'm having a hard time understanding your last two paragraphs. 1) Suppose that the original data points have dimension `N`. Then each neuron is represented by a weight vector of dimension `N` as well. So why do you say that they are just 2D? I know that the grid is 2D, but the dimensions are not reduced when it comes to clustering, or are they? 2) What do you mean by "a SOM neuron may be empty"? – Tendero Sep 03 '18 at 13:06
  • The clustering may even be done on the 2D grid coordinates, with each neuron weighted by the number of points closest to it, which may be 0. – Has QUIT--Anony-Mousse Sep 03 '18 at 19:09
  • I understand the empty neuron point now. However, I don't see how the 2D positions of the neurons could help with the clustering. Sorry if this is too basic, but could you provide more information on this? – Tendero Sep 03 '18 at 19:57
  • That is also how the nice SOM plots (U-matrix etc.) work: in 2D, not in the original coordinates. – Has QUIT--Anony-Mousse Sep 03 '18 at 21:31
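To make the U-matrix remark in the comments concrete, here is a small sketch, again assuming NumPy only. The function name `u_matrix`, the 4-connected neighbourhood and the hand-built 4x4 map are illustrative assumptions; real SOM toolkits differ in how they define and draw it. Each cell of the 2D grid gets the mean distance, measured in the original feature space, between that neuron's weight vector and those of its grid neighbours, so cluster boundaries show up as ridges of large values on the map.

```python
import numpy as np

def u_matrix(weights, rows, cols):
    """Mean feature-space distance from each neuron to its 4-connected grid neighbours."""
    w = weights.reshape(rows, cols, -1)
    um = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:   # skip neighbours off the grid
                    dists.append(np.linalg.norm(w[r, c] - w[rr, cc]))
            um[r, c] = np.mean(dists)
    return um

# Hand-built toy map (assumption): the left two columns sit on one cluster,
# the right two columns on another, in a 3-dimensional feature space.
rows, cols, dim = 4, 4, 3
toy = np.zeros((rows, cols, dim))
toy[:, 2:, :] = 10.0
um = u_matrix(toy.reshape(rows * cols, dim), rows, cols)

print(np.round(um, 2))  # large values mark the boundary between columns 1 and 2
```

On this toy map the outer columns are flat and the two middle columns form the ridge, which is the structure a clustering on the 2D grid would pick up.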