
How exactly is a U-matrix constructed in order to visualise a self-organizing map? More specifically, suppose that I have an output grid of 3x3 nodes (that have already been trained); how do I construct a U-matrix from this? You can e.g. assume that the neurons (and inputs) have dimension 4.

I have found several resources on the web, but they are either unclear or contradictory. For example, the original paper is full of typos.

nbro
Spacey

2 Answers


A U-matrix is a visual representation of the distances between neurons in the input data space. You calculate the distance between adjacent neurons using their trained weight vectors. If your input dimension is 4, then each neuron in the trained map also corresponds to a 4-dimensional vector. Let's say you have a 3x3 hexagonal map.

[figure: map lattice]

The U-matrix will be a 5x5 matrix, with interpolated elements inserted for each connection between two neurons, like this:

[figure: u-mat lattice]

The {x,y} elements hold the distance between neurons x and y, and the {x} elements hold the mean of the surrounding values. For example, {4,5} = distance(4,5) and {4} = mean({1,4}, {2,4}, {4,5}, {4,7}). To calculate a distance you use the trained 4-dimensional vector of each neuron and the same distance metric that you used for training the map (usually Euclidean distance). So the values of the U-matrix are scalars, not vectors. You can then assign a light grey colour to the largest of these values, a dark grey to the smallest, and corresponding shades of grey to the values in between. Painting the cells of the U-matrix with these colours gives you a visual representation of the distances between neurons.
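For a concrete sketch, here is minimal Python/NumPy code for the square-lattice variant discussed in the comments below (4-connectivity, so the diagonal cells of the U-matrix stay empty); the hexagonal case only changes which neurons count as adjacent. The random weights are a stand-in for a trained map, and the function name is mine, not from any library.

```python
import numpy as np

def u_matrix(weights):
    """Build a (2r-1) x (2c-1) U-matrix from an r x c grid of trained
    weight vectors, assuming a square lattice with 4-connectivity."""
    r, c, _ = weights.shape
    U = np.full((2 * r - 1, 2 * c - 1), np.nan)

    # {x,y} cells: Euclidean distance between horizontally and
    # vertically adjacent neurons.
    for i in range(r):
        for j in range(c):
            if j + 1 < c:  # right neighbour
                U[2 * i, 2 * j + 1] = np.linalg.norm(weights[i, j] - weights[i, j + 1])
            if i + 1 < r:  # lower neighbour
                U[2 * i + 1, 2 * j] = np.linalg.norm(weights[i, j] - weights[i + 1, j])

    # {x} cells: mean of the surrounding distance cells.
    for i in range(0, 2 * r - 1, 2):
        for j in range(0, 2 * c - 1, 2):
            around = [U[a, b]
                      for a, b in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                      if 0 <= a < 2 * r - 1 and 0 <= b < 2 * c - 1]
            U[i, j] = np.mean(around)
    return U

# 3x3 map of 4-dimensional weight vectors (random stand-in for a trained map)
rng = np.random.default_rng(0)
W = rng.random((3, 3, 4))
print(u_matrix(W).round(2))  # 5x5 matrix of scalar distances (diagonals stay NaN)
```

For the grey-scale picture you would then map the finite values of `U` linearly onto a colour range, e.g. with `np.nanmin(U)` and `np.nanmax(U)` as the endpoints.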

Also have a look at this web article.

nbro
pater
  • +1 great explanation. Alternatively, only the average values of the inter-node distances are shown (i.e. only the {x} elements are visualized). I think this was already mentioned in one of the [posts](http://stackoverflow.com/a/7033359) linked above (although in less detail) – Amro Nov 30 '12 at 16:32
  • + 1000 ... Please do the human race a favor and publish a paper/blog-post with this, because it has been harrowing getting a proper explanation of this. Now for some follow-up questions: 1) As @Amro has mentioned, the alternative is to just visualize the {x} elements and not include the inter-node distances as you have done. What is the advantage of one over the other? 2) I know that the German author Ultsch created the U-matrix, but _why_ does one include the inter-node distances as mentioned here? I mean, what is the reasoning behind it? 3) How did you make this diagram on the fly? Thanks so much! – Spacey Nov 30 '12 at 17:21
  • 1
    @Learnaholic: to me both conventions serve a very similar purpose; to visualize the clusters covered by the SOM nodes using a low dimensional mapping of the original features. You would expect to see areas/zones strongly connected (small inter-nodes distances) separated by regions of weak connections (large distances). There are many other possible visualizations, [this page](http://www.ifs.tuwien.ac.at/dm/somtoolbox/visualisations.html) lists a few.. – Amro Nov 30 '12 at 17:35
  • @Amro One point: could you please update the answer to cover the case where the grid is square? For example, are my _diagonal_ neighbors in a square grid considered 'neighbors'? In other words, what would {4} be in the square case? Would it be mean({4,1}, {4,2}, {4,5}, {4,7}, {4,8})? Thanks. (I am asking about the square case because I have to make this in MATLAB, and I do not think I can do hexagons.) – Spacey Nov 30 '12 at 18:10
  • @Learnaholic: it depends on your topology; you could have the 2D lattice 4-connected (up/down/left/right) or 8-connected (in all eight directions), and the latter is often used. By the way, a hexagonal layout can be viewed as a regular grid with every other column shifted by half a unit (see [this post](http://stackoverflow.com/a/2340216)) – Amro Nov 30 '12 at 18:17
  • @Amro Thanks so much for your help. That link was very useful. Of course, the display in MATLAB would still be as a rectangular grid, no? – Spacey Nov 30 '12 at 18:33
  • 1
    @Learnaholic: you could always draw your own polygons using the [PATCH](http://www.mathworks.com/help/matlab/ref/patch.html) function, but I'll leave that to you :) Also why not look at how [SOM Toolbox](http://www.cis.hut.fi/somtoolbox/) implements the drawing part, it is released under GPL license. – Amro Nov 30 '12 at 18:38
  • @Learnaholic: Skipping the inter-node distances is of course correct, but it also lowers the representational strength of the diagram. The U-matrix with inter-node distances can more easily reveal the underlying structure of the input data (which is its purpose, as Amro correctly pointed out). I did the diagrams with MS Visio ;) but I heavily use the SOM Toolbox mentioned by Amro. If you are working on SOMs with MATLAB, it is definitely a must. – pater Dec 01 '12 at 09:49
  • @pater Thank you very much, I have learned a lot from you. :-) – Spacey Dec 01 '12 at 17:08
  • @pater Thanks again. I would appreciate any insights you might have on this matter [here](http://stackoverflow.com/questions/13687256/is-it-right-to-normalize-data-and-or-weight-vectors-in-a-som), since you seem to be well versed in this. – Spacey Dec 03 '12 at 16:13

The original paper cited in the question states:

A naive application of Kohonen's algorithm, although preserving the topology of the input data, is not able to show clusters inherent in the input data.

Firstly, that's true; secondly, it reflects a deep misunderstanding of the SOM; thirdly, it is also a misunderstanding of the purpose of calculating the SOM.

Just take the RGB color space as an example: are there 3 colors (RGB), or 6 (RGBCMY), or 8 (+BW), or more? How would you define that independently of the purpose, i.e. as inherent in the data itself?

My recommendation would be not to use maximum-likelihood estimators of cluster boundaries at all, not even such primitive ones as the U-matrix, because the underlying argument is already flawed. No matter which method you then use to determine the clusters, you would inherit that flaw. More precisely, the determination of cluster boundaries is not interesting at all, and it loses information regarding the true intention of building a SOM. So, why do we build SOMs from data? Let us start with some basics:

  1. Any SOM is a representative model of a data space, since it reduces the dimensionality of the latter. Because it is a model, it can be used as a diagnostic as well as a predictive tool. Yet neither use is justified by some universal objectivity. Instead, models are deeply dependent on the purpose and on the accepted risk of associated errors.
  2. Let us assume for a moment that the U-matrix (or something similar) would be reasonable, so we determine some clusters on the map. It is not only an issue of how to justify the criterion for this (outside of the purpose itself); it is also problematic because any further calculation destroys some information (it is a model about a model).
  3. The only interesting thing about a SOM is its accuracy, viz. the classification error, not some estimation of it. Thus, the estimation of the model in terms of validation and robustness is the only thing that is interesting.
  4. Any prediction has a purpose and the acceptance of the prediction is a function of the accuracy, which in turn can be expressed by the classification error. Note that the classification error can be determined for 2-class models as well as for multi-class models. If you don't have a purpose, you should not do anything with your data.
  5. Conversely, the concept of "number of clusters" is completely dependent on the criterion of "allowed divergence within clusters", so it masks the most important aspect of the structure of the data. It is also dependent on the risk and the risk structure (in terms of type I/II errors) you are willing to take.
  6. So, how could we determine the number of classes on a SOM? If no exterior a priori reasoning is available, the only feasible way is an a posteriori check of the goodness of fit. On a given SOM, impose different numbers of classes and measure the deviations in terms of misclassification cost, then (subjectively) choose the most pleasing one, using some fancy heuristic such as Occam's razor.
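Point 6 can be sketched in code. This is a hypothetical illustration, not code from the answer: it imposes different class counts on the codebook of a 3x3 SOM (random stand-in vectors here) with a plain k-means, and reports the within-cluster cost for each count, leaving the final, purpose-dependent choice to the modeller.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means over the SOM codebook vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each codebook vector to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned vectors
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# codebook: the 9 trained 4-dimensional weight vectors of a 3x3 SOM
rng = np.random.default_rng(1)
codebook = rng.random((9, 4))

for k in (2, 3, 4):
    labels, centers = kmeans(codebook, k)
    sse = ((codebook - centers[labels]) ** 2).sum()  # within-cluster cost
    print(f"k={k}  within-cluster SSE={sse:.3f}")
```

The printed costs only rank the candidate partitions; which trade-off between cost and class count is "most pleasing" remains the subjective decision the answer argues for.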

Taken together, the U-matrix pretends an objectivity where none can exist. It is a serious misunderstanding of modeling altogether. IMHO, one of the greatest advantages of the SOM is that all the parameters implied by it are accessible and open to being parameterized. Approaches like the U-matrix destroy just that by disregarding this transparency and closing it off again with opaque statistical reasoning.

monnoo
  • Hello monnoo, thanks for your input; however, I am not entirely following what you are saying. I have already used the U-matrix to cluster via SOM successfully. Perhaps I do not understand what you mean. Thanks. – Spacey Feb 12 '14 at 16:30
  • 1
    @monnoo,@Learnaholic, May I suggest you illustrate - probably use code, figures - your explanation with example(s). This will make it clearer. – Gathide May 16 '16 at 10:33