4

I have data with 71 attributes and 17 instances, and I want to classify the instances into six groups or classes. I tried newsom(data, [6 6]).

The results are shown in the figures below. I cannot figure out where the clusters are located or how to find them programmatically.


[two figures: SOM output plots]

I have read papers about the SOM, but could never figure out how to get the clusters and the data belonging to each cluster, so please also address that in your answer.

user1900559
  • This is not directly helpful, but is there a reason for choosing self-organising maps? I have always found it inelegant, time-consuming, and in the few instances where I've seen it applied (by people that I work with) to produce mostly useless clusterings. – micans Dec 13 '12 at 13:50
  • Maybe you are correct, but isn't it still a way of clustering? If not, please suggest some other method(s). – user1900559 Dec 13 '12 at 15:35
  • 17 instances is not a lot. I would suggest using hierarchical clustering using e.g. both single linkage clustering and complete linkage clustering. This will give you a handle on your data, and you could use the resulting tree to separate your data into six classes. You could also use k-means with k set to 6. For all approaches you need to make sure that none of the attributes dominates the others (unless that is what you want); normalisation may be needed. Finally, for some types of data (such as time course), clustering based on e.g. Pearson correlation coefficient may be appropriate. – micans Dec 13 '12 at 16:44
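A minimal MATLAB sketch of the hierarchical-clustering and k-means suggestions in the comment above, assuming data is laid out as in the question (71 attributes in rows, 17 instances in columns) and that the Statistics Toolbox is available; the variable names are illustrative only:

X = zscore(data');                          % one instance per row, normalised so no attribute dominates
Zc = linkage(X, 'complete');                % complete-linkage hierarchical clustering
dendrogram(Zc)                              % inspect the tree before choosing where to cut
labels_hier = cluster(Zc, 'maxclust', 6);   % cut the tree into 6 classes
labels_km = kmeans(X, 6);                   % k-means with k = 6 for comparison

With only 17 instances the dendrogram is small enough to inspect by eye before committing to six classes; swapping 'complete' for 'single' gives the single-linkage variant.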

2 Answers

1

You have to study carefully the documentation on the structure returned by the newsom function (which is now deprecated) or selforgmap. In the IW field you can find the N*N cluster coordinates. For example:

somnet = newsom(data, [6 6]);     % create a 6-by-6 SOM (use selforgmap in newer releases)
somnet = train(somnet, data);     % train the map on the data
my_clusters = somnet.IW{1,1};     % weight matrix: one row per map node

my_clusters will have N*N rows (in your case 6*6 = 36) and M columns, where M equals the input dimension. That's all.
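To go from those cluster centres to the actual assignment of instances, here is a minimal sketch, assuming the trained network from the code above and data arranged attributes-by-instances as in the question:

y = somnet(data);            % competitive output: one column per instance, one-hot over the map nodes
assignments = vec2ind(y);    % assignments(i) = index of the node (cluster) that instance i falls into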

Matteo De Felice
  • Thank you very much. The IW field gives the weights of the nodes. For my data, my_clusters is a 36-by-71 matrix, so these values still do not show me the clusters. – user1900559 Dec 13 '12 at 15:46
  • In this case, the weights ARE the cluster coordinates. So, each of the 36 rows you have is the 71-dimensional vector representing the center of the cluster. – Matteo De Felice Dec 17 '12 at 10:06
1

Since you have fewer instances than map nodes (a low ratio of instances to map nodes), the final map will contain nodes that never "win" an instance; you could use these "empty" nodes to separate regions of the map. For more on clustering the SOM, see: Clustering of the Self-Organizing Map
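A small sketch of how you might locate those empty nodes, assuming a trained network somnet as in the other answer and data arranged attributes-by-instances:

y = somnet(data);                % one-hot competitive output: map nodes by instances
hits = sum(y, 2);                % how many instances each node wins
empty_nodes = find(hits == 0);   % nodes that never win an instance; candidate boundaries between clusters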

Keep in mind that SOM is an unsupervised clustering method: you don't define the number of clusters in advance; the data tell you.

pater
  • Thank you for your reply. I know that SOM is an unsupervised clustering method. The problem is that I need either 4 or 6 clusters to be present after the grouping is complete. Unfortunately, the k-means algorithm does not return satisfactory results. – user1900559 Dec 13 '12 at 15:37
  • @user1900559, if you need a small number of clusters you could use a smaller map (2x3) and consider each map node a cluster, although this is not the recommended use of SOM. You could also apply hierarchical clustering to the SOM, as described in the paper I referenced, and there you can define your desired number of final clusters. – pater Dec 14 '12 at 12:10
  • Thanks. I tried hierarchical clustering and k-means. The problem is that some data points shift from one cluster to another with different clustering processes. – user1900559 Dec 28 '12 at 15:23
  • @user1900559, this is a common issue in clustering. Different algorithms produce different results. There are clustering quality measures (google them and you'll find plenty of info) to evaluate your results and choose accordingly. – pater Dec 31 '12 at 12:20
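For completeness, the two-level approach pater points to (cluster the SOM codebook vectors, then label each instance through its best-matching node) could look roughly like this; it is only a sketch, assuming a trained network somnet, data arranged attributes-by-instances, and the Statistics Toolbox:

codebook = somnet.IW{1,1};                   % node weight vectors, one row per map node
Z = linkage(codebook, 'complete');           % hierarchical clustering of the nodes
node_cluster = cluster(Z, 'maxclust', 6);    % force the six clusters the question asks for
[~, bmu] = max(somnet(data));                % best-matching node for each instance
instance_cluster = node_cluster(bmu);        % final cluster label per instance

The instances inherit their labels from the map nodes, which is the idea of the two-level scheme in the referenced paper.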