How to give label for cluster from GMM iteration?

Question

I read about the concept of GMM in "Understanding concept of Gaussian Mixture Models", and it was helpful. I have also implemented GMM for the fisheriris dataset, but I didn't use the fitgmdist function because I don't have it. Instead, I used the code from http://chrisjmccormick.wordpress.com/2014/08/04/gaussian-mixture-models-tutorial-and-matlab-code/.

In "Understanding concept of Gaussian Mixture Models", Amro plotted the result with its labels, i.e. setosa, virginica, and versicolor. How did he do it? After some iterations, I only get mu, Sigma, and weight; there are no labels at all. I want to assign the labels (setosa, virginica, and versicolor) to the mixture components produced by the GMM iterations.

Please add the code you've tried so far, so that we can help you to improve it! Welcome to SO! — darthbith, Oct 16 '14 at 13:04
You might want to look at the `gscatter(data(:,1), data(:,2), species, clrDark)` line in the code that you linked. And you'd want to look into the `species` argument. — , Oct 16 '14 at 15:20

score 0 · Answer 1 · answered Oct 18 '14 at 09:08

There are two sets of "labels" in that plot:

one is the "true" labels of the Fisher Iris dataset (the species variable, which contains the class of each instance: setosa, versicolor, or virginica). Normally you wouldn't have those in a real dataset (after all, the goal of clustering is to discover those groups within the data, which you don't know beforehand). I just used them here to get an idea of how well the EM clustering performed against the actual truth (the scatter points are color-coded according to the class).
the other set of labels is the clusters we found using GMM. Basically, I built a 50x50 grid of 2D points covering the entire data domain, then assigned a cluster to each of those points by computing the posterior probability of each component and choosing the component with the highest posterior. I showed those clusters as the background color. As a nice consequence, we get to see the discriminant decision boundaries between the clusters.
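
For the situation in the question, where the iterations only return mu, Sigma, and weight, the assignment step described above could be sketched roughly as follows. This is a minimal sketch in Python rather than MATLAB, and it is restricted to 2-D data. The function names and the parameter values are hypothetical placeholders chosen for illustration; they are not fitted results from the iris data.

```python
import math

def gaussian_pdf_2d(x, mu, sigma):
    """Density of a 2-D Gaussian at point x; sigma is a 2x2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T * inv(sigma) * (x - mu), with the 2x2 inverse written out
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def posteriors(x, mus, sigmas, weights):
    """Posterior probability of each mixture component given the point x."""
    joint = [w * gaussian_pdf_2d(x, m, s) for m, s, w in zip(mus, sigmas, weights)]
    total = sum(joint)
    return [j / total for j in joint]

def assign_cluster(x, mus, sigmas, weights):
    """Index of the component with the highest posterior probability."""
    post = posteriors(x, mus, sigmas, weights)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical parameters standing in for the mu, Sigma, and weight from EM
mus = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
sigmas = [((1.0, 0.0), (0.0, 1.0))] * 3
weights = [1.0 / 3.0] * 3

# Made-up points; the same call works for every node of a plotting grid
points = [(0.2, -0.1), (4.8, 5.3), (9.0, 0.5)]
cluster_ids = [assign_cluster(p, mus, sigmas, weights) for p in points]
```

The resulting indices are only cluster numbers (0, 1, 2); nothing in the mixture parameters ties them to a species name.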

You can see that the cluster of points on the left got separated quite nicely (and perfectly matched the setosa class). The points on the right side of the plot got separated into two clusters matching the other two classes, although some instances were "misclassified", if you will (some green points on the wrong side of the boundary).
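
If the true classes are available, as they are for the iris data, one possible way to attach a species name to each cluster is a majority vote over the members of that cluster. The sketch below is in Python rather than MATLAB, and it assumes a hypothetical helper name and made-up toy assignments. It is not the method used for the plot, just one simple convention.

```python
from collections import Counter

def name_clusters(cluster_ids, true_labels):
    """Map each cluster index to the most common true label among its members."""
    votes = {}
    for cid, label in zip(cluster_ids, true_labels):
        votes.setdefault(cid, Counter())[label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

# Toy data (hypothetical): cluster index per point and the known species
cluster_ids = [0, 0, 0, 1, 1, 2, 1, 2, 2]
true_labels = ["setosa", "setosa", "setosa",
               "versicolor", "versicolor", "virginica",
               "virginica", "virginica", "versicolor"]

cluster_names = name_clusters(cluster_ids, true_labels)

# Points whose named cluster agrees with their true species
matches = sum(cluster_names[c] == t for c, t in zip(cluster_ids, true_labels))
```

Note that a plain majority vote can give two clusters the same name when the clustering is poor, so the mapping is worth checking by eye.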

Typically, in a real setting you wouldn't have the actual classes to compare against, so there is no way to tell how "accurate" your clustering was (other metrics exist for evaluating clustering performance)...
