
I'm trying to implement the k-means algorithm in MATLAB. I'm new to MATLAB and still learning; I put this implementation together by googling and watching YouTube videos about MATLAB functions. I set up three initial centroids for the iris dataset, and when I checked, the three centroids moved in the right direction to form three clusters. However, I don't really understand the next step, and I can't find what I need on the web. Can anybody help me draw a 2D PCA scatter plot with a different color for each of the three clusters?

This is my k-means implementation:

clear; clc; close all;
load iris.xls
DataSet = iris;
Dim = size(DataSet);

load Iris_Initial_Centroids.xls
Centroid = Iris_Initial_Centroids;
Dim_Cen = size(Centroid);

Centroid1 = Centroid(1,:);
Centroid2 = Centroid(2,:);
Centroid3 = Centroid(3,:);

n = input('Enter the number of Iteration : ');

for i=1:1:n
    count1 = 0;
    Mean1 = zeros(1,4);
    count2 = 0;
    Mean2 = zeros(1,4);
    count3 = 0;
    Mean3 = zeros(1,4);

    for j=1:1:Dim(1,1)
        % Euclidean distance from point j to each of the three centroids
        Pattern1(j)=sqrt((Centroid1(1,1)-DataSet(j,1))^2+(Centroid1(1,2)-DataSet(j,2))^2+(Centroid1(1,3)-DataSet(j,3))^2+(Centroid1(1,4)-DataSet(j,4))^2);
        Pattern2(j)=sqrt((Centroid2(1,1)-DataSet(j,1))^2+(Centroid2(1,2)-DataSet(j,2))^2+(Centroid2(1,3)-DataSet(j,3))^2+(Centroid2(1,4)-DataSet(j,4))^2);
        Pattern3(j)=sqrt((Centroid3(1,1)-DataSet(j,1))^2+(Centroid3(1,2)-DataSet(j,2))^2+(Centroid3(1,3)-DataSet(j,3))^2+(Centroid3(1,4)-DataSet(j,4))^2);
        closestDistance = [Pattern1(j) Pattern2(j) Pattern3(j)];
        minimum = min(closestDistance);
        % Add point j to the running sum of the cluster whose centroid is nearest
        if (minimum == Pattern1(j))
            count1 = count1 + 1;
            Mean1 = Mean1 + DataSet(j,:);
        elseif (minimum == Pattern2(j))
            count2 = count2 + 1;
            Mean2 = Mean2 + DataSet(j,:);
        else
            count3 = count3 + 1;
            Mean3 = Mean3 + DataSet(j,:);
        end
    end

    Centroid1 = Mean1/count1;
    Centroid2 = Mean2/count2;
    Centroid3 = Mean3/count3;
    %plot(i, Centroid1, '.');
    %plot(i, Centroid2, '.');
    %plot(i, Centroid3, '.');
end

[coeff.score.latent] = pca(DataSet);
newDataSet = score(:,1:2);
plot(newDataSet(:,1),newDataSet(:,2),'.');

The last three lines of the code give me an error when I try to draw the PCA scatter plot. I'm trying to draw a reduced 2D PCA scatter plot with each cluster in a different color, such as red, green, and blue. What is my problem, and can anybody help me figure this out? It would be a big help for understanding and learning MATLAB.

Thanks.
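A compact sketch of the same k-means loop, vectorized, that also keeps the cluster index of every point (the plot needs those indices, not just the centroids). Assumptions: three clusters, a fixed iteration count instead of `input`, and a small hypothetical stand-in for the iris matrix so the block runs on its own; to use it, delete the stand-in lines and keep the question's `DataSet` and initial `Centroid`. The names `idx`, `nIter`, and `counts` are illustrative. It compares squared distances, which select the same nearest centroid as the square-rooted distances in the question.

```matlab
% Sketch: vectorized k-means loop that keeps per-point cluster labels.
% Hypothetical stand-in data (replace with DataSet and Centroid from the question).
offsets  = kron([0; 5; 10], ones(50, 1));          % offsets 0, 5, 10 for rows 1-50, 51-100, 101-150
jitter   = mod((1:150)' * [1 2 3 4], 7) / 10;      % deterministic "noise" in [0, 0.6]
DataSet  = bsxfun(@plus, offsets, jitter);          % 150-by-4
Centroid = DataSet([1 51 101], :);                  % one starting centroid per group (assumption)

K = size(Centroid, 1);
nIter = 10;                                         % fixed iteration count instead of input()
for it = 1:nIter
    % Squared Euclidean distance from every point to every centroid (N-by-K).
    D = zeros(size(DataSet, 1), K);
    for k = 1:K
        diffs = bsxfun(@minus, DataSet, Centroid(k, :));
        D(:, k) = sum(diffs .^ 2, 2);
    end
    % Assignment step: index of the nearest centroid for each point (ties go to the lower index).
    [~, idx] = min(D, [], 2);
    % Update step: each centroid becomes the mean of its member points.
    for k = 1:K
        if any(idx == k)                            % keep the old centroid if a cluster ends up empty
            Centroid(k, :) = mean(DataSet(idx == k, :), 1);
        end
    end
end
counts = accumarray(idx, 1, [K 1]);                 % number of points in each cluster
```

The `any(idx == k)` guard is a design choice: without it, an empty cluster would make the centroid `0/0`, i.e. `NaN`.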

1 Answer


The error is here:

[coeff.score.latent] = pca(DataSet);

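You are using dots to separate the output arguments. Those should be commas. With dots, MATLAB reads `coeff.score.latent` as a single nested structure field, so `pca` returns only its first output into that field and no `score` variable is created for the next line.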

[coeff,score,latent] = pca(DataSet);
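For reference, a minimal sketch of what the corrected call gives you: by default `pca` centers each column and computes the principal components from an SVD, and `score(:,1:2)` holds each row's coordinates along the first two components. The sketch below reproduces that with core `svd` only (no toolbox needed) on hypothetical stand-in data. Its columns can differ from `pca`'s output by a sign flip, which mirrors the plot but does not change its shape.

```matlab
% Sketch: the 2-D PCA projection, reproduced with core MATLAB (svd).
% Hypothetical stand-in data (replace with DataSet from the question).
offsets = kron([0; 5; 10], ones(50, 1));
jitter  = mod((1:150)' * [1 2 3 4], 7) / 10;
DataSet = bsxfun(@plus, offsets, jitter);            % 150-by-4

n  = size(DataSet, 1);
Xc = bsxfun(@minus, DataSet, mean(DataSet, 1));      % center each column, as pca does by default
[~, S, V] = svd(Xc, 'econ');                         % Xc = U*S*V'
coeff  = V;                                          % principal directions, one per column
score  = Xc * coeff;                                 % each row expressed in PC coordinates
latent = diag(S) .^ 2 / (n - 1);                     % variance along each component, descending

newDataSet = score(:, 1:2);                          % keep the first two components
plot(newDataSet(:, 1), newDataSet(:, 2), '.');
xlabel('PC 1'); ylabel('PC 2');
```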
rayryeng
  • Thank you for replying and helping me out! :) I have one more question: if I want to plot the 3 clusters in 3 different colors, such as RGB, how should I write the code? –  Nov 01 '15 at 06:42
  • @Bow_Wow Your current code only calculates the centroids. It doesn't compute cluster memberships per point. Do you have code somewhere that does this, or did you want this to be done? – rayryeng Nov 01 '15 at 06:54
  • Um... I just wanted it done and wanted to know how each cluster can be shown in RGB. Uh oh, I didn't know I had to compute the cluster membership of every point. I don't have code that computes cluster memberships per point. –  Nov 01 '15 at 07:03
  • Well, you computed the centroids, so that's a start. Did you just want to plot the centroids in different colours? Each point in your dataset has to belong to a cluster: a point belongs to the cluster whose centroid it is closest to. I'm not sure what you actually want given your current code and your question. Can you clarify? – rayryeng Nov 01 '15 at 07:12
  • Um... I didn't want to plot only the computed centroids. I wanted to plot all the cluster members and the computed centroids in different colors. For example, if there are 3 clusters, the plot should show 3 colors. Hmm, I think I have to work on it more. –  Nov 01 '15 at 07:21
  • @Bow_Wow yeah your current code doesn't compute the cluster memberships. It just computes the centroids. Getting cluster memberships is very simple. For each point, find the distance between all centroids and whichever one gave you the smallest distance, that's the cluster it belongs to. If you don't want to write the code for that, then may I suggest using `knnsearch` from the Stats toolbox? It'll do what you need if you don't want to write it from scratch. I also have something that you can use here: http://stackoverflow.com/questions/27475978/finding-k-nearest-neighbours-with-matlab – rayryeng Nov 01 '15 at 07:26
  • If I have a question, can I just post it here? –  Nov 01 '15 at 07:31
  • Hi, I actually got the cluster members and the PCA plot on my own. I used the vertcat function inside the if statement to collect the cluster members, e.g. `cluster1 = vertcat(cluster1,DataSet(j,:));`, and for the plot, `newDataSet = vertcat(cluster1,cluster2,cluster3); [coeff, score, latent] = pca(newDataSet); newcluster1 = score(:, 1:2); plot(newcluster1(1:50,1), newcluster1(1:50,2),'r.');` Something like this. :) Anyway, thanks again for helping me out. –  Nov 02 '15 at 05:37
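A sketch of the coloured plot discussed in these comments, under stated assumptions: `DataSet` (N-by-4) and the final centroids (3-by-4) already exist, and `knnsearch` and `pca` come from the Statistics and Machine Learning Toolbox. The stand-in data, `idx`, `cenScore`, and `colors` are hypothetical names for illustration. Membership is the nearest centroid per point, as described above; the whole data set is projected once so all clusters share the same PC axes, and the centroids are mapped into those axes with the same centering and `coeff`.

```matlab
% Sketch: colour each k-means cluster in the 2-D PCA plot and mark its centroid.
% Hypothetical stand-in data (replace with DataSet and the final centroids).
offsets  = kron([0; 5; 10], ones(50, 1));
jitter   = mod((1:150)' * [1 2 3 4], 7) / 10;
DataSet  = bsxfun(@plus, offsets, jitter);
Centroid = [mean(DataSet(1:50, :)); mean(DataSet(51:100, :)); mean(DataSet(101:150, :))];

% Membership: index of the nearest centroid (row of Centroid) for every row of DataSet.
idx = knnsearch(Centroid, DataSet);

% Project the whole data set once; pca centers the data, so score = (DataSet - mu) * coeff.
[coeff, score] = pca(DataSet);
mu = mean(DataSet, 1);
cenScore = bsxfun(@minus, Centroid, mu) * coeff;     % centroids in the same PC coordinates

colors = 'rgb';
figure; hold on;
for k = 1:size(Centroid, 1)
    members = (idx == k);                            % logical mask, works for any cluster size
    plot(score(members, 1), score(members, 2), [colors(k) '.']);
    plot(cenScore(k, 1), cenScore(k, 2), [colors(k) 'x'], 'MarkerSize', 12, 'LineWidth', 2);
end
hold off;
xlabel('PC 1'); ylabel('PC 2');
legend('Cluster 1', 'Centroid 1', 'Cluster 2', 'Centroid 2', 'Cluster 3', 'Centroid 3');
```

Selecting points with `idx == k` avoids assuming each cluster has exactly 50 members, which k-means does not guarantee even on iris.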