-1

I have implemented the Self-Organizing Map(SOM) algorithm in MATLAB. Suppose each of the data points are represented in 2-dimensional space. The problem is that I want to visualize the movement of each of the data points in the training phase i.e. I want to see how the points are moving and eventually forming clusters as the algorithm is in progress say at every fix duration. I believe that this can be done through Simulation in MATLAB,but I don't know how to incorporate my MATLAB code for visualization?

Sujeet
  • 11
  • 3
  • This is a quite broad question, but let's start first with the first. Are you using som to clusterize 2D space, or this is just to test how your algorithm is working? You can make a scatter plot of your data, and them move the som clusters centers for each iteration (using XDataSource, YDataSource, take a look [here](http://www.mathworks.com/help/matlab/data_analysis/making-graphs-responsive-with-data-linking.html)). So, all you need to do is: before your iterations, scatter plot the data, plot starting centers linked to the their position, and then iterate, and refresh to move the centers. – Werner Sep 11 '13 at 21:29
  • When I said `refresh`, read it as `refreshdata`. Here is another [example](http://www.mathworks.com/help/matlab/creating_plots/linking-graphs-to-variables--data-source-properties.html). Since this is your first post here, a word of advice, if you want to talk directly to someone, use @ and the name of the person, without space. – Werner Sep 11 '13 at 21:34
  • @ Werner : Thank you a lot for your reply. I am discussing the problem in detail now. Each of the data points, let us suppose, are of 5 dimension and I have 100 such data points. In SOM I am representing the data points by 2D points. Now I want to see for each iteration how those points are moving and finally forming clusters.Please help, any kind of suggestion will be of great help to me. – Sujeet Sep 12 '13 at 06:08
  • @ werner : I would like to add one more thing here. Suppose each iteration runs for 100 seconds, then I want to see the current position of the data points at every 5 seconds, for an example. So in that case I will have 20 snap shots of the relative position of the points. – Sujeet Sep 12 '13 at 06:19
  • You can't visualize 5d space, that's why people use SOM or another techniques to reduce dimensions. If you want to see how SOM is moving around, you have 5 taken by 2 combinations of dimensions that you can plot, which gives you 20 axis. You can subplot 20 axis in one figure and squeeze them to occupy as much space as possible, and link the X and Y axis so that when you zoom one of them, the other will also move. I can give you an example of how to do that if you think this is reasonable. – Werner Sep 12 '13 at 16:08
  • Now for your second commentary, If each iteration is that heavy, plotting and updating the graphics on the fly won't make sense. Instead, you could store center positions and them loop over the stored centers pausing for 1 s to simulate the process. – Werner Sep 12 '13 at 16:11
  • @Werner: Yes, it seems reasonable to me, will you please elaborate the concept of the axes with an example? – Sujeet Sep 12 '13 at 18:24

1 Answers1

2

I developed a code example to visualize clustering data with multiple dimensions using all possible data projection in 2-D. It may not be the best idea for visualization (there are techniques developed for this, as SOM itself may be used for this need), specially for a higher dimension numbers, but when the number of possible projections (n-1)! is not that high it is a quite good visualizer.


Cluster Algorithm 

Since I needed access to the code so that I could save the cluster means and cluster labels for each iteration, I used a fast kmeans algorithm available at FEX by Mo Chen, but I had to adapt it so I could have this access. The adapted code is the following:

function [label,m] = litekmeans(X, k)
% Perform k-means clustering.
%   X: d x n data matrix
%   k: number of seeds
% Written by Michael Chen (sth4nth@gmail.com).
n = size(X,2);
last = 0;
iter = 1;
label{iter} = ceil(k*rand(1,n));  % random initialization
checkLabel = label{iter};
m = {};
while any(checkLabel ~= last)
    [u,~,checkLabel] = unique(checkLabel);   % remove empty clusters
    k = length(u);
    E = sparse(1:n,checkLabel,1,n,k,n);  % transform label into indicator matrix
    curM = X*(E*spdiags(1./sum(E,1)',0,k,k));    % compute m of each cluster
    m{iter} = curM;
    last = checkLabel';
    [~,checkLabel] = max(bsxfun(@minus,curM'*X,dot(curM,curM,1)'/2),[],1); % assign samples to the nearest centers
    iter = iter + 1;
    label{iter} = checkLabel;
end
% Get last clusters centers
m{iter} = curM;
% If to remove empty clusters:
%for k=1:iter
%  [~,~,label{k}] = unique(label{k});
%end

Gif Creation

I also used @Amro's Matlab video tutorial for the gif creation.

Distinguishable Colors

I used this great FEX by Tim Holy for making the cluster colors easier to distinguish.

Resulting code

My resulting code is as follows. I had some issues because the number of clusters would change for each iteration which would cause scatter plot update to delete all cluster centers without giving any errors. Since I didn't noticed that, I was trying to workaround the scatter function with any obscure method that I could find the web (btw, I found a really nice scatter plot alternative here), but fortunately I got what was happening going back to this today. Here is the code I did for it, you may feel free to use it, adapt it, but please keep my reference if you use it.

function varargout=kmeans_test(data,nClusters,plotOpts,dimLabels,...
  bigXDim,bigYDim,gifName)
%
% [label,m,figH,handles]=kmeans_test(data,nClusters,plotOpts,...
%   dimLabels,bigXDim,bigYDim,gifName)
% Demonstrate kmeans algorithm iterative progress. Inputs are:
%
% -> data (rand(5,100)): the data to use.
%
% -> nClusters (7): number of clusters to use.
%
% -> plotOpts: struct holding the following fields:
%
%   o leftBase: the percentage distance from the left
%
%   o rightBase: the percentage distance from the right
%
%   o bottomBase: the percentage distance from the bottom
%
%   o topBase: the percentage distance from the top
%
%   o FontSize: FontSize for axes labels.
%
%   o widthUsableArea: Total width occupied by axes
%
%   o heigthUsableArea: Total heigth occupied by axes
%
% -> bigXDim (1): the big subplot x dimension
%
% -> bigYDim (2): the big subplot y dimension
%
% -> dimLabels: If you want to specify dimensions labels
%
% -> gifName: gif file name to save
%
% Outputs are:
% 
% -> label: Sample cluster center number for each iteration
%
% -> m: cluster center mean for each iteration
%
% -> figH: figure handle
%
% -> handles: axes handles
%

%
% - Creation Date: Fri, 13 Sep 2013 
% - Last Modified: Mon, 16 Sep 2013 
% - Author(s): 
%   - W.S.Freund <wsfreund_at_gmail_dot_com> 

%
% TODO List (?):
%
%  - Use input parser 
%  - Adapt it to be able to cluster any algorithm function.
%  - Use arrows indicating cluster centers movement before moving them.
%  - Drag and drop small axes to big axes.
%

% Pre-start
if nargin < 7
  gifName = 'kmeansClusterization.gif';
  if nargin < 6
    bigYDim = 2;
    if nargin < 5
      bigXDim = 1;
      if nargin < 4
        nDim = size(data,1);
        maxDigits = numel(num2str(nDim));
        dimLabels = mat2cell(sprintf(['Dim %0' num2str(maxDigits) 'd'],...
          1:nDim),1,zeros(1,nDim)+4+maxDigits);
        if nargin < 3
          plotOpts = struct('leftBase',.05,'rightBase',.02,...
            'bottomBase',.05,'topBase',.02,'FontSize',10,...
            'widthUsableArea',.87,'heigthUsableArea',.87);
          if nargin < 2
            nClusters = 7;
            if nargin < 1
              center1 = [1; 0; 0; 0; 0];
              center2 = [0; 1; 0; 0; 0];
              center3 = [0; 0; 1; 0; 0];
              center4 = [0; 0; 0; 1; 0];
              center5 = [0; 0; 0; 0; 1];
              center6 = [0; 0; 0; 0; 1.5];
              center7 = [0; 0; 0; 1.5; 1];
              data = [...
                      bsxfun(@plus,center1,.5*rand(5,20)) ...
                      bsxfun(@plus,center2,.5*rand(5,20)) ...
                      bsxfun(@plus,center3,.5*rand(5,20)) ...
                      bsxfun(@plus,center4,.5*rand(5,20)) ...
                      bsxfun(@plus,center5,.5*rand(5,20)) ...
                      bsxfun(@plus,center6,.2*rand(5,20)) ...
                      bsxfun(@plus,center7,.2*rand(5,20)) ...
                     ];
            end
          end
        end
      end
    end
  end
end

% NOTE of advice: It seems that Matlab does not test while on
% refreshdata if the dimension of the inputs are equivalent for the
% XData, YData and CData while using scatter. Because of this I wasted
% a lot of time trying to debug what was the problem, trying many
% workaround since my cluster centers would disappear for no reason.

% Draw axes:
nDim = size(data,1);

figH = figure;
set(figH,'Units', 'normalized', 'Position',...
  [0, 0, 1, 1],'Color','w','Name',...
  'k-means example','NumberTitle','Off',...
  'MenuBar','none','Toolbar','figure',...
  'Renderer','zbuffer');

% Create dintinguishable colors matrix:
colorMatrix = distinguishable_colors(nClusters);

% Create axes, deploy them on handles matrix more or less how they
% will be positioned:
[handles,horSpace,vertSpace] = ...
  createAxesGrid(5,5,plotOpts,dimLabels);

% Add main axes
bigSubSize = ceil(nDim/2);
bigSubVec(bigSubSize^2) = 0;
for k = 0:nDim-bigSubSize
  bigSubVec(k*bigSubSize+1:(k+1)*bigSubSize) = ...
    ... %(nDim-bigSubSize+k)*nDim+1:(nDim-bigSubSize+k)*nDim+(nDim-bigSubSize+1);
    bigSubSize+nDim*k:nDim*(k+1);
end

handles(bigSubSize,bigSubSize) = subplot(nDim,nDim,bigSubVec,...
  'FontSize',plotOpts.FontSize,'box','on'); 
bigSubplotH = handles(bigSubSize,bigSubSize);
horSpace(bigSubSize,bigSubSize) = bigSubSize;
vertSpace(bigSubSize,bigSubSize) = bigSubSize;
set(bigSubplotH,'NextPlot','add',...
  'FontSize',plotOpts.FontSize,'box','on',...
  'XAxisLocation','top','YAxisLocation','right');

% Squeeze axes through space to optimize space usage and improve
% visualization capability:
[leftPos,botPos,subplotWidth,subplotHeight]=setCustomPlotArea(...
  handles,plotOpts,horSpace,vertSpace);

pColorAxes = axes('Position',[leftPos(end) botPos(end) ...
  subplotWidth subplotHeight],'Parent',figH);
pcolor([1:nClusters+1;1:nClusters+1])
% image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
% check if the upcoming buggy behaviour would be fixed. I was not
% lucky, though...
colormap(pColorAxes,colorMatrix);
% Change XTick positions to its center:
set(pColorAxes,'XTick',.5:1:nClusters+.5);
set(pColorAxes,'YTick',[]);
% Change its label to cluster number:
set(pColorAxes,'XTickLabel',[nClusters 1:nClusters-1]); % FIXME At
% least on my matlab I have to use this buggy way to set XTickLabel.
% Am I doing something wrong? Since I dunno why this is caused, I just
% change the code so that it looks the way it should look, but this is
% quite strange...
xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);

% Now iterate throw data and get cluster information:
[label,m]=litekmeans(data,nClusters);

nIters = numel(m)-1;

scatterColors = colorMatrix(label{1},:);

annH = annotation('textbox',[leftPos(1),botPos(1) subplotWidth ...
  subplotHeight],'String',sprintf('Start Conditions'),'EdgeColor',...
  'none','FontSize',18);

% Creates dimData_%d variables for first iteration:
for curDim=1:nDim
  curDimVarName = genvarname(sprintf('dimData_%d',curDim));
  eval([curDimVarName,'= m{1}(curDim,:);']);
end

%   clusterColors will hold the colors for the total number of clusters
% on each iteration:
clusterColors = colorMatrix;

% Draw cluster information for first iteration:
for curColumn=1:nDim
  for curLine=curColumn+1:nDim
    % Big subplot data:
    if curColumn == bigXDim && curLine == bigYDim
      curAxes = handles(bigSubSize,bigSubSize);
      curScatter = scatter(curAxes,data(curColumn,:),...
        data(curLine,:),16,'filled');
      set(curScatter,'CDataSource','scatterColors');
      % Draw cluster centers 
      curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
      curLineVarName = genvarname(sprintf('dimData_%d',curLine));
      eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ... 
        curLineVarName ',100,colorMatrix,''^'',''filled'');']);
      set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
        curLineVarName,'CDataSource','clusterColors')
    end
    % Small subplots data:
    curAxes = handles(curLine,curColumn);
    % Draw data:
    curScatter = scatter(curAxes,data(curColumn,:),...
      data(curLine,:),16,'filled');
    set(curScatter,'CDataSource','scatterColors');
    % Draw cluster centers 
    curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
    curLineVarName = genvarname(sprintf('dimData_%d',curLine));
    eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ... 
      curLineVarName ',100,colorMatrix,''^'',''filled'');']);
    set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
      curLineVarName,'CDataSource','clusterColors');
    if curLine==nDim
      xlabel(curAxes,dimLabels{curColumn});
      set(curAxes,'XTick',xlim(curAxes));
    end
    if curColumn==1
      ylabel(curAxes,dimLabels{curLine});
      set(curAxes,'YTick',ylim(curAxes));
    end 
  end
end

refreshdata(figH,'caller');

% Preallocate gif frame. From Amro's tutorial here:
% https://stackoverflow.com/a/11054155/1162884
f = getframe(figH);
[f,map] = rgb2ind(f.cdata, 256, 'nodither');
mov = repmat(f, [1 1 1 nIters+4]);

% Add one frame at start conditions:
curFrame = 1;
% Add three frames without movement at start conditions
f = getframe(figH);
mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');

for curIter = 1:nIters
  curFrame = curFrame+1;
  % Change label text
  set(annH,'String',sprintf('Iteration %d',curIter));
  % Update cluster point colors
  scatterColors = colorMatrix(label{curIter+1},:);
  % Update cluster centers:
  for curDim=1:nDim
    curDimVarName = genvarname(sprintf('dimData_%d',curDim));
    eval([curDimVarName,'= m{curIter+1}(curDim,:);']);
  end
  % Update cluster colors:
  nClusterIter = size(m{curIter+1},2);
  clusterColors = colorMatrix(1:nClusterIter,:);
  % Update graphics:
  refreshdata(figH,'caller');
  % Update cluster colors:
  if nClusterIter~=size(m{curIter},2) % If number of cluster
    % of current iteration differs from previous iteration (or start
    % conditions in case we are at first iteration) we redraw colors: 
    pcolor([1:nClusterIter+1;1:nClusterIter+1])
    % image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
    % check if the upcomming buggy behaviour would be fixed. I was not
    % lucky, though...
    colormap(pColorAxes,clusterColors);
    % Change XTick positions to its center:
    set(pColorAxes,'XTick',.5:1:nClusterIter+.5);
    set(pColorAxes,'YTick',[]);
    % Change its label to cluster number:
    set(pColorAxes,'XTickLabel',[nClusterIter 1:nClusterIter-1]); 
    xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);
  end
  f = getframe(figH);
  mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end

set(annH,'String','Convergence Conditions');

for curFrame = nIters+1:nIters+3
  % Add three frames without movement at start conditions
  f = getframe(figH);
  mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end

imwrite(mov, map, gifName, 'DelayTime',.5, 'LoopCount',inf)

varargout = cell(1,nargout);

if nargout > 0
  varargout{1} = label;
  if nargout > 1
    varargout{2} = m;
    if nargout > 2
      varargout{3} = figH;
      if nargout > 3
        varargout{4} = handles;
      end
    end
  end
end

end


function [leftPos,botPos,subplotWidth,subplotHeight] = ...
  setCustomPlotArea(handles,plotOpts,horSpace,vertSpace)
%
% -> handles: axes handles
%
% -> plotOpts: struct holding the following fields:
%
%   o leftBase: the percentage distance from the left
%
%   o rightBase: the percentage distance from the right
%
%   o bottomBase: the percentage distance from the bottom
%
%   o topBase: the percentage distance from the top
%
%   o widthUsableArea: Total width occupied by axes
%
%   o heigthUsableArea: Total heigth occupied by axes
%
% -> horSpace: the axes units size (integers only) that current axes
% should occupy in the horizontal (considering that other occupied
% axes handles are empty)
%
% -> vertSpace: the axes units size (integers only) that current axes
% should occupy in the vertical (considering that other occupied
% axes handles are empty)
%

nHorSubPlot =  size(handles,1);
nVertSubPlot = size(handles,2);

if nargin < 4
  horSpace(nHorSubPlot,nVertSubPlot) = 0;
  horSpace = horSpace+1;
  if nargin < 3
    vertSpace(nHorSubPlot,nVertSubPlot) = 0;
    vertSpace = vertSpace+1;
  end
end

subplotWidth = plotOpts.widthUsableArea/nHorSubPlot;
subplotHeight = plotOpts.heigthUsableArea/nVertSubPlot;

totalWidth = (1-plotOpts.rightBase) - plotOpts.leftBase;
totalHeight = (1-plotOpts.topBase) - plotOpts.bottomBase;

gapHeigthSpace = (totalHeight - ...
  plotOpts.heigthUsableArea)/(nVertSubPlot);
gapWidthSpace = (totalWidth - ...
  plotOpts.widthUsableArea)/(nHorSubPlot);

botPos(nVertSubPlot) = plotOpts.bottomBase + gapWidthSpace/2;
leftPos(1) = plotOpts.leftBase + gapHeigthSpace/2;

botPos(nVertSubPlot-1:-1:1) = botPos(nVertSubPlot) + (subplotHeight +...
  gapHeigthSpace)*(1:nVertSubPlot-1);
leftPos(2:nHorSubPlot) = leftPos(1) + (subplotWidth +...
  gapWidthSpace)*(1:nHorSubPlot-1);

for curLine=1:nHorSubPlot
  for curColumn=1:nVertSubPlot
    if handles(curLine,curColumn)
      set(handles(curLine,curColumn),'Position',[leftPos(curColumn)...
        botPos(curLine) horSpace(curLine,curColumn)*subplotWidth ...             
        vertSpace(curLine,curColumn)*subplotHeight]);                     
    end
  end                                                         
end                                                           

end


function [handles,horSpace,vertSpace] = ...
  createAxesGrid(nLines,nColumns,plotOpts,dimLabels)

handles = zeros(nLines,nColumns);

% Those hold the axes size units:
horSpace(nLines,nColumns) = 0;
vertSpace(nLines,nColumns) = 0;

for curColumn=1:nColumns
  for curLine=curColumn+1:nLines
    handles(curLine,curColumn) = subplot(nLines,...
      nColumns,curColumn+(curLine-1)*nColumns);
    horSpace(curLine,curColumn) = 1;
    vertSpace(curLine,curColumn) = 1;
    curAxes = handles(curLine,curColumn);
    if feature('UseHG2')
      colormap(handle(curAxes),colorMatrix);
    end
    set(curAxes,'NextPlot','add',...
      'FontSize',plotOpts.FontSize,'box','on'); 
    if curLine==nLines
      xlabel(curAxes,dimLabels{curColumn});
    else
      set(curAxes,'XTick',[]);
    end
    if curColumn==1
      ylabel(curAxes,dimLabels{curLine});
    else
      set(curAxes,'YTick',[]);
    end
  end
end
end

Example

Here is an example using 5 dimensions, using the code:

center1 = [1; 0; 0; 0; 0];
center2 = [0; 1; 0; 0; 0];
center3 = [0; 0; 1; 0; 0];
center4 = [0; 0; 0; 1; 0];
center5 = [0; 0; 0; 0; 1];
center6 = [0; 0; 0; 0; 1.5];
center7 = [0; 0; 0; 1.5; 1];
data = [...
        bsxfun(@plus,center1,.5*rand(5,20)) ...
        bsxfun(@plus,center2,.5*rand(5,20)) ...
        bsxfun(@plus,center3,.5*rand(5,20)) ...
        bsxfun(@plus,center4,.5*rand(5,20)) ...
        bsxfun(@plus,center5,.5*rand(5,20)) ...
        bsxfun(@plus,center6,.2*rand(5,20)) ...
        bsxfun(@plus,center7,.2*rand(5,20)) ...
       ];
[label,m,figH,handles]=kmeans_test(data,20);

enter image description here

Community
  • 1
  • 1
Werner
  • 2,537
  • 1
  • 26
  • 38