For completeness, there is a much simpler answer to this problem than hierarchical clustering, which I have built. It gives much better results and can tell one cluster from two (an issue I couldn't fix with MarkV's suggestions). It assumes your data lies on a regular grid of known size and that you have an unknown number of clusters separated by at least 2*(grid size):
% Idea is as follows:
% * We have a known grid size, dx.
% * A starting point [may as well be minima(1,:)] belongs to a cluster if
%   any other point in the list lies dx away (one dimension varied),
%   sqrt(2)*dx away (two dimensions varied) or sqrt(3)*dx away (three
%   dimensions varied).
% * Chain these points together until all are found; any point farther
%   than sqrt(3)*dx from the cluster is ignored for now.
% * Set this cluster aside and repeat until no minima data is left.
function [blobs, clusterIdx] = findClusters(minima,dx)
% minima: N-by-3 list of grid points; dx: grid spacing.
% blobs: number of clusters found; clusterIdx: N-by-1 cluster label per row.
% Requires knnsearch (Statistics and Machine Learning Toolbox) and ismemberf,
% which is not a built-in (presumably the tolerance-based ismember from the
% MATLAB File Exchange).
%problem setup
dx2 = sqrt(2)*dx;
dx3 = sqrt(3)*dx;
eqf = @(list,dx,dx2,dx3)(abs(list-dx) < 0.001 | abs(list-dx2) < 0.001 | abs(list-dx3) < 0.001);
notDoneClust = true;
notDoneMinima = true;
clusterIdx = zeros(size(minima,1),1);
point = minima(1,:);
list = minima(2:end,:);
blobs = 0;
while notDoneMinima
    cluster = nan(1,3); %placeholder first row, skipped when labelling below
    while notDoneClust
        [~, dist] = knnsearch(point,list); %distance from each row of list to its nearest row of point
        nnidx = eqf(dist,dx,dx2,dx3); %logical index of the rows of list that neighbour point
        cluster = cat(1,cluster,point,list(nnidx,:)); %add points to current cluster
        point = list(nnidx,:); %points to check next are the neighbours just found
        list = list(~nnidx,:); %list keeps only the points not yet assigned
        notDoneClust = ~isempty(point); %no more points to check means the whole cluster has been found
    end
    blobs = blobs+1;
    clusterIdx(ismemberf(minima,cluster(2:end,:),'rows')) = blobs;
    %reset points and list for a new cluster
    if ~isempty(list)
        if size(list,1)>1 %count rows; length(list) would be 3 for a single 1-by-3 row
            point = list(1,:);
            list = list(2:end,:);
            notDoneClust = true;
        else
            %the one remaining row of list is a cluster of its own. Don't reset
            %loops, add it as a cluster and exit (NOTE: I have yet to test this portion).
            blobs = blobs+1;
            clusterIdx(ismemberf(minima,list,'rows')) = blobs; %point is empty here, so match against list
            notDoneMinima = false;
        end
    else
        notDoneMinima = false;
    end
end
end
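The chaining idea described in the comments above can also be sketched with base MATLAB only, which may help if knnsearch or ismemberf is not available. This is a rough illustration rather than a drop-in replacement: `pts`, `tol`, `labels` and `nClust` are made-up names and values, and it links any two points no farther apart than sqrt(3)*dx instead of matching the three exact distances. On a regular grid without duplicate points those two tests should agree.

```matlab
% Sketch: label grid-connected clusters without toolbox functions.
% pts, dx and tol are illustration values, not data from the question.
dx  = 1;
tol = 1e-3;                          % same tolerance as eqf in findClusters
pts = [0 0 0; 1 0 0; 1 1 0;          % three chained grid points
       5 5 5; 5 5 6;                 % a second, well-separated group
       9 0 0];                       % an isolated point

labels = zeros(size(pts,1), 1);      % 0 means "not yet assigned"
nClust = 0;
maxD2  = (sqrt(3)*dx + tol)^2;       % squared linking distance

while any(labels == 0)
    nClust = nClust + 1;
    seed = find(labels == 0, 1);     % first unassigned point starts a cluster
    labels(seed) = nClust;
    frontier = seed;                 % indices whose neighbours are unchecked
    while ~isempty(frontier)
        next = [];
        for k = frontier(:).'
            % squared distance from point k to every point
            d2  = sum(bsxfun(@minus, pts, pts(k,:)).^2, 2);
            hit = find(labels == 0 & d2 <= maxD2);
            labels(hit) = nClust;    % claim unassigned neighbours
            next = [next; hit];      %#ok<AGROW>
        end
        frontier = next;             % expand from the newly claimed points
    end
end
disp(labels.')                       % 1 1 1 2 2 3
```

Here the isolated point at (9,0,0) ends up with its own label, which matches the behaviour of the function above.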
I fully understand that this method is useless for clustering data in the general sense, since any outlying point will be marked as a separate cluster. That behaviour (if it happens) is what I need anyway, so this may just be an edge-case scenario.
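If the one-point clusters are not wanted in some other use, they could be flagged afterwards from the label vector alone. A minimal sketch, assuming `clusterIdx` looks like the made-up example below (the variable names `counts`, `singletons` and `isOutlier` are hypothetical):

```matlab
% clusterIdx is an invented example of what findClusters might return.
clusterIdx = [1; 1; 1; 2; 2; 3];
counts     = accumarray(clusterIdx, 1);          % points per cluster label
singletons = find(counts == 1);                  % labels with a single point
isOutlier  = ismember(clusterIdx, singletons);   % true for rows in those clusters
```

For this example `counts` is `[3; 2; 1]`, so only the last row is flagged.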