It seems that your question is really about clustering, and that the ID column has nothing to do with the determining which points correspond to which.
A common algorithm to achieve that would be k-means clustering. However, your question implies that you don't know the number of clusters in advance. This complicates matters, and there have been already a lot of questions asked here on StackOverflow regarding this issue:
- Kmeans without knowing the number of clusters?
- compute clustersize automatically for kmeans
- How do I determine k when using k-means clustering?
- How to optimal K in K - Means Algorithm
- K-Means Algorithm
Unfortunately, there is no "right" solution for this. Two clusters in one specific problem could be indeed considered as one cluster in another problem. This is why you'll have to decide that for yourself.
Nevertheless, if you're looking for something simple (and probably inaccurate), you can use Euclidean distance as a measure. Compute the distances between points (e.g. using pdist
), and group points where the distance falls below a certain threshold.
Example
%// Sample input
A = [1, 2.5, 3.5;
1, 85.1, 74.1;
2, 2.6, 3.4;
2, 86.0, 69.8;
3, 25.8, 32.9;
3, 84.4, 68.2;
4, 2.8, 3.2;
4, 24.1, 31.8;
4, 83.2, 67.4];
%// Cluster points
pairs = nchoosek(1:size(A, 1), 2); %// Rows of pairs
d = sqrt(sum((A(pairs(:, 1), :) - A(pairs(:, 2), :)) .^ 2, 2)); %// d = pdist(A)
thr = d < 10; %// Distances below threshold
kk = 1;
idx = 1:size(A, 1);
C = cell(size(idx)); %// Preallocate memory
while any(idx)
x = unique(pairs(pairs(:, 1) == find(idx, 1) & thr, :));
C{kk} = A(x, :);
idx(x) = 0; %// Remove indices from list
kk = kk + 1;
end
C = C(~cellfun(@isempty, C)); %// Remove empty cells
The result is a cell array C
, each cell representing a cluster:
C{1} =
1.0000 2.5000 3.5000
2.0000 2.6000 3.4000
4.0000 2.8000 3.2000
C{2} =
1.0000 85.1000 74.1000
2.0000 86.0000 69.8000
3.0000 84.4000 68.2000
4.0000 83.2000 67.4000
C{3} =
3.0000 25.8000 32.9000
4.0000 24.1000 31.8000
Note that this simple approach has the flaw of restricting the cluster radius to the threshold. However, you wanted a simple solution, so bear in mind that it gets complicated as you add more "clustering logic" to the algorithm.