Similar questions in the database seem to be much more complicated than my example. I want to cluster 100'ish points on a line. Number of groups is irrelevant; the closeness of points is more important.
What is a term, method or algorithm to deal with this grouping problem? K-means, Hamming distance, hierarchical agglomeration, clique or complete linkage??
I've reduced two examples to bare minimum for clarification:
Simple example:
Set A = {600, 610, 620, 630} and the set of differences between its elements is diff_A = {10, 20, 30, 10, 20, 10}. I can then group as follows: {10, 10, 10}, {20, 20}, and {30}. Done.
Problematic example:
Set B = {600, 609, 619, 630} and the set of differences is diff_B = {9, 10, 11, 19, 21, 30}. I try to group with a tolerance of 1, i.e. differences that are 1 (or less) are 'similar enough' to be grouped but I get a paradox: {9, 10} AND/OR {10, 11}, {19}, {21}, and {30}.
Issue:
9 and 10 are close enough, 10 and 11 are close enough, but 9 and 11 are not, so how should I handle these overlapping groups? Perhaps this small example is unsolvable because it is symmetrical?