0

Here is the following issue I am trying to solve. Computing the number of distinct elements from the following sequence within a tolerance:

var l = new List<double>() { 0, 0 + 1e-7, 0 + 2e-7, 1 - 1e-7, 1 };

The list is sorted, and all values are distinct (using the default equality comparer to compare values).

How can I count the number of unique values within a particular tolerance value:

static bool IsWithinTolerance(double x, double y)
{
    return Math.Abs(x - y) < 1e-6;
}
malat
  • 12,152
  • 13
  • 89
  • 158
  • 1
    The keyword is clustering ... – derpirscher Sep 24 '21 at 09:11
  • 3
    What if you have 3 elements a, b, c and (a,b) pair is withing tolerance while (b,c) pair is too withing tolerance, but (a, c) is not? – Evk Sep 24 '21 at 09:14
  • @Evk I suspect this does not change the actual number of clusters, finding the proper barycenter for each cluster is a different issue... – malat Sep 24 '21 at 09:18
  • What's the expected output for the sequence `[0, 0 - tolerance, 0 + tolerance]`? Is that 1 cluster, since the last element is within tolerance of `0`, or 2 clusters, since the last element is not within tolerance of `0 - tolerance`? In other words, when testing to see whether an item is a member of the current cluster, do you compare against the closest bound of the cluster, the first element of the cluster, or something else? – canton7 Sep 24 '21 at 09:23
  • @canton7 / evk you were right, I need to compute the centroids first. – malat Sep 24 '21 at 09:27
  • Maybe you just want to round to x decimal places In to a sorted dictionary or hashtable like histogram buckets – TheGeneral Sep 24 '21 at 09:28
  • @canton7 while true - the list you provided is not sorted, and question mentions sorted list. – Evk Sep 24 '21 at 09:32
  • @Evk Good point. `[0 - tolerance, 0, 0 + tolerance]` then -- it has the same question – canton7 Sep 24 '21 at 09:33

1 Answers1

1

You can define a dictionary that stores your results:

var result = new Dictionary<double,List<double>>();

Then you loop over the list, and check if the dictionary contains a key which is in tolerance with your list item. If so, you add the item to the list in the dictionary with this key. If not, you add a new list to the dictionary:

foreach(var i in l)
{
    double? key = result.Keys.Any(x => IsWithinTolerance(x,i)) ? result.Keys.First(x => IsWithinTolerance(x,i)) : default(double?);
    if(key == null)
    {
        result.Add(i, new List<double> { i });
    }
    else
    {
        result[key.Value].Add(i); // requires GetHashCode
    }
}

Warning: this code does not solve the problems that Evk and canton7 noted in their comments to the question. Only use it if your clusters are clearly to be recognized and the members of each cluster are very close to each other.

Online demo: https://dotnetfiddle.net/MkF1QX

malat
  • 12,152
  • 13
  • 89
  • 158
SomeBody
  • 7,515
  • 2
  • 17
  • 33