4

I guess this would be more maths than C#. I've got an array of float values, where most values belong to one of the few tightly packed ranges. Here's an example (Lower Limit=0,Upper Limit=612):

3.4,5.0,6.1, 
144.0,144.14,145.0,147.0, 
273.77,275.19,279.0, 
399.4,399.91,401.45, 
533.26,537.0,538.9

This is a single array of 16 values, just separated them to show those "groups". What I need to do is to somehow group them, either using Linq, or a manual loop or whatever, so that those close values fall in a single group.

A simple math operation like dividing by 10 (or 100) won't work, because 399 would fall in a different group than 401 (4th group in the above example). Another approach would be to create a histogram of some kind, but I'm looking for something simple here. Any help would be greatly appreciated.

ekad
  • 14,436
  • 26
  • 44
  • 46
dotNET
  • 33,414
  • 24
  • 162
  • 251
  • So what output do you expect from the input example you have given? – sloth Sep 11 '12 at 06:38
  • something like this: 1 3.4 1 5.0 1 6.1 1 144.0 2 144.14 2 145.0 2 147.0 3 273.77 and so on. – dotNET Sep 11 '12 at 06:39
  • Are you asking how to determine what group each number will be in? If so, how many groups will there be? How many items will there be? What are your criteria for clustering? – Gabe Sep 11 '12 at 06:39
  • well, criteria for clustering is somewhat fuzzy, but we can see above that those values are "clearly" packed in some very close ranges. – dotNET Sep 11 '12 at 06:42
  • I can't seem to add "carriage return" in the comments. There is a CR after every float value in the above comment. – dotNET Sep 11 '12 at 06:43

2 Answers2

1

Here's a method that groups elements if they're within a certain delta (4 by default) of the previous value:

IEnumerable<IEnumerable<double>> GetClusters(IEnumerable<double> data,
                                             double delta = 4.0)
{
    var cluster = new List<double>();
    foreach (var item in data.OrderBy(x=>x))
    {
        if (cluster.Count > 0 && item > cluster[cluster.Count - 1] + delta)
        {
            yield return cluster;
            cluster = new List<double>();
        }
        cluster.Add(item);
    }
    if (cluster.Count > 0)
        yield return cluster;
}

You can get tweak the algorithm by changing what you use for cluster[cluster.Count - 1] + delta. For example, you might use

  • cluster[0] + delta - delta from first element in the cluster
  • cluster.Average() + delta - delta from the mean of the cluster so far
  • cluster[cluster.Count / 2] + delta - delta from the median of the cluster so far
Gabe
  • 84,912
  • 12
  • 139
  • 238
  • what would .OrderBy(x=>x) do? – dotNET Sep 11 '12 at 06:57
  • @dotNET: That `OrderBy` sorts the items in ascending order. That way it can always be assured that each item will be greater than or equal to the previous item without you having to pass in a sorted list. – Gabe Sep 11 '12 at 07:00
  • This seems like an acceptable solution. Let me check it on my end and get back. I'll mark it as answer if we don't get a better answer by then. – dotNET Sep 11 '12 at 07:05
1

Just an another idea of a clustering with using GroupBy with a custom comparer

var numbers = new float[]
    {
       3.4f, 5.0f, 6.1f, 144.0f, 144.14f, 145.0f, 
       147.0f, 273.77f, 275.19f, 279.0f, 399.4f, 399.91f, 401.45f,
       49, 50, 51,
       533.26f, 537.0f, 538.9f
    };

foreach (var group in numbers.GroupBy(i => i, new ClosenessComparer(4f)))
    Console.WriteLine(string.Join(", ", group));

And the custom ClosenessComparer:

public class ClosenessComparer : IEqualityComparer<float>
{
    private readonly float delta;

    public ClosenessComparer(float delta)
    {
        this.delta = delta;
    }

    public bool Equals(float x, float y)
    {
        return Math.Abs((x + y)/ 2f - y) < delta;
    }

    public int GetHashCode(float obj)
    {
        return 0;
    }
}

And the output:

1: 3,4 5 6,1
2: 144 144,14 145 147
3: 273,77 275,19 279
4: 399,4 399,91 401,45
6: 49 50 51
5: 533,26 537 538,9
nemesv
  • 138,284
  • 16
  • 416
  • 359