6

I have a question about GroupBy in C#.

I created a list as shown below:

List<double> testList = new List<double>();

testList.Add(1);    
testList.Add(2.1);  
testList.Add(2.0);  
testList.Add(3.0);  
testList.Add(3.1);  
testList.Add(3.2);  
testList.Add(4.2);  

I'd like to group these numbers like this:

group 1 => 1  
group 2 => 2.1 , 2.0  
group 3 => 3.0 , 3.1 , 3.2  
group 4 => 4.2

So I wrote code like this:

var testListGroup = testList.GroupBy(ele => ele, new DoubleEqualityComparer(0.5));

DoubleEqualityComparer is defined like this:

public class DoubleEqualityComparer : IEqualityComparer<double>
{
    private double tol = 0;

    public DoubleEqualityComparer(double Tol)
    {
        tol = Tol;
    }

    public bool Equals(double d1,double d2)
    {
        return EQ(d1,d2, tol);
    }

    public int GetHashCode(double d)
    {
        return d.GetHashCode();
    }
    public bool EQ(double dbl, double compareDbl, double tolerance)
    {
        return Math.Abs(dbl - compareDbl) < tolerance;
    }
}

Yet GroupBy doesn't group the way I expected. Instead, it produces this:

group 1 => 1  
group 2 => 2.1
group 3 => 2.0  
group 4 => 3.0
group 5 => 3.1
group 6 => 3.2
group 7 => 4.2

I don't know what the problem is. Please let me know what is wrong and how to fix it.

kdm

6 Answers

3

Use Math.Floor to take the integer part of each number, so that 5.8 is not treated as 6.

List<double> testList = new List<double>();

testList.Add(1);
testList.Add(2.1);
testList.Add(2.0);
testList.Add(3.0);
testList.Add(3.1);
testList.Add(3.2);
testList.Add(4.2);
testList.Add(5.8);
testList.Add(5.5);

var testListGroup = testList.GroupBy(s => Math.Floor(s)).ToList();
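
As a rough sketch of how the same idea could extend beyond whole-number ranges, the grouping key can be the index of a fixed-width bucket. The helper name `GroupByBucket` and its `width` parameter are made up for illustration; they are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BucketGrouping
{
    // Hypothetical helper: groups values by the index of the fixed-width
    // bucket they fall into. With width = 1.0 the key equals Math.Floor(v).
    public static List<IGrouping<double, double>> GroupByBucket(
        IEnumerable<double> values, double width)
    {
        return values.GroupBy(v => Math.Floor(v / width)).ToList();
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2, 5.8, 5.5 };

        foreach (var group in GroupByBucket(testList, 1.0))
        {
            Console.WriteLine($"bucket {group.Key} => {string.Join(" , ", group)}");
        }
    }
}
```

Fixed buckets are not the same thing as a tolerance: 1.9 and 2.1 land in different buckets even though they are only 0.2 apart.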
A.T.
2

Your GetHashCode method must return the same value for numbers that should be treated as "equal".

The equality comparer is used in two steps:

  1. GetHashCode is checked first. If no value with this hash code has been processed yet, the value goes into a new group of its own.

  2. If a value with this hash code has already been seen, the result of the Equals method is checked. If it is true, the current element is added to the existing group; otherwise it goes into a new group.

In your case every double returns a different hash code, so the Equals method is never called.
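
One way to see the two steps in action is to count how often Equals runs. The sketch below reuses the logic of the comparer from the question; the class names and the `EqualsCalls` counter are additions made here purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same logic as the comparer in the question, plus a counter (added here
// only for illustration) that records how often Equals is called.
public class CountingDoubleComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public int EqualsCalls { get; private set; }

    public CountingDoubleComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2)
    {
        EqualsCalls++;
        return Math.Abs(d1 - d2) < tol;
    }

    // Every distinct double in the sample gets its own hash code.
    public int GetHashCode(double d) => d.GetHashCode();
}

public static class HashFirstDemo
{
    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        var comparer = new CountingDoubleComparer(0.5);

        // ToList forces the deferred GroupBy to run.
        var groups = testList.GroupBy(d => d, comparer).ToList();

        Console.WriteLine($"groups: {groups.Count}, Equals calls: {comparer.EqualsCalls}");
    }
}
```

With the sample values, each double should produce its own hash code, so one would expect seven groups and no Equals calls at all.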

So, if you do not care about processing time, you can simply return a constant value from GetHashCode, as @FirstCall suggested. If you do care about it, I recommend modifying your method as follows:

public int GetHashCode(double d)
{
    return Math.Round(d).GetHashCode();
}

Math.Round works for tolerance = 0.5 with your sample data; for other tolerance values you will need to adapt this.
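
A minimal sketch of the Math.Round-based hash inside a complete comparer follows. The class and method names are hypothetical. It also checks one edge case that the sample data does not cover: 2.4 and 2.6 are within 0.5 of each other, yet they round to different whole numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RoundedHashComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public RoundedHashComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2) => Math.Abs(d1 - d2) < tol;

    // Values that round to the same whole number share a hash code.
    public int GetHashCode(double d) => Math.Round(d).GetHashCode();
}

public static class RoundedHashDemo
{
    // Illustrative helper: number of groups GroupBy produces for the input.
    public static int CountGroups(IEnumerable<double> values, double tol)
    {
        return values.GroupBy(d => d, new RoundedHashComparer(tol)).Count();
    }

    public static void Main()
    {
        var sample = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        Console.WriteLine(CountGroups(sample, 0.5));

        // Within tolerance of each other, but rounded to 2 and 3 respectively.
        var edge = new List<double> { 2.4, 2.6 };
        Console.WriteLine(CountGroups(edge, 0.5));
    }
}
```

The sample list should come out as four groups, while the 2.4 / 2.6 pair is split in two because Equals is never consulted for values whose hash codes differ.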

I recommend reading this blog post to get familiar with IEqualityComparer and LINQ.

The simplest way, with the least code, is to always return a constant value from GetHashCode. It works for any tolerance value but, as I wrote, it is quite an inefficient solution for large amounts of data.

Mikhail Tulubaev
1

You can group using the code sample below:

var list = testList.GroupBy(s => Convert.ToInt32(s) ).Select(group => new { Key = group.Key, Elements = group.ToList() });

//Output
//group 1 => 1  
//group 2 => 2.1 , 2  
//group 3 => 3 , 3.1 , 3.2  
//group 4 => 4.2

Explanation of the code: when GroupBy is applied to a list with only a single column of data, it groups elements with identical content. For example, suppose you have a list of strings (foo1, foo2, foo3, foo1, foo1, foo2). GroupBy splits it into three separate groups, keyed by foo1, foo2 and foo3.

But in this scenario no two values are the same (1.0, 2.1, 2.2, 2.3, 3.1, 3.2...), so we first have to map them onto the same content. Converting them to int gives (1, 2, 2, 2, 3, 3...), which can then be grouped easily.

SilentCoder
  • 1
    Select is not even required. – Stefan Steinegger Sep 07 '16 at 04:47
  • Yes of course. @StefanSteinegger. Thanks for the suggestion. – SilentCoder Sep 07 '16 at 04:48
  • 3
    It will give false positives if the OP runs it on another set of numbers: (1.0, 1.5, 2.5) => ([1.0], [1.5, 2.5]), because `Convert.ToInt32` returns the nearest even number when the value is halfway between two whole numbers: 1.5 => 2, 2.5 => 2. – Mikhail Tulubaev Sep 07 '16 at 05:06
  • @A.T. Basically, to group we need identical content, like (foo1, foo2, foo3, foo1, foo2, foo2). Then it is easy and it creates 3 groups. But in this scenario we only have different values (2.0, 2.1, 3.1, 3.2...). Therefore we should map them onto the same content. Converting them to int gives (1, 2, 2, 3, 3, 3). Then we can group them. – SilentCoder Sep 07 '16 at 05:10
  • @codelahiru nice, it would be better if you add same explanation to the answer. It will help you to gain more votes. – A.T. Sep 07 '16 at 05:20
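
The rounding behaviour raised in the comments above can be checked with a short sketch. The `GroupByKey` helper is a made-up name used only for this comparison; `Convert.ToInt32` and `Math.Floor` are the standard .NET methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundingKeyDemo
{
    // Illustrative helper: groups the values by the given key function.
    public static List<List<double>> GroupByKey(
        IEnumerable<double> values, Func<double, double> key)
    {
        return values.GroupBy(key).Select(g => g.ToList()).ToList();
    }

    public static void Main()
    {
        var values = new List<double> { 1.0, 1.5, 2.5 };

        // Convert.ToInt32 rounds to the nearest integer; ties go to the even one,
        // so both 1.5 and 2.5 get the key 2.
        var byConvert = GroupByKey(values, v => Convert.ToInt32(v));

        // Math.Floor always rounds down, so 1.0 and 1.5 share the key 1.
        var byFloor = GroupByKey(values, v => Math.Floor(v));

        Console.WriteLine(string.Join(" | ", byConvert.Select(g => string.Join(" , ", g))));
        Console.WriteLine(string.Join(" | ", byFloor.Select(g => string.Join(" , ", g))));
    }
}
```

Which key is right depends on whether the groups are meant to be "nearest whole number" or "same integer part".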
1

In these types of situations, the debugger is your friend. Put a breakpoint on the Equals method. You will notice that the Equals method of your DoubleEqualityComparer class is never hit.

LINQ extension methods such as GroupBy check GetHashCode before they call Equals. Since GetHashCode does not return equal hashes for the doubles in your list, the Equals method is never called.

Each GetHashCode method should be atomic in execution and must return the same int value for any two values that the comparer considers equal.

This is one working example, though it is not necessarily recommended depending on your usage of this comparer.

public int GetHashCode(double d)
{
     return 1;
}
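
For context, here is a minimal sketch of how the constant hash code might sit inside a complete comparer, run against the list from the question. The class names and the `Group` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ConstantHashComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public ConstantHashComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2) => Math.Abs(d1 - d2) < tol;

    // Every value collides, so Equals decides group membership every time.
    public int GetHashCode(double d) => 1;
}

public static class ConstantHashDemo
{
    // Illustrative helper: materializes the groups as plain lists.
    public static List<List<double>> Group(IEnumerable<double> values, double tol)
    {
        return values.GroupBy(d => d, new ConstantHashComparer(tol))
                     .Select(g => g.ToList())
                     .ToList();
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        var groups = Group(testList, 0.5);

        for (int i = 0; i < groups.Count; i++)
        {
            Console.WriteLine($"group {i + 1} => {string.Join(" , ", groups[i])}");
        }
    }
}
```

Because every lookup has to compare against the existing groups one by one, the cost grows with the number of groups, which is why this is best kept to small inputs.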
FirstCall
  • In the case of using single `IEqualityComparer` for different values of tolerance it is the best way for small amounts of data. – Mikhail Tulubaev Sep 07 '16 at 05:33
  • Your answer is good. I solved this problem with your advice. I'm happy to have solved it. Thank you very much. – kdm Sep 07 '16 at 08:32
1

Everyone here is discussing what is wrong with your code, but you may actually have a worse problem than that.

If you truly want to group with a tolerance like your title says, rather than group by integer part like these answers assume (and your test data supports), this isn't supported by GroupBy.

GroupBy demands an equivalence relation - your equality comparer must establish that

  • x == x for all x
  • if x == y, y == x for all x and y
  • if x == y and y == z, x == z for all x, y and z

"Within 0.5 of each other" satisfies the first two points, but not the third. 0 is close to 0.4, and 0.4 is close to 0.8, but 0 is not close to 0.8. Given an input of 0, 0.4 and 0.8, what groups would you expect?
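
The failed third condition can be checked directly. This is a small sketch using the same test as the EQ method in the question; the name `Close` is chosen here only for illustration.

```csharp
using System;

public static class ToleranceRelationDemo
{
    // Same test as EQ in the question: true when the values differ by less than tol.
    public static bool Close(double a, double b, double tol) => Math.Abs(a - b) < tol;

    public static void Main()
    {
        const double tol = 0.5;

        Console.WriteLine(Close(0.0, 0.4, tol)); // True:  0 and 0.4 are 0.4 apart
        Console.WriteLine(Close(0.4, 0.8, tol)); // True:  0.4 and 0.8 are 0.4 apart
        Console.WriteLine(Close(0.0, 0.8, tol)); // False: 0 and 0.8 are 0.8 apart
    }
}
```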

Rawling
0

Your problem is your implementation of GetHashCode, which must return equal values for everything you consider to be equal. So for two different values d1 and d2 that should go into the same group, the method should return the same hash code. To do so, round the given number to the nearest integer and compute its hash code afterwards:

public int GetHashCode(double d)
{
    return Convert.ToInt32(d).GetHashCode();
}

Only after the hash code has been calculated is the actual Equals check done. In your current case the hash values returned by GetHashCode are different, so Equals won't be executed at all.

MakePeaceGreatAgain