I am trying to implement an IEqualityComparer that has a tolerance on a date comparison. I have also looked into this question. The problem is that I can't use a workaround because I am using the IEqualityComparer in a LINQ .GroupJoin(). I have tried a few implementations that allow for tolerance. I can get the Equals() to work because I have both objects but I can't figure out how to implement GetHashCode().

My best attempt looks something like this:

using System;
using System.Collections.Generic;

public class ThingWithDateComparer : IEqualityComparer<IThingWithDate>
{
    private readonly int _daysToAdd;

    public ThingWithDateComparer(int daysToAdd)
    {
        _daysToAdd = daysToAdd;
    }

    public int GetHashCode(IThingWithDate obj)
    {
        unchecked
        {
            var hash = 17;
            hash = hash * 23 + obj.BirthDate.AddDays(_daysToAdd).GetHashCode();
            return hash;
        }
    }

    public bool Equals(IThingWithDate x, IThingWithDate y)
    {
        throw new NotImplementedException();
    }
}

public interface IThingWithDate
{
    DateTime BirthDate { get; set; }
}

Because .GroupJoin() builds a hash table from GetHashCode(), the offset is added to every object on both sides, so the hashes are merely shifted and still only match for identical dates. This doesn't work.

Matt Rowland
  • Is daysToAdd the tolerance, as in Jan 5 equals Jan 6 within a tolerance of 1 day? This definition of equality is not transitive so I doubt it is possible to implement correctly using IEqualityComparer outside of the trivial solution of returning the same hash code for every object. – Mike Zboray Jan 05 '17 at 19:05
  • Forget it. Replace the `GroupJoin` with `SelectMany` and simple `Where` (not very performant, but should work). – Ivan Stoev Jan 05 '17 at 19:05
  • @mikez Yes, that is the tolerance. The naming is just bad. If I cannot make this work, I will just have to implement a custom version of `GroupJoin()`. – Matt Rowland Jan 05 '17 at 19:09
  • @IvanStoev The performance is an issue. The current solution for our matching uses where and it provides horrible performance. With the use of `GroupJoin()` I have been able to take a 9 hour process down to milliseconds. – Matt Rowland Jan 05 '17 at 19:10
  • I know (and always do care about performance). But working slowly is better than not working. What's the real use case? Maybe we could think about something else. – Ivan Stoev Jan 05 '17 at 19:21
  • I don't see how a hash-based solution can work here. You have a problem that is more appropriately solved by a data structure like an interval tree. – Mike Zboray Jan 05 '17 at 19:26

2 Answers

The problem is impossible, conceptually. You're trying to compare objects in a way that doesn't have the form of equality that the operations you're trying to perform require. For example, GroupJoin depends on the assumption that if A is equal to B, and B is equal to C, then A is equal to C, but in your situation that's not true. A and B may be "close enough" together for you to want to group them, and so may B and C, but A and C may not be.

You're going to need to not implement IEqualityComparer at all, because you cannot fulfill the contract that it requires. If you want to create a mapping of items in one collection to all of the items in another collection that are "close enough" to it, then you're going to need to write that algorithm yourself (doing so efficiently is likely to be hard, but doing so inefficiently shouldn't be that difficult), rather than using GroupJoin, because it's not capable of performing that operation.
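
As one possible way to write that algorithm yourself, here is a minimal sketch that sorts the inner collection by date once and then binary-searches the window of dates within the tolerance for each outer item. `ToleranceJoin`, `GroupJoinWithinTolerance`, and `LowerBound` are hypothetical names made up for this illustration, not LINQ APIs, and the sketch assumes the tolerance is symmetric (plus or minus) and that the dates are far enough from `DateTime.MinValue`/`DateTime.MaxValue` that adding the tolerance does not overflow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToleranceJoin
{
    // Hypothetical helper (not part of LINQ): pairs each outer item with every
    // inner item whose date is within +/- tolerance of the outer item's date.
    public static IEnumerable<TResult> GroupJoinWithinTolerance<TOuter, TInner, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, DateTime> outerDate,
        Func<TInner, DateTime> innerDate,
        TimeSpan tolerance,
        Func<TOuter, IReadOnlyList<TInner>, TResult> resultSelector)
    {
        // Sort the inner side once so each lookup can use a binary search.
        TInner[] sorted = inner.OrderBy(innerDate).ToArray();
        DateTime[] keys = sorted.Select(innerDate).ToArray();

        foreach (TOuter item in outer)
        {
            DateTime date = outerDate(item);
            DateTime low = date - tolerance;   // assumption: no DateTime overflow
            DateTime high = date + tolerance;

            // Collect the contiguous run of inner items in [low, high].
            var matches = new List<TInner>();
            for (int i = LowerBound(keys, low); i < keys.Length && keys[i] <= high; i++)
            {
                matches.Add(sorted[i]);
            }

            yield return resultSelector(item, matches);
        }
    }

    // Returns the first index whose key is >= value, or keys.Length if none is.
    private static int LowerBound(DateTime[] keys, DateTime value)
    {
        int lo = 0, hi = keys.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid] < value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

With n outer items and m inner items, this costs roughly O(m log m) for the sort plus O(log m) per outer item and the size of its match list, and it never asks for a hash code, so the non-transitive "close enough" relation is not a problem.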

Servy

I can't see any way to generate a logical hash code for your given criteria.
The hash code is used to determine if 2 dates should stick together. If they should group together, then they must return the same hash code.

If your "float" is 5 days, that means that 1/1/2000 must generate the same hash code as 1/4/2000, and 1/4/2000 must generate the same hash code as 1/8/2000 (since the dates in each pair are within 5 days of each other). That implies that 1/1/2000 has the same hash code as 1/8/2000 (since if a=b and b=c, a=c).

1/1/2000 and 1/8/2000 are outside the 5 day "float".
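
The same three dates can be checked in a few lines. This is only an illustrative sketch: `ToleranceDemo` and `WithinDays` are hypothetical names, and the predicate assumes "within the float" means an absolute difference of at most the given number of days.

```csharp
using System;

public static class ToleranceDemo
{
    // Hypothetical predicate: true when the two dates are at most `days` apart.
    public static bool WithinDays(DateTime x, DateTime y, int days) =>
        Math.Abs((x - y).TotalDays) <= days;

    public static void Main()
    {
        var a = new DateTime(2000, 1, 1);
        var b = new DateTime(2000, 1, 4);
        var c = new DateTime(2000, 1, 8);

        Console.WriteLine(WithinDays(a, b, 5)); // True  (3 days apart)
        Console.WriteLine(WithinDays(b, c, 5)); // True  (4 days apart)
        Console.WriteLine(WithinDays(a, c, 5)); // False (7 days apart)
    }
}
```

Any GetHashCode() consistent with this predicate would have to give a, b, and c the same value, and by extending the chain, every other date as well.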

Bradley Uffner
  • Very true. I am thinking I am going to have to scrap the use of `GroupJoin` and implement a version that allows for a `Comparer` that has a seed from the left side. – Matt Rowland Jan 05 '17 at 19:25