12

I've implemented a custom IEqualityComparer for EventLogEntry.

public class EventLogEntryListComparison :
    IEqualityComparer<List<EventLogEntry>>,
    IEqualityComparer<EventLogEntry>

For IEqualityComparer<List<EventLogEntry>>, the GetHashCode function is very simple.

public int GetHashCode(List<EventLogEntry> obj)
{
    return obj.Sum(entry => 23 * GetHashCode(entry));
}

However, this throws an OverflowException for certain entries.

"Arithmetic operation resulted in an overflow."
   at System.Linq.Enumerable.Sum(IEnumerable`1 source)
   at System.Linq.Enumerable.Sum[TSource](IEnumerable`1 source, Func`2 selector)
   at <snip>.Diagnostics.EventLogAnalysis.EventLogEntryListComparison.GetHashCode(List`1 obj) in C:\dev\<snip>Diagnostics.EventLogAnalysis\EventLogEntryListComparison.cs:line 112
   at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
   at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
   at <snip>.Diagnostics.EventLogAnalysis.Program.AnalyseMachine(String validMachineName) in C:\dev\<snip>.Diagnostics.EventLogAnalysis\Program.cs:line 104
   at System.Threading.Tasks.Parallel.<>c__DisplayClass2d`2.<ForEachWorker>b__23(Int32 i)
   at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()

After trying to get the same error whilst debugging and couldn't in the immediate window, I changed the code to this and bye bye OverflowException?

int total = 0;
foreach (var eventLogEntry in obj)
{
    total += GetHashCode(eventLogEntry);
}

return total;

How is it that LINQ's Sum function behaving differently?

Edit 2

Thanks to a few comments, the corrected and intended GetHashCode function is now as follows,

public int GetHashCode(List<EventLogEntry> obj)
{
    return unchecked(obj.Aggregate(17,
        (accumulate, entry) =>
        accumulate * 23 + GetHashCode(entry)));
}
M Afifi
  • 4,645
  • 2
  • 28
  • 48
  • 3
    As a side note, there is little benefit in multiplying each element with a prime and then simply adding them. You should multiple the entire hash (calculated so far) on each iteration and then add the next item's hash (I presume you have chosen 23 as your prime based on [this answer](http://stackoverflow.com/questions/2733541), or a similar one). – vgru Jun 14 '12 at 13:21
  • 1
    Note that your edited version is (essentially) multiplying the sub-hashcodes together rather than multiplying by a prime and adding them. Maybe you want `(accumulate*23) + GetHashCode(entry)`? – Rawling Jun 14 '12 at 13:46
  • possible duplicate of [Enumerable.Sum() overflowing](http://stackoverflow.com/questions/2208827/enumerable-sum-overflowing) – nawfal Jun 28 '15 at 11:26
  • Help me out here - I'd like to learn. Could it not be that the sum of all of the `23 * GetHashCode(entry)` (of type int/Int32 ?) in the OP is larger than Int32.MaxValue (the type defined for GetHashCode)? Thus resulting in the overflow? In my case I still got the error when after changing the return type to Int64... I also had to cast the property in the lambda to Int64 (or long). For example: `return obj.Sum(entry => 23 * (Int64)GetHashCode(entry));` – Jonno Jun 03 '22 at 11:30

3 Answers3

10

LINQ's Enumerable.Sum(...) methods perform the additions inside a checked block. This means that they deliberately throw an exception if the sum overflows.

Your sum is not inside a checked block, so whether or not it throws an exception depends on... whether it is called from inside a checked block, or a property on the assembly I believe.

Rawling
  • 49,248
  • 7
  • 89
  • 127
  • 4
    That's right. As a side note: You could use `Aggregate` instead of `Sum`: `x.Aggregate((a, b) => a + b);` (but probably you don't want). – sloth Jun 14 '12 at 13:26
  • 2
    @sloth I would just say that that `Aggregate` call will throw an exception if the list is empty, but if you provide a seed value of `0` it will work just fine: `x.Aggregate(0, (sum, i) => sum + i)` – gfv May 26 '16 at 09:07
5

It's because of the different behavior of Assemblies compiled in C# and the implementation of Enumerable.Sum.

If you compile an Assembly in C#, by default all additions are performed in unchecked mode, which is why you don't get an overflow in your last example. If you want the runtime to throw on overflows, you need to use checked blocks (of course for your hash, you don't want that, so C#'s default behavior is fine).

By contrast, Enumerable.Sum is meant to calculate a sum, and usually, you don't want sums to overflow. That's why Enumerable.Sum performs its calculations in checked mode, which throws an exception if the sum overflows.

Botz3000
  • 39,020
  • 8
  • 103
  • 127
1

if you're computing a Hash Code, you may not want to use Sum anyways. Using Xor (^) will provide the same results and may even spread your hash codes out more than a sum will. Try this method:

public int GetHashCode(List<EventLogEntry> obj)
{
    int total = 0;
    foreach (var eventLogEntry in obj)
    {
        total ^= GetHashCode(eventLogEntry);
    }

    return total;
}
D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Maybe not be relevant for this specific case, but this will lead to e.g. `{ entryA, entryB }` and `{ entryB, entryA }` having the same hash code. – Rawling Jun 14 '12 at 13:43
  • 1
    @dstanley I strongly disagree with that, read more here, http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode – M Afifi Jun 14 '12 at 13:45
  • ... Yup, that's a good point. A proper, multiply-by-a-prime-every-time sum would avoid this, but you can't do that with a LINQ `Sum`. – Rawling Jun 14 '12 at 14:08