
We ran into a magic decimal number that broke our hashtable. I boiled it down to the following minimal case:

decimal d0 = 295.50000000000000000000000000m;
decimal d1 = 295.5m;

Console.WriteLine("{0} == {1} : {2}", d0, d1, (d0 == d1));
Console.WriteLine("0x{0:X8} == 0x{1:X8} : {2}", d0.GetHashCode(), d1.GetHashCode()
                  , (d0.GetHashCode() == d1.GetHashCode()));

Giving the following output:

295.50000000000000000000000000 == 295.5 : True
0xBF8D880F == 0x40727800 : False

What is really peculiar: change, add or remove any of the digits in d0 and the problem goes away. Even adding or removing one of the trailing zeros! The sign doesn't seem to matter though.

Our fix is to divide the value to get rid of the trailing zeroes, like so:

decimal d0 = 295.50000000000000000000000000m / 1.000000000000000000000000000000000m;

But my question is, how is C# doing this wrong?

Edit: Just noticed this has been fixed in .NET Core 3.0 (possibly earlier, I didn't check): https://dotnetfiddle.net/4jqYos

Jannes

6 Answers


To start with, C# isn't doing anything wrong at all. This is a framework bug.

It does indeed look like a bug though - basically whatever normalization is involved in comparing for equality ought to be used in the same way for hash code computation. I've checked and can reproduce it too (using .NET 4) including checking the Equals(decimal) and Equals(object) methods as well as the == operator.

It definitely looks like it's the d0 value which is the problem, as adding trailing 0s to d1 doesn't change the results (until it's the same as d0 of course). I suspect there's some corner case tripped by the exact bit representation there.

I'm surprised that the same normalization isn't applied there (and as you say, it works most of the time), but you should report the bug on Connect.
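
For reference, a minimal sketch of the checks described above (same values as in the question; on .NET 4 the hash codes come out different even though every equality check passes):

decimal d0 = 295.50000000000000000000000000m;
decimal d1 = 295.5m;

Console.WriteLine(d0 == d1);                              // True
Console.WriteLine(d0.Equals(d1));                         // True (Equals(decimal))
Console.WriteLine(d0.Equals((object)d1));                 // True (Equals(object))
Console.WriteLine(d0.GetHashCode() == d1.GetHashCode());  // False on .NET 4 - the bug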

Jon Skeet
  • FWIW it happens with .NET 2.0 and 3.5 as well. – Christian.K Dec 16 '11 at 11:30
  • Already on Connect: http://connect.microsoft.com/VisualStudio/feedback/details/314630/system-decimal-gethashcode-violates-equality-contract (although marked as fixed!) – AakashM Dec 16 '11 at 11:35
  • Damn. If that is still in - this is one of the bad side cases that only pop up very rarely and then really get nasty ;) – TomTom Dec 16 '11 at 12:04
  • The problem still occurs (with the same `decimal`s as above) in .NET 4.5. – Jeppe Stig Nielsen Sep 19 '12 at 19:09
  • Re-reported bug in MS Connect: https://connect.microsoft.com/VisualStudio/feedback/details/776203/decimal-gethashcode-delivers-different-hash-codes-depending-on-number-of-decimal-places#tabs – AxelEckenberger Jan 10 '13 at 09:34
  • @AakashM The new thread (opened by myself) [Mathematical explanation of Decimal different representations as Double](http://stackoverflow.com/questions/33482020/) has an answer that seems to shed some light on what was "fixed". They use a broken conversion to `Double`, then find the hash from that `Double` value. The fix they tried to apply seems to involve discarding the least significant bits of that `Double`. Which is not always good enough. See hvd's answer with comments in that other thread. – Jeppe Stig Nielsen Nov 02 '15 at 20:35

Another bug (?) that results in different byte representations for the same decimal on different compilers: try compiling the following code on VS 2005 and then on VS 2010, or look at my article on Code Project.

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        decimal one = 1m;

        PrintBytes(one);
        PrintBytes(one + 0.0m); // compare this on different compilers!
        PrintBytes(1m + 0.0m);

        Console.ReadKey();
    }

    public static void PrintBytes(decimal d)
    {
        MemoryStream memoryStream = new MemoryStream();
        BinaryWriter binaryWriter = new BinaryWriter(memoryStream);

        binaryWriter.Write(d);

        byte[] decimalBytes = memoryStream.ToArray();

        Console.WriteLine(BitConverter.ToString(decimalBytes) + " (" + d + ")");
    }
}

Some people use the following normalization code, d = d + 0.0000m, which does not work properly on VS 2010. Your normalization code (d = d / 1.000000000000000000000000000000000m) looks good - I use the same one to get the same byte array for the same decimals.
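
As a sketch, that division trick can be wrapped in a small helper (the method name here is just for illustration):

public static decimal Normalize(decimal d)
{
    // Dividing by 1.0...0m forces the scale to be recomputed,
    // which strips the extra trailing zeros from the internal representation.
    return d / 1.000000000000000000000000000000000m;
}

With that, PrintBytes(Normalize(one + 0.0m)) should produce the same byte array regardless of which compiler built the code.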

CoperNick
  • Does your `PrintBytes` method not reveal the same information as the `static` method `decimal.GetBits`? The fact that `1`, `1.0`, `1.00`, etc. have different representations is by design (intended). This thread is more about the serious problem that values which (despite different bit representations) are considered equal by the `Equals` method (and by `==` in C#, and by `op_Equality` in .NET) have distinct hashes. – Jeppe Stig Nielsen Feb 18 '15 at 15:45
  • @JeppeStigNielsen I'm not saying that `1` and `1.0` have different representations. I'm saying that `one + 0.0m` is `1.0` on VS 2005 while `one + 0.0m` on VS 2010 is `1`. The same code has different representations on different compilers. And if you calculate the hash using the decimal's byte representation (as Microsoft probably did) you will have problems similar to the one in the question, but even worse: you will not see the bug until the compiler changes. – CoperNick Feb 18 '15 at 20:10
  • Good find! This behavior changed again from VS 2013 to VS 2015 (new Roslyn-based C# compiler in 2015). It is now fixed such that the trailing zero that must exist in `1m + 0.0m` is not optimized away (like it erroneously was with VS 2013 when the `0.0m` was a compile-time constant). I suppose VS 2015 is back to the correct behavior of VS 2005. The error that existed in (some?) intermediate versions is fixed. – Jeppe Stig Nielsen Nov 02 '15 at 13:50

Ran into this bug too ... :-(

Tests (see below) indicate that this depends on the maximum precision available for the value. The wrong hash codes only occur near the maximum precision for the given value. As the tests show, the error seems to depend on the digits to the left of the decimal point. Sometimes only the hash code for maxDecimalDigits - 1 is wrong, sometimes the one for maxDecimalDigits is wrong.

var data = new decimal[] {
//    123456789012345678901234567890
    1.0m,
    1.00m,
    1.000m,
    1.0000m,
    1.00000m,
    1.000000m,
    1.0000000m,
    1.00000000m,
    1.000000000m,
    1.0000000000m,
    1.00000000000m,
    1.000000000000m,
    1.0000000000000m,
    1.00000000000000m,
    1.000000000000000m,
    1.0000000000000000m,
    1.00000000000000000m,
    1.000000000000000000m,
    1.0000000000000000000m,
    1.00000000000000000000m,
    1.000000000000000000000m,
    1.0000000000000000000000m,
    1.00000000000000000000000m,
    1.000000000000000000000000m,
    1.0000000000000000000000000m,
    1.00000000000000000000000000m,
    1.000000000000000000000000000m,
    1.0000000000000000000000000000m,
    1.00000000000000000000000000000m,
    1.000000000000000000000000000000m,
    1.0000000000000000000000000000000m,
    1.00000000000000000000000000000000m,
    1.000000000000000000000000000000000m,
    1.0000000000000000000000000000000000m,
};

// For each i, compare the hash of i * 1.0m against i * d for every representation of 1 above.
for (int i = 0; i < 1000; ++i)
{
    var d0 = i * data[0];
    var d0Hash = d0.GetHashCode();
    foreach (var d in data)
    {
        var value = i * d;
        var hash = value.GetHashCode();
        Console.WriteLine("{0};{1};{2};{3};{4};{5}", d0, value, (d0 == value), d0Hash, hash, d0Hash == hash);
    }
}
AxelEckenberger

This is a decimal rounding error.

Too much precision is required to store d0 with all of those trailing zeros; as a consequence, the algorithm in charge of it makes a mistake and ends up giving a different result. It could be classified as a bug in this example, although note that the decimal type is supposed to have a precision of 28 digits, and here you are actually requiring a precision of 29 digits for d0.

This can be tested by asking for the full raw hexadecimal representation of d0 and d1.
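
For example, a quick sketch using decimal.GetBits, which returns the four 32-bit parts of the raw representation (low, mid and high words of the 96-bit mantissa, then a word holding the sign and scale):

decimal d0 = 295.50000000000000000000000000m;
decimal d1 = 295.5m;

// Print each value's four ints as hex; the dumps differ even though d0 == d1 is true.
Console.WriteLine(string.Join(" ", Array.ConvertAll(decimal.GetBits(d0), x => x.ToString("X8"))));
Console.WriteLine(string.Join(" ", Array.ConvertAll(decimal.GetBits(d1), x => x.ToString("X8"))));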

Cyan
  • Your answer doesn't sound correct, tbh. Sure, it may be too many digits, but numerically d0 and d1 are the same, there's no problem there. And of course in binary they are different: 295.50 and 295.5 would also be binary different. The issue here is that the GetHashCode function returns a wrong hash for one of them. Also: the reason I ran into this is that the decimal was coming from MySQL; of course I never typed that many trailing zeroes myself. – Jannes Dec 17 '11 at 16:57
  • 295.50 and 295.5 are binary identical. It's very easy to check: just output both decimal values in raw format, as if they were raw 128-bit hexadecimal values. 295.50000000000000000000000000, on the other hand, is slightly different. The 29th digit has an impact on the rounding of the lowest bits. Because rounding effects are inevitable when rounding base 10 into base 2, I guess that the "==" function has some tolerance for the lowest bit(s), considering them as "noisy", which would explain why it says both numbers are equal. But in fact they are not... – Cyan Dec 19 '11 at 14:41
  • .NET stores the trailing zero _somewhere_ in 295.50. ToString() nicely reproduces it. So it can't be binary equal. As far as I know it's stored as 29550 * 10^-2, while 295.5 is stored as 2955 * 10^-1. Not equal. == correctly compares the two, but GetHashCode apparently doesn't do the same conversion. About the base10/base2 rounding of the last digits: yes, I'm also suspecting the cause to be around there somewhere. – Jannes Dec 19 '11 at 18:12
  • You may find an answer to this question in this topic: http://stackoverflow.com/questions/33482020/mathematical-explanation-why-decimals-conversion-to-double-is-broken-and-decima/33483153#33483153 Cyan has a point but it's not exactly that. – mikus Jun 07 '16 at 13:02

I tested this in VB.NET (v3.5) and got the same thing.

The interesting thing about the hash codes:

A) 0x40727800 = 1081243648

B) 0xBF8D880F = -1081243633, which is very nearly the negative of A (only the low four bits differ)

Using Decimal.GetBits() I found

Format: Mantissa (hhhhhhhh hhhhhhhh hhhhhhhh) Exponent (seee0000), where 'h' is a mantissa hex digit, 's' holds the sign, 'e' holds the exponent (scale), and the 0s must be zero.

d1 ==> 00000000 00000000 00000B8B - 00010000 = (2955 / 10 ^ 1) = 295.5

d0 ==> 5F7B2FE5 D8EACD6E 2E000000 - 001A0000

...which converts to 29550000000000000000000000000 / 10^26 = 295.5000000...etc

** Edit: OK, I wrote a 128-bit hexadecimal calculator and the above is exactly correct.

It definitely looks like an internal conversion bug of some sort. Microsoft explicitly states that they do not guarantee their default implementation of GetHashCode. If you are using it for anything important then it probably makes sense to write your own GetHashCode for the decimal type. Formatting the value as a fixed-decimal, fixed-width string and hashing that seems to work, for example (more than 29 decimal places and a width greater than 58 fits all possible decimals).
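
A minimal sketch of that string-based idea (the helper name is just illustrative, and corner cases such as negative zero would need checking):

public static int DecimalHashCode(decimal d)
{
    // "F29" always emits 29 decimal places regardless of the stored scale,
    // so 295.5 and 295.50000000000000000000000000 hash the same string.
    return d.ToString("F29", System.Globalization.CultureInfo.InvariantCulture).GetHashCode();
}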

* Edit: I don't know about this anymore. It must still be a conversion error somewhere, since the stored precision fundamentally changes the real value in memory. That the hash codes end up as very nearly the signed negatives of each other is a big clue - one would need to look further into the default hash code implementation to find more.

28 or 29 digits shouldn't matter unless there is dependent code which does not evaluate the outer extents properly. The largest accessible 96-bit integer is:

79228162514264337593543950335

so you can have 29 digits so long as the whole thing (without decimal point) is less than this value. I can't help but think that this is something much more subtle in the hash code calculation somewhere.

J...
  • In my case I'm not really interested in GetHashCode itself. I ran into the problem while using a standard .NET Dictionary or HashTable. I don't know if that uses GetHashCode internally (I assumed) but it has the same problem either way. – Jannes Dec 17 '11 at 17:09

The documentation suggests that, because GetHashCode() is unpredictable, you should create your own. It's considered unpredictable because each type has its own implementation and, since we don't know its internals, we should create our own according to how we evaluate uniqueness.

However, I think the answer is that GetHashCode() is not using the mathematical decimal value to create the hash code.

Mathematically we see 295.50000000 and 295.5 as being the same. When you look at the decimal objects in the IDE this is true too. However, if you do a ToString() on both decimals you will see that the compiler sees them differently, i.e. you will still see 295.50000000. GetHashCode() is evidently not using the mathematical representation of the decimal for creating the hash code.
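
For example (a quick sketch using the values from the question):

decimal d0 = 295.50000000000000000000000000m;
decimal d1 = 295.5m;

Console.WriteLine(d0 == d1);      // True  - mathematically equal
Console.WriteLine(d0.ToString()); // 295.50000000000000000000000000
Console.WriteLine(d1.ToString()); // 295.5 - the trailing zeros of d0 are preserved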

Your fix is simply creating a new decimal without all the trailing zeros which is why it works.

Benster74
  • Read. He is not talking about his own classes. There is a clearly defined contract that the class violates, and that is basically confirmed by MS. You not thinking this is a bug and then going to totally unrelated documentation does not help either. – TomTom Dec 16 '11 at 12:03
  • GetHashCode must return the same value for equal objects - that's the whole point of it. – oleksii Dec 16 '11 at 13:27
  • `Decimal` overrides [`GetHashCode`](http://msdn.microsoft.com/en-us/library/system.decimal.gethashcode.aspx), so the comments for `ValueType.GetHashCode` don't apply. – Richard Dec 18 '11 at 15:05