
I am not sure if this non-standard way of stating a Stack Overflow question is good or bad, but here goes:

What is the best (mathematical or otherwise technical) explanation of why the code:

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static void Main()
{
  decimal[] arr =
  {
    42m,
    42.0m,
    42.00m,
    42.000m,
    42.0000m,
    42.00000m,
    42.000000m,
    42.0000000m,
    42.00000000m,
    42.000000000m,
    42.0000000000m,
    42.00000000000m,
    42.000000000000m,
    42.0000000000000m,
    42.00000000000000m,
    42.000000000000000m,
    42.0000000000000000m,
    42.00000000000000000m,
    42.000000000000000000m,
    42.0000000000000000000m,
    42.00000000000000000000m,
    42.000000000000000000000m,
    42.0000000000000000000000m,
    42.00000000000000000000000m,
    42.000000000000000000000000m,
    42.0000000000000000000000000m,
    42.00000000000000000000000000m,
    42.000000000000000000000000000m,
  };

  foreach (var m in arr)
  {
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
      "{0,-32}{1,-20:R}{2:X8}", m, (double)m, m.GetHashCode()
      ));
  }

  Console.WriteLine("Funny consequences:");
  var h1 = new HashSet<decimal>(arr);
  Console.WriteLine(h1.Count);
  var h2 = new HashSet<double>(arr.Select(m => (double)m));
  Console.WriteLine(h2.Count);
}

gives the following "funny" (apparently incorrect) output:

42                              42                  40450000
42.0                            42                  40450000
42.00                           42                  40450000
42.000                          42                  40450000
42.0000                         42                  40450000
42.00000                        42                  40450000
42.000000                       42                  40450000
42.0000000                      42                  40450000
42.00000000                     42                  40450000
42.000000000                    42                  40450000
42.0000000000                   42                  40450000
42.00000000000                  42                  40450000
42.000000000000                 42                  40450000
42.0000000000000                42                  40450000
42.00000000000000               42                  40450000
42.000000000000000              42                  40450000
42.0000000000000000             42                  40450000
42.00000000000000000            42                  40450000
42.000000000000000000           42                  40450000
42.0000000000000000000          42                  40450000
42.00000000000000000000         42                  40450000
42.000000000000000000000        41.999999999999993  BFBB000F
42.0000000000000000000000       42                  40450000
42.00000000000000000000000      42.000000000000007  40450000
42.000000000000000000000000     42                  40450000
42.0000000000000000000000000    42                  40450000
42.00000000000000000000000000   42                  40450000
42.000000000000000000000000000  42                  40450000
Funny consequences:
2
3

Tried this under .NET 4.5.2.

Jeppe Stig Nielsen
  • The technical explanation is that it's caused by a framework bug: http://stackoverflow.com/questions/8533449/c-sharp-why-can-equal-decimals-produce-unequal-hash-values – LukeH Nov 02 '15 at 16:51
  • A bug found in 2011 and not fixed today? Or is it fixed in 4.6? – Meta-Knight Nov 02 '15 at 17:24
  • it's partially fixed (differently broken) in Roslyn ;) – mikus Nov 02 '15 at 19:59
  • @mikus How? (I do not have access to .NET 4.6 right here.) I guess this is a runtime issue, not something the C# compiler can repair in any way. – Jeppe Stig Nielsen Nov 02 '15 at 20:30
  • see the bottom of hvd's answer below; it's not a runtime issue, it's the way the hash code is calculated, which is framework code – mikus Nov 02 '15 at 20:41

2 Answers


In Decimal.cs, we can see that GetHashCode() is implemented as native code. Furthermore, we can see that the cast to double is implemented as a call to ToDouble(), which in turn is implemented as native code. So from there, we can't see a logical explanation for the behaviour.

In the old Shared Source CLI, we can find old implementations of these methods that hopefully shed some light, if they haven't changed too much. In comdecimal.cpp we find:

FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    return ((int *)&dbl)[0] ^ ((int *)&dbl)[1];
}
FCIMPLEND

and

FCIMPL1(double, COMDecimal::ToDouble, DECIMAL d)
{
    WRAPPER_CONTRACT;
    STATIC_CONTRACT_SO_TOLERANT;

    ENSURE_OLEAUT32_LOADED();

    double result;
    VarR8FromDec(&d, &result);
    return result;
}
FCIMPLEND

We can see that the GetHashCode() implementation is based on the conversion to double: the hash code is derived from the bytes of the resulting double. It relies on the assumption that equal decimal values convert to equal double values.
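
In C#, the same idea can be sketched roughly like this (my sketch, not the framework source; it assumes the little-endian layout that makes ((int *)&dbl)[0] the low half of the double):

using System;

// Sketch of the old hashing scheme: XOR the low and high 32 bits of the
// double that the decimal converts to.
static int OldStyleDecimalHash(decimal d)
{
    double dbl = (double)d;                    // corresponds to VarR8FromDec
    if (dbl == 0.0)
        return 0;                              // ensure 0 and -0 share a hash code
    long bits = BitConverter.DoubleToInt64Bits(dbl);
    return (int)bits ^ (int)(bits >> 32);      // low 32 bits XOR high 32 bits
}

So any decimal whose conversion to double is off by even one unit in the last place ends up with a different hash code than its numerically equal siblings.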

So let's test the VarR8FromDec system call outside of .NET:

In Delphi (I'm actually using FreePascal), here's a short program to call the system functions directly to test their behaviour:

{$MODE Delphi}
program Test;
uses
  Windows,
  SysUtils,
  Variants;
type
  Decimal = TVarData;
function VarDecFromStr(const strIn: WideString; lcid: LCID; dwFlags: ULONG): Decimal; safecall; external 'oleaut32.dll';
function VarDecAdd(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecSub(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarDecDiv(const decLeft, decRight: Decimal): Decimal; safecall; external 'oleaut32.dll';
function VarBstrFromDec(const decIn: Decimal; lcid: LCID; dwFlags: ULONG): WideString; safecall; external 'oleaut32.dll';
function VarR8FromDec(const decIn: Decimal): Double; safecall; external 'oleaut32.dll';
var
  Zero, One, Ten, FortyTwo, Fraction: Decimal;
  I: Integer;
begin
  try
    Zero := VarDecFromStr('0', 0, 0);
    One := VarDecFromStr('1', 0, 0);
    Ten := VarDecFromStr('10', 0, 0);
    FortyTwo := VarDecFromStr('42', 0, 0);
    Fraction := One;
    for I := 1 to 40 do
    begin
      FortyTwo := VarDecSub(VarDecAdd(FortyTwo, Fraction), Fraction);
      Fraction := VarDecDiv(Fraction, Ten);
      Write(I: 2, ': ');
      if VarR8FromDec(FortyTwo) = 42 then WriteLn('ok') else WriteLn('not ok');
    end;
  except on E: Exception do
    WriteLn(E.Message);
  end;
end.

Note that since Delphi and FreePascal have no language support for any floating-point decimal type, I'm calling the system functions to perform the calculations. I first set FortyTwo to 42. I then add 1 and subtract 1. I then add 0.1 and subtract 0.1. Et cetera. This causes the precision of the decimal to be extended in the same way as in .NET.

And here's (part of) the output:

...
20: ok
21: ok
22: not ok
23: ok
24: not ok
25: ok
26: ok
...

This shows that it is indeed a long-standing problem in Windows that merely happens to be exposed by .NET. It's the system functions that give different results for equal decimal values, and either they should be fixed, or .NET should be changed to not use the defective functions.
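
The same precision-extension trick can be reproduced directly in C#, without spelling out the literals as in the question (my sketch; decimal addition and subtraction keep the larger scale of the two operands, so the trailing zeros accumulate):

using System;

static void ExtendPrecision()
{
    decimal fortyTwo = 42m;
    decimal fraction = 1m;
    for (int i = 1; i <= 28; i++)                     // 28 is the maximum scale of System.Decimal
    {
        fortyTwo = (fortyTwo + fraction) - fraction;  // value unchanged, but the scale grows to match fraction's
        fraction /= 10m;
        Console.WriteLine("{0,2}: {1}", i, (double)fortyTwo == 42.0 ? "ok" : "not ok");
    }
}

On the same runtime as the question, this should report "not ok" at the same scales that misbehave in the question's table.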

Now, in the new .NET Core, we can see in its decimal.cpp the code to work around the problem:

FCIMPL1(INT32, COMDecimal::GetHashCode, DECIMAL *d)
{
    FCALL_CONTRACT;

    ENSURE_OLEAUT32_LOADED();

    _ASSERTE(d != NULL);
    double dbl;
    VarR8FromDec(d, &dbl);
    if (dbl == 0.0) {
        // Ensure 0 and -0 have the same hash code
        return 0;
    }
    // conversion to double is lossy and produces rounding errors so we mask off the lowest 4 bits
    // 
    // For example these two numerically equal decimals with different internal representations produce
    // slightly different results when converted to double:
    //
    // decimal a = new decimal(new int[] { 0x76969696, 0x2fdd49fa, 0x409783ff, 0x00160000 });
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.176470588
    // decimal b = new decimal(new int[] { 0x3f0f0f0f, 0x1e62edcc, 0x06758d33, 0x00150000 }); 
    //                     => (decimal)1999021.176470588235294117647000000000 => (double)1999021.1764705882
    //
    return ((((int *)&dbl)[0]) & 0xFFFFFFF0) ^ ((int *)&dbl)[1];
}
FCIMPLEND

This appears to be implemented in the current .NET Framework too, based on the fact that one of the wrong double values does give the same hash code, but it's not enough to completely fix the problem.
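
The effect of that mask on the doubles from the question's output can be checked with a small sketch (names and code are mine, not the framework's; little-endian layout assumed, as in the native code above):

using System;

// Emulation of the masked hash from the .NET Core source above.
static int MaskedDoubleHash(double dbl)
{
    if (dbl == 0.0)
        return 0;
    long bits = BitConverter.DoubleToInt64Bits(dbl);
    int lo = (int)bits;                                // low 32 bits
    int hi = (int)(bits >> 32);                        // high 32 bits
    return (lo & unchecked((int)0xFFFFFFF0)) ^ hi;     // mask off the lowest 4 bits
}

// 42.0               -> bits 0x4045000000000000 -> hash 0x40450000
// 42.000000000000007 -> 42 plus one ulp, 0x4045000000000001: only the lowest
//                       mantissa bit differs, so the mask hides it -> 0x40450000
// 41.999999999999993 -> 42 minus one ulp, 0x4044FFFFFFFFFFFF: the borrow runs
//                       through the whole mantissa and changes the upper 32 bits,
//                       so the mask cannot help -> 0xBFBB000F, as in the question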

  • at least they tried to improve it finally :P I really wonder why they don't build a hash code from scratch, like it's done for double I believe. From the example mentioned in the Roslyn code it looks like the same decimal can have different internal representations, which seems a bit weird to me, but might be the reason for using double. For float/double the format is normalized, which is exactly what I would expect for decimal.. why not? In fact it seems that in some cases dec.ToString().GetHashCode() could actually be more precise than getting it from the double... :) – mikus Nov 02 '15 at 19:32
  • @mikus Yes. `1.0m` and `1.00m` intentionally have different internal representations, since otherwise `ToString()` couldn't return that same number of decimals. But because `1.0m.Equals(1.00m)` is true, their hash codes must be equal too. Normalising the value achieves that. –  Nov 02 '15 at 20:03
  • So we now know that `Decimal.GetHashCode()` works by converting to `Double` and then using the hash of the `Double`. So this shows that there is a connection between cases where the conversion to double is dubious, and cases where `Decimal.GetHashCode()` is broken. In my examples, one would expect all conversions to hit `42.0` exactly since this integer is exactly representable as a `double`. We see two cases where the conversion is anomalous. The hash of the one with `42.000000000000007` is fixed by the hack of not regarding the last digits. With `41.999999999999993` the error comes early. – Jeppe Stig Nielsen Nov 02 '15 at 20:20
  • @JeppeStigNielsen Oh wow, I completely missed that the `42.000000000000007` one does have the same hash code... Will have to update. –  Nov 02 '15 at 20:22
  • *Edit:* Yeah, that is why one funny consequence is `3` and the other is `2`. It means their fix works in some cases, but not in cases where the error "wraps around" and gives an error in a relatively significant digit of the `Double` representation. They should have fixed the conversion to `Double`, however, instead of hacking the hash algorithm to discard the least significant part. The question remains why the conversion to `Double` can fail. I guess mikus's answer points to some "random" patterns that may be relevant to the erroneous conversion. – Jeppe Stig Nielsen Nov 02 '15 at 20:24
  • @JeppeStigNielsen Right. If the conversion to `double` is implemented as first converting the unscaled value and then adjusting for scale, then that's two separate operations that can both introduce rounding errors. –  Nov 02 '15 at 20:26
  • @JeppeStigNielsen, it's not random, it's deterministic and defined :) The error occurs while converting the mantissa of the decimal, which e.g. for 42.0000 is 420000, not 42, as clearly stated on the Decimal page on MSDN. Whether there is a better conversion algorithm, no idea. Whether it's worth not normalizing it to keep the precision trace? Not for me to judge. In the meanwhile I cannot find an implementation of the 'VarDecCmp' procedure that is used for decimal comparison. I am pretty sure it does not convert to double, so I am wondering how they deal with that. Anyone? – mikus Nov 02 '15 at 23:13
  • @mikus: CoreCLR implementation of `VarDecCmp` is at line 1170 of https://github.com/dotnet/coreclr/blob/de8b85b643ac08d69696ad078846424b69ae651f/src/palrt/decarith.cpp ...and you're correct: there's no converting to double. – LukeH Nov 03 '15 at 02:03
  • Ok, so there are first some optimisations and then DecAddSub happens, which is nothing else than subtracting one decimal from the other to see if it produces 0. Within DecAddSub we can see how one decimal is scaled (the exponents are compared and, if they differ, the smaller is multiplied by a power of 10). Something that could totally be done to create a hash code. However, it's far more work than the current solution and I guess that might have been the point. – mikus Nov 03 '15 at 08:26

As for the difference in hashes, it does indeed seem to be wrong (same value, different hash), but that is already answered by LukeH in his comment.

As for the cast to double, though, I see it this way:

42000000000000000000000 has a different (and less 'precise') binary representation than 420000000000000000000000, and therefore you pay a higher price when trying to round it.

Why does it matter? Apparently decimal keeps track of its 'precision'. So, for example, it stores 1m as 1*10^0 but its equivalent 1.000m as 1000*10^-3, most likely to be able to print it later as "1.000". Therefore when converting your decimal to double it's not 42 that you need to represent, but for example 420000000000000000 together with its scale, and this is far from optimal (the mantissa and the exponent are converted separately).
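
The stored representation can be inspected with decimal.GetBits (a quick illustration of mine; the fourth element is a flags word whose bits 16-23 hold the scale):

using System;

// decimal.GetBits returns the 96-bit unscaled integer (three ints, low to high)
// plus a flags word carrying the scale and the sign.
Console.WriteLine(string.Join(", ", decimal.GetBits(1m)));
// prints: 1, 0, 0, 0          -> unscaled value 1,    scale 0  (1 * 10^0)
Console.WriteLine(string.Join(", ", decimal.GetBits(1.000m)));
// prints: 1000, 0, 0, 196608  -> unscaled value 1000, scale 3  (1000 * 10^-3); 196608 = 0x00030000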

According to a simulator I have found (a JavaScript one for Java, so not exactly what we have in C#, and therefore slightly different results, but still meaningful):

42000000000000000000 ~ 1.1384122371673584 * 2^65 ~ 4.1999998e+19
420000000000000000000 = 1.4230153560638428 * 2^68 = 4.2e+20 (nice one)
4200000000000000000000 ~ 1.7787691354751587 * 2^71 ~ 4.1999999e+21
42000000000000000000000 ~ 1.111730694770813 * 2^75 ~ 4.1999998e+22

As you can see, the value for 4.2E19 is less precise than the one for 4.2E20 and may end up being rounded to 4.19... If this is how the conversion to double happens, then the result is not surprising. And since multiplying by 10 usually produces a number that is not well representable in binary, we should expect such issues often.
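
The same magnitudes can also be checked against .NET itself rather than the Java simulator (a small sketch of mine; the unscaled integer behind 42.0...0m with N decimal places is 42*10^N, which is exact as a decimal):

using System;

static void PrintUnscaledConversions()
{
    for (int zeros = 18; zeros <= 24; zeros++)
    {
        decimal value = 42m;
        for (int i = 0; i < zeros; i++)
            value *= 10m;                 // 42 * 10^zeros, exact within decimal's 96-bit range
        Console.WriteLine("42 * 10^{0}: {1:R}", zeros, (double)value);
    }
}

The "R" format round-trips the double, so the printed values show exactly which of these integers the binary format can and cannot represent.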

Now, to my mind, this is all the price of keeping track of the significant digits in decimal. If that were not important, we could always normalize e.g. 4200*10^-2 to 4.2*10^1 (as double does) and the conversion to double wouldn't be so error-prone in the context of hash codes. Is it worth it? Not for me to judge.
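
For what it's worth, a hash that never goes through double is not hard to sketch (hypothetical code of mine, not what the framework does): strip the trailing zeros from the stored unscaled value, then hash the normalized parts, so that numerically equal decimals always produce the same bits.

using System;
using System.Numerics;   // requires a reference to System.Numerics

static int NormalizedDecimalHash(decimal d)
{
    int[] bits = decimal.GetBits(d);
    BigInteger unscaled = ((BigInteger)(uint)bits[2] << 64)
                        | ((BigInteger)(uint)bits[1] << 32)
                        | (uint)bits[0];
    int scale = (bits[3] >> 16) & 0xFF;        // number of digits after the decimal point
    bool negative = bits[3] < 0;

    // Normalize: 42.000m is stored as 42000 * 10^-3; reduce it to 42 * 10^0.
    while (scale > 0 && unscaled % 10 == 0)
    {
        unscaled /= 10;
        scale--;
    }

    if (unscaled.IsZero)
        return 0;                              // 0, 0.0 and -0.00 all hash to 0

    int h = unscaled.GetHashCode() * 31 + scale;
    return negative ? ~h : h;
}

This is only meant to illustrate that equality and hashing could both be defined on the normalized representation, at the cost of a bit more work per call.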

BTW: these two links provide nice reading about the decimal's binary representation: https://msdn.microsoft.com/en-us/library/system.decimal.getbits.aspx

https://msdn.microsoft.com/en-us/library/system.decimal.aspx

mikus
  • This likely is how the conversion to double happens, but it's still wrong. :) Good answer. You're covering how this bug could happen, I'm focusing more on the technical aspect in my answer, which covers where this bug could be. –  Nov 02 '15 at 18:14
  • the conversion to double is not wrong; to my mind it's how it is, how it is supposed to be and how it's expected to be. It's just the nature of floating-point representation, and the hash code calculation is answered in the referenced question – mikus Nov 02 '15 at 19:16
  • Generally, floating-point operations that can't return an exact correct value return the nearest representable value (where "nearest" depends on the rounding mode). If the decimal-to-double conversion had been implemented like that, there wouldn't be any problem, especially considering the conversion *can* give an exact result here. To me, it still seems a lot like a bug. –  Nov 02 '15 at 20:14
  • From what I can see there is no exact result, and most likely the rounding used by string.Format is basically a floor, as is usually considered the default. I'm not totally sure how this conversion is implemented, but it's very likely to be as above. – mikus Nov 02 '15 at 20:31
  • There is definitely an exact result. 42 can be represented exactly in `double` no matter how many zeroes you try to add after the decimal point: those zeroes do not change the value, and therefore do not change whether the value is representable. To be clear: the user's requested operation is a single conversion from `decimal` to `double`. The conversion may be internally performed by separate operations, but the user didn't request those separate operations, so the implementation should be responsible for solving the rounding problems introduced by them. –  Nov 02 '15 at 21:11
  • well, ok, the whole thing here is around the fact that the internal representation of decimal for 42.0000000m and 42m is different. Now I'd say, just normalize it like a double in the IEEE 754 standard, and then take the hash code, even if there is a reason not to normalize it before storing it anyway. But clearly the designers had something else in mind; I am pretty sure it's not out of ignorance, and I doubt it's performance.. so what? – mikus Nov 02 '15 at 22:32
  • I don't know if the original implementation for OLE Automation had the same reasoning behind it, but in .NET, the decimal precision is preserved in a useful way in the string representation: `1.00m.ToString()` returns `"1.00"`, not `"1"`. And when loading values from a fixed-precision column from an external source (such as a database), this way of keeping the number of decimals equal across rows, even if the last digit happens to be zero, makes for a cleaner display. Ditto when `decimal` is used for currencies. –  Nov 02 '15 at 22:40
  • yeap, exactly, and therefore 3 might be stored as 300*10^-2, and nothing seems to stop us from using an algorithm similar to the one that compares decimals to each other for the hash code. (I assume Equals runs some normalization code to compare decimals, as using doubles would be a sin, and the result would also be wrong in our original case.) So yeap, I can agree it's a somewhat unfortunate solution. – mikus Nov 02 '15 at 22:48