
Are there any C# specifications that state how implicit conversions of terms of integral types (e.g., int) to terms of double are supposed to work? If so, can someone tell me the algorithm or direct me to it?

The C# 6.0 draft specification states that “the value of a real literal of type float or double is determined by using the IEEE ‘round to nearest’ mode” under Lexical structure -> Grammars -> Lexical grammar -> Lexical analysis -> Tokens -> Literals -> Real literals; however, I wasn’t able to find anything about how implicit conversions work.

The only thing I found under Conversions -> Implicit conversions -> Implicit numeric conversions in the same specification was “conversions from int, uint, long, or ulong to float and from long or ulong to double may cause a loss of precision, but will never cause a loss of magnitude.”
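
To make that quoted sentence concrete, here is a small illustration of my own (the class and variable names are mine, and the comments reflect what I expect from round-to-nearest rather than anything the specification spells out): two distinct long values can map to the same double, because double only has a 53-bit significand.

using System;
namespace App
{
    internal static class PrecisionLossDemo
    {
        internal static void Main()
        {
            long a = long.MaxValue; // 9223372036854775807 needs 63 significant bits
            long b = a - 100;       // a different integer
            double da = a;          // implicit long-to-double conversions
            double db = b;
            Console.WriteLine(da == db); // True: both round to 2^63, so precision is
                                         // lost even though the magnitude is preserved
        }
    }
}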

I do know that implicit conversions don’t follow the same algorithm that real literals do, as the program below illustrates*:

using System;
using System.Diagnostics;
namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
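            // The ulong argument on the left is implicitly converted to double; the assert
            // passing means its first byte differs from that of the real literal on the right.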
            Debug.Assert(GetFirstByte(10648738977740919977) != GetFirstByte(10648738977740919977d));
        }
        private static byte GetFirstByte(double val)
        {
            return BitConverter.GetBytes(val)[0];
        }
    }
}

Edit

The above code may be more “complicated” than it needs to be. Here is another program that should hopefully clarify what I am asking.

using System;
using System.Diagnostics;
namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
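            // The ulong literal on the left is implicitly converted to double before the
            // comparison; the assert passing means it differs from the real literal.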
            Debug.Assert(10648738977740919977 != 10648738977740919977d);
        }
    }
}

Addendum

As The General and mjwills stated in the comments, this is almost certainly due to the extended precision format that some ISAs like x86 offer. Why the .NET Core compiler relies on the extended format to convert the ulong to a double but doesn’t do the same for the real literal is beyond me. I’m not sure whether this is technically a “bug”, but it would be nice if both did the same thing. One can be compliant with the above specification and still use the extended format, since IEEE 754-2019 explicitly allows for more than 64 bits of precision. Anyway, the ulong value fits entirely in the 64-bit significand of x86’s extended format, so no rounding occurs.
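
For what it’s worth, here is a quick check I put together (my own sketch, not from the specification or the compiler; the class name is mine and it assumes .NET Core 3.0+ for System.Numerics.BitOperations) showing that the value spans 64 significant bits, so it fits the extended format’s 64-bit significand exactly but cannot fit double’s 53-bit significand without rounding:

using System;
using System.Numerics;
namespace App
{
    internal static class SignificantBits
    {
        internal static void Main()
        {
            const ulong value = 10648738977740919977;
            int highest = 63 - BitOperations.LeadingZeroCount(value); // 63
            int lowest = BitOperations.TrailingZeroCount(value);      // 0 (the value is odd)
            Console.WriteLine(highest - lowest + 1);                  // 64 significant bits
        }
    }
}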

TL;DR (aka Edit 2)

I’ll preface this edit with the fact that I am fundamentally and philosophically against the notion that what I am about to write is necessary or even desirable. I believe technical questions that are specific to a particular programming language, like this one, still “fit” on Stack Overflow and not on any of the other Stack Exchange sites (e.g., Computer Science, Theoretical Computer Science, Mathematics, and Math Overflow—for things like homotopy type theory). This means that wanting to know the nitty-gritty details of something—even if one may (incorrectly) perceive such things as leading to a violation of “best practices”—still makes for a worthwhile question. If there exists a more fundamental problem, then a separate question can be made concerning it.

Background

I am creating a 128-bit unsigned integer type, U128, at my job, where we write in VB.NET. I decided to implement the ability to explicitly cast U128 terms to Double (i.e., double in C# parlance) terms. IEEE 754 binary64 and binary32 are rather trivial formats, as they are essentially scientific notation in base 2—of course they must be made into finite sequences of bits and have biased exponents. Anyway, I first implemented it in Rust, since Rust has a native 128-bit unsigned integer type, u128, and The Rustonomicon explicitly states how casts from u128 terms to f64 terms behave. This allowed me to test my algorithm against Rust’s; and unsurprisingly, given the trivial nature of the algorithm—it is ≈ 12 lines of code—my implementation matched Rust’s for several edge cases and for 1 billion randomly generated numbers (no, I did not take the time to formally verify that my algorithm was correct).

I then ported my algorithm to VB.NET—knowing how much more popular C# is here, I rewrote it in C# as well and confirmed it had the same behavior—but I wanted to be confident that nothing got lost in translation. The best I could do was to compare casts of ULong (ulong in C#) terms to Double terms with casts of the equivalent ULongs, represented as U128s, to Doubles. Sure enough, I was dismayed to discover that 10648738977740919977UL was being cast differently from the equivalent U128. I (correctly) assumed there was a problem with the rounding—FYI, the C# specification does not say how to round numbers that lie exactly halfway between two representable numbers, but, as expected, it rounds to even. When I compared the first byte—I am using a little-endian CPU—of the Double that my cast created with that of Rust’s, I found that mine was correct. At this point I assumed there was something “fishy” with VB.NET (and later confirmed the same in C#), since I typically trust Rust more and, as previously stated, the algorithm is rather trivial.

Fortunately, I was not aware of the (unfortunate) quirk that C# allows programs to use extended precision capabilities on CPUs that have them, including non-compliant ones like x86-based CPUs, whose 80-bit extended format carries a 64-bit significand. Had I known that, I likely would have dropped the issue.

It was not until I examined the first byte of the Double term 10648738977740919977R (10648738977740919977d in C#) that I was truly befuddled, as I found that it did agree with my algorithm. How could this be? I was using the exact same machine, the same compiler, and the same target platform. Finally, I correctly surmised that there is likely a difference between how the compiler treats real literals and how it treats integral literals that are cast to Doubles. To test this theory, I hacked up the program in the initial post (in VB.NET at the time).

At this point, I assumed that implicit casts were using a different algorithm (perhaps for efficiency reasons, since one has to track 3 additional bits, commonly called the guard, round, and sticky bits, to know how to round properly). That is why my question was formulated the way it was. I wanted to know the algorithm so that my algorithm would align with it (even though my initial algorithm is (very likely) technically correct per IEEE 754).
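
For concreteness, here is roughly the kind of algorithm I mean. It is a minimal sketch of my own (not the compiler’s, not Rust’s, and not my production U128 code; the names are mine and it assumes .NET Core 3.0+ for System.Numerics.BitOperations) that converts a ulong to a double by building the binary64 bit pattern with round-to-nearest, ties-to-even:

using System;
using System.Numerics;
namespace App
{
    internal static class UlongToDoubleSketch
    {
        internal static double ToDouble(ulong value)
        {
            if (value == 0)
            {
                return 0.0;
            }
            // Index of the most significant set bit; this is the unbiased exponent.
            int msb = 63 - BitOperations.LeadingZeroCount(value);
            ulong mantissa;
            if (msb <= 52)
            {
                // Every bit fits in the 53-bit significand; no rounding is needed.
                mantissa = value << (52 - msb);
            }
            else
            {
                int shift = msb - 52;                          // low-order bits that do not fit
                mantissa = value >> shift;                     // the top 53 bits
                ulong remainder = value & ((1UL << shift) - 1);
                ulong halfway = 1UL << (shift - 1);
                // Round to nearest; on an exact tie, round to the even mantissa.
                if (remainder > halfway || (remainder == halfway && (mantissa & 1) != 0))
                {
                    mantissa++;
                    if (mantissa == (1UL << 53))               // rounding overflowed the significand
                    {
                        mantissa >>= 1;
                        msb++;
                    }
                }
            }
            ulong bits = ((ulong)(msb + 1023) << 52) | (mantissa & ((1UL << 52) - 1));
            return BitConverter.Int64BitsToDouble((long)bits);
        }
    }
}

Calling UlongToDoubleSketch.ToDouble(10648738977740919977UL) should, if I have the rounding right, agree with the real literal 10648738977740919977d rather than with the implicit conversion.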

Luckily, with the eventual help of mjwills, The General, and NetMage, it was discovered that the cause likely lies with the non-compliant extended precision capabilities of my CPU; although the fact that this happens at compilation time is fundamentally different from previous posts that highlighted runtime discrepancies.

I encourage everyone to take the time to read the amazing answer and comments by tannergooding linked from the answer I eventually posted (including forking over the $15 to read the formal proof about when extended precision capabilities are acceptable and what requirements they must satisfy).

* Compiled with Microsoft Visual C# Compiler version 3.7.0-6.20459 for .NET Core 3.1 on Windows 10 Pro 18363.1139 on an Intel Core i7-6600U CPU.

philomathic_life
  • [IEEE 754 round to nearest](https://en.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest) – John Wu Oct 26 '20 at 23:42
  • I'm not sure what you're trying to discover here. A `ulong` and a `double` have different binary representations even though they ostensibly hold the same value. This is nothing to do with C# "casting" terms. These are simply values represented by different binary values. Does that clear it up? – Enigmativity Oct 26 '20 at 23:49
  • This is simply an extended precision problem: one is using 80 bits, the other 64 bits, which in turn produces different numbers. – TheGeneral Oct 27 '20 at 00:39
  • @TheGeneral, I don't understand how that matters. Having an additional 16 bits of information doesn't change the fact that IEEE 754 round to nearest would cause the number to round up. – philomathic_life Oct 27 '20 at 00:40
  • @TheGeneral How so, where is the 80bits from? – NetMage Oct 27 '20 at 00:40
  • So the implicit conversions do the same as `Convert.ToDouble` which uses the .NET instruction [`conv.r.un`](https://learn.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.conv_r_un?view=netcore-3.1) which produces a float32 or float64 depending on precision required. It seems that isn't the same as Roslyn's internal conversion? I think you've found a compiler bug... – NetMage Oct 27 '20 at 00:43
  • @NetMage, I hope it is a bug because that is terribly confusing. I know in Rust `assert!((10648738977740919977u64 as f64) == 10648738977740919977f64)` does not panic. Furthermore, the result is consistent with real literals in C#. – philomathic_life Oct 27 '20 at 00:45
  • https://sharplab.io/#v2:EYLgdgpgLgZgHgGiiAlgGwD4AEBMBGAWAChcAGY3AZgAIBvY6xpx3HB5p+ogSF6zwCcACjykAbABYAHAHZKUgTKUTSAwYpnUAvFuqjJs+Rpkq1AjQBMAlAG5ivbv2FCLAewCuwNBCv7pchSUTVXUlbV0/Q0DlEPMlazsiDmTqJyEAYVcAWwAHAEMAJwgRcX8jINNQmQQ9UqjjSriZa1t2ZIBfe24unIKUADc8qAhUvDFqYFdXNGpM3MLit09vakG0GqWvEbWcKzaOLhSOLE018NW8tBxEo86iYnagA== - fiddle with the compiler dropdown (since different compilers act differently). Fascinating that the Framework IL gets the same value, while the other compiler gets two different values - *both different* to the framework one. Agreed that 80 bits is likely underlying cause. – mjwills Oct 27 '20 at 00:56
  • `How so, where is the 80bits from?` https://stackoverflow.com/a/2343351/34092 – mjwills Oct 27 '20 at 01:04
  • Along the lines of https://stackoverflow.com/a/2343351/34092, my guess is that the compiler is converting the `long` to `double` via 80 bits ("extended") in .NET Core's compiler. It is just a guess. Although this is interesting, it is fundamentally an example of the generic "comparing floating point numbers is hard" problem. Although it would be nice if the compilers gave consistent IL. ;) – mjwills Oct 27 '20 at 01:06
  • @mjwills, so the real literal value first gets rounded per IEEE 754 before being extended to 80 bits, but the `ulong` gets extended first? Binary32 and binary64 are not difficult at all though. It is pretty clear how to format numbers to them. It is bizarre that implicit casts don't have the same behavior or even weirder what @NetMage found that `10648738977740919977d != 10648738977740919977.0`. The spec formally states what should happen for real literals. – philomathic_life Oct 27 '20 at 01:09
  • `what @NetMage found that 10648738977740919977d != 10648738977740919977.0` I am not able to replicate that. https://sharplab.io/#v2:EYLgHgbALANALiATgVwHYB8ACAmAjAWACgBIU0ozAZgAIdbcB2IgbyOJ2zdZPdwE4AFLgAM0ABwNKYvg1lRhffjIYATagF511EeMnTZDeYr7KAdMICUAbja9BOqBKnLDCpbLWbtox3pdH3BmtbTH4hHyd9OTcTDw0tARUAe2RgABsAUwsHSP8Y5WCSAF8iIqA=== https://dotnetfiddle.net/ – mjwills Oct 27 '20 at 01:17
  • @mjwills, OK. I think I understand what you and @The General were saying about it being a result of the extended precision format that x86 provides. I think when the compiler compiles the real literal value, it encodes it to binary64—by implementing the rounding behavior directly (like I am doing with my 128-bit integer type)—however the implicit cast of the `ulong` happens at runtime which relies on the CPU's extended precision capabilities; and since all 64 bits can fit in the significand, it doesn't incur any rounding. – philomathic_life Oct 27 '20 at 01:46
  • Neither are at runtime (well at https://sharplab.io/#v2:EYLgHgbALANALiAlgGwD4AEBMBGAsAKAEhjiD0BmAAi2uwHYCBvAwrTF5o17ATgApsABmgAOOuRE860qIJ68pdACaUAvKspDR4ydLqz5PRQDpBASgDcLbvy1QxExfrkLpK9ZuH2dTg67qW1ui8Al4OujIuRm5qGnxKAPYArsDIAKZmduG+UYqBRAC+BAVAA= anyway). – mjwills Oct 27 '20 at 01:48

2 Answers


The algorithm is IEEE 754 round to nearest, ties to even. The “counterexamples” shown in the question are in fact demonstrations of a bug, which has been accepted by the Roslyn team. The bug is caused by the “runtime conversion implementation”.
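
As a sanity check (my own sketch, not part of the bug report; the class name is mine and it assumes .NET Core 3.0+ for Math.BitIncrement and Math.BitDecrement), one can compare the exact integer against the real literal’s double and its two neighbors to see which one “round to nearest” selects, and then compare the implicit conversion against the literal:

using System;
using System.Numerics;
namespace App
{
    internal static class NearestCheck
    {
        internal static void Main()
        {
            BigInteger exact = BigInteger.Parse("10648738977740919977");
            double literal = 10648738977740919977d;    // compile-time real literal
            double below = Math.BitDecrement(literal); // previous representable double
            double above = Math.BitIncrement(literal); // next representable double
            Console.WriteLine(BigInteger.Abs(exact - new BigInteger(literal)));
            Console.WriteLine(BigInteger.Abs(exact - new BigInteger(below)));
            Console.WriteLine(BigInteger.Abs(exact - new BigInteger(above)));
            ulong value = 10648738977740919977;
            double converted = value;                  // the implicit conversion the question is about
            Console.WriteLine(converted == literal);
        }
    }
}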

philomathic_life
  • It is odd to think of it as a runtime bug, when the issue is in the IL (prior to runtime). _The comments describe why it is though._ – mjwills Oct 27 '20 at 21:45

This questions whether the conversion from ulong to double is really different from the assignment of a ulong to a double variable.

public static void Test()
{
    ulong lValue = 10648738977740919977;
    double dValue = 10648738977740919977;
    byte[] bytesFromLong = BitConverter.GetBytes(lValue);
    byte[] bytesFromDouble = BitConverter.GetBytes(dValue);
    byte[] bytesFromImplicitDouble = BitConverter.GetBytes((double)10648738977740919977);

    string HexAlphabet = "0123456789ABCDEF";
    string res = "Original Number : 10648738977740919977\nLong Bytes : 0x";
    foreach (byte b in bytesFromLong)
    {
        res += HexAlphabet[(int)(b >> 4)] + HexAlphabet[(int)(b & 0xF)];
    }
    res += "\nDouble Bytes : 0x";
    foreach (byte b in bytesFromDouble)
    {
        res += HexAlphabet[(int)(b >> 4)] + HexAlphabet[(int)(b & 0xF)];
    }
    res += "\nImplicit Double Bytes : 0x";
    foreach (byte b in bytesFromImplicitDouble)
    {
        res += HexAlphabet[(int)(b >> 4)] + HexAlphabet[(int)(b & 0xF)];
    }

    dValue = BitConverter.ToDouble(bytesFromDouble);
    res += "\nDouble read from Double bytes : " + String.Format("{0:0.#}", dValue);
    dValue = BitConverter.ToDouble(bytesFromImplicitDouble);
    res += "\nDouble read from Implicit Double bytes : " + String.Format("{0:0.#}",dValue);

    Console.WriteLine(res);
}

Result

Original Number : 10648738977740919977

Long Bytes : 0x122109114117134135122108

Double Bytes : 0x103103124108138111119103

Implicit Double Bytes : 0x103103124108138111119103

Double read from Double bytes : 10648738977740900000

Double read from Implicit Double bytes : 10648738977740900000

Looks the same..

Xavier
  • `Your getting the bytes from a long and comparing them with the bytes from a double` Is he/she? `GetBytes` is _always_ being called with a `double`. – mjwills Oct 27 '20 at 00:02
  • No, that is not correct. The `GetFirstByte` function clearly takes terms of type `double`. When I pass in the `ulong` term, `10648738977740919977`, the compiler implicitly casts it to a `double` term. The value of which is *different* than the real literal value `10648738977740919977d`. – philomathic_life Oct 27 '20 at 00:03
  • Your code samples are completely unrelated to the question. You have changed the problem by calling `BitConverter.GetBytes` on _different things_ to the OP. – mjwills Oct 27 '20 at 00:08
  • Here's a [modified version](https://rextester.com/VNG97264) of your code. It shows bytes from long, bytes from double, and bytes from implicitly cast double. The last one is what the OP is using, not the first one. – 41686d6564 stands w. Palestine Oct 27 '20 at 00:10
  • Maybe not.. have a look at the code at https://rextester.com/LOJ37914 – Xavier Oct 27 '20 at 00:36
  • `Maybe not.. ` It is unclear the point you are making. – mjwills Oct 27 '20 at 04:34
Your program is still not equivalent to mine. In particular, your assignment of the value `10648738977740919977` to the `double` variable is still undergoing an implicit conversion from a `ulong` to a `double`; therefore your program should output that they are all the same (modulo compiler bugs). You must replace that line with `double dValue = 10648738977740919977d`; that way you are using a *real literal* as mentioned in my post. Furthermore, as @mjwills stated in the comments, .NET Core behaves differently from .NET Framework, so you must be targeting .NET Core 3.1 like I do. – philomathic_life Oct 27 '20 at 13:27
  • To be extra careful, the compiler should be the same as the one mentioned in the footnote of my post. Last, if as expected this is due to extended precision, you must additionally compile the code on a CPU whose ISA has an extended precision format *with at least a 64-bit significand* (e.g., some (all?) x86-based CPUs). – philomathic_life Oct 27 '20 at 13:32