
If I assign the value 0.1 to a float:

float f = 0.1;

The actual value stored in memory is not an exact representation of 0.1, because 0.1 is not a number that can be exactly represented in single-precision floating-point format. The actual value stored - if I did my maths correctly - is

0.100000001490116119384765625
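As a cross-check of that figure: the comments below report the bit pattern as sign 0, exponent field 01111011 = 123 and fraction field 5033165. Using the single-precision exponent bias of 127 and 23 stored fraction bits, this gives:

$$
\begin{aligned}
x &= 2^{123-127}\left(1 + \frac{5033165}{2^{23}}\right)
   = \frac{2^{23} + 5033165}{2^{27}}
   = \frac{13421773}{134217728} \\
  &= \frac{13421773 \cdot 5^{27}}{10^{27}}
   = 0.100000001490116119384765625
\end{aligned}
$$

Because the denominator is a power of two, multiplying numerator and denominator by $5^{27}$ turns it into a power of ten. Since 13421773 is odd, the expansion therefore has exactly 27 digits after the decimal point.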

But I can't identify a way to get C# to print out that value. Even if I ask it to print the number to a great many decimal places, it doesn't give the correct answer:

// prints 0.10000000000000000000000000000000000000000000000000
Console.WriteLine(f.ToString("F50"));

How can I print the exact value stored in a float, i.e. the value actually represented by the bit pattern in memory?

EDIT: It has been brought to my attention elsewhere that you can get the behaviour I ask for using standard format strings... on .NET Core and .NET 5.0. So this question is .NET Framework specific, I guess.

Hammerite
  • What does it look like if you type it as a `double`? – Flydog57 Aug 28 '21 at 16:38
  • @OlivierRogier: No, those are not duplicates. This question is not about whether floating-point has rounding errors but about how to get C# to display the exact value. Note that, by definition, a floating-point number represents exactly one value. Approximations occur in operations, not in numbers. – Eric Postpischil Aug 28 '21 at 16:39
  • @Flydog57: I don't know and I don't care. My question is about single-precision floating-point, not double, although an analogous question exists in principle for doubles and a satisfactory answer to my question would imply one for the double analogue. – Hammerite Aug 28 '21 at 16:41
  • @Hammerite: Some of how the Microsoft conversion software decides to format `float` is based on the characteristics of a `float` and how accurate Microsoft thinks it is. Storing the value in a `double` and printing that may result in different output. And that is not necessarily proposed as a solution but as a diagnostic technique. You should not reject suggestions out of hand or appear impudent about them. People are contributing their valuable time to help for free, and, if you do not respect that, they do not have to do anything for you. – Eric Postpischil Aug 28 '21 at 16:44
  • Very well, then, the "F50" for the float as a double is 0.10000000149011600000000000000000000000000000000000 – Hammerite Aug 28 '21 at 16:47
  • The problem in this question is explained in [this answer](https://stackoverflow.com/questions/53663632/converting-float-to-double-loses-precision-c-sharp/53668599#53668599). I would mark it as a duplicate except this post also asks “How can I print the exact value stored in a float”, to which I expect the answer might be that Microsoft’s C# library does not provide a facility for this, so you must do it yourself or seek third-party solutions or workarounds like using Java instead. (Java and JavaScript specifications are more rigorous about display of floating-point values.) – Eric Postpischil Aug 28 '21 at 16:59
  • Thanks for indulging me. When you did your math to get the _exact_ value you came up with, I'm assuming you did it using binary math. Remembering that a float has a 23-bit mantissa (IIRC), what happens to your math if you cut off the calculation at 23 bits? (OK, maybe I'm being Socratic, but I'm also curious, and you have the numbers in front of you.) You also need to realize that most floating-point questions on this site come from people who couldn't count to 31 on the fingers of their left hand – Flydog57 Aug 28 '21 at 17:04
  • I determined that the bit pattern of the float in memory is 0 01111011 10011001100110011001101. I then consulted Wikipedia and determined that the number is represented as (2 ^ -4) * (1 + 5033165/8388608); the 5033165 is the mantissa – Hammerite Aug 28 '21 at 17:09
  • The IEEE-754 format has a 24-bit significand. 23 bits are encoded in the primary significand field of the format, and one bit is encoded by way of the exponent field. The preferred term is “significand.” “Mantissa” is an old word for the fraction portion of a logarithm. The significand for the `float` nearest .1 is 13,421,773; it is represented as 13,421,773•2^−27 or, equivalently, 1.10011001100110011001101•2^−3. – Eric Postpischil Aug 28 '21 at 17:22
  • The significand is the whole fraction portion of the representation; we do not separate the leading bit. The primary significand field contains the trailing portion with the leading bit removed, but it does not encode the entire significand. The stored fields should not be confused with the representation of the number. – Eric Postpischil Aug 28 '21 at 17:24
  • My assessment was based on the treatment given on the Wikipedia page at https://en.wikipedia.org/wiki/Single-precision_floating-point_format, under the heading "IEEE 754 single-precision binary floating-point format: binary32". My wording is probably technically incorrect, but I think we are talking about the same number in different ways. Your 13,421,773 is the sum of my 5033165 and 8388608. – Hammerite Aug 28 '21 at 17:29

4 Answers


The basic idea here is to convert the float value into a rational value, and then convert the rational into a decimal.

The following code (for .NET 6, which provides the BitConverter.SingleToUInt32Bits method) will print the exact value of a float (including whether a NaN value is quiet/signalling, the payload of the NaN and whether the sign bit is set). Note that the WriteRational method is not generally applicable to all rationals, as it makes no attempt to detect non-terminating decimal representations: this is not an issue here, since all values in a float have power-of-two denominators.

using System; // not necessary with implicit usings
using System.Globalization;
using System.Numerics;
using System.Text;

static string ExactStringSingle(float value)
{
    const int valueBits = sizeof(float) * 8;
    const int fractionBits = 23; // excludes implicit leading 1 in normal values

    const int exponentBits = valueBits - fractionBits - 1;
    const uint signMask = 1U << (valueBits - 1);
    const uint fractionMask = (1U << fractionBits) - 1;

    var bits = BitConverter.SingleToUInt32Bits(value);
    var result = new StringBuilder();

    if ((bits & signMask) != 0) { result.Append('-'); }

    var biasedExponent = (int)((bits & ~signMask) >> fractionBits);
    var fraction = bits & fractionMask;

    // Maximum possible value of the biased exponent: infinities and NaNs
    const int maxExponent = (1 << exponentBits) - 1;

    if (biasedExponent == maxExponent)
    {
        if (fraction == 0)
        {
            result.Append("inf");
        }
        else
        {
            // NaN type is stored in the most significant bit of the fraction
            const uint nanTypeMask = 1U << (fractionBits - 1);
            // NaN payload
            const uint nanPayloadMask = nanTypeMask - 1;
            // NaN type, valid for x86, x86-64, 68000, ARM, SPARC
            var isQuiet = (fraction & nanTypeMask) != 0;
            var nanPayload = fraction & nanPayloadMask;
            result.Append(isQuiet
                ? FormattableString.Invariant($"qNaN(0x{nanPayload:x})")
                : FormattableString.Invariant($"sNaN(0x{nanPayload:x})"));
        }

        return result.ToString();
    }

    // Minimum value of biased exponent above which no fractional part will exist
    const int noFractionThreshold = (1 << (exponentBits - 1)) + fractionBits - 1;

    if (biasedExponent == 0)
    {
        // zeroes and subnormal numbers
        // shift for the denominator of the rational part of a subnormal number
        const int denormalDenominatorShift = noFractionThreshold - 1;
        WriteRational(fraction, BigInteger.One << denormalDenominatorShift, result);
        return result.ToString();
    }

    // implicit leading one in the fraction part
    const uint implicitLeadingOne = 1U << fractionBits;
    var numerator = (BigInteger)(fraction | implicitLeadingOne);
    if (biasedExponent >= noFractionThreshold)
    {
        numerator <<= biasedExponent - noFractionThreshold;
        result.Append(numerator.ToString(CultureInfo.InvariantCulture));
    }
    else
    {
        var denominator = BigInteger.One << (noFractionThreshold - (int)biasedExponent);
        WriteRational(numerator, denominator, result);
    }

    return result.ToString();
}

static void WriteRational(BigInteger numerator, BigInteger denominator, StringBuilder result)
{
    // precondition: denominator contains only factors of 2 and 5
    var intPart = BigInteger.DivRem(numerator, denominator, out numerator);
    result.Append(intPart.ToString(CultureInfo.InvariantCulture));
    if (numerator.IsZero) { return; }
    result.Append('.');
    do
    {
        numerator *= 10;
        var gcd = BigInteger.GreatestCommonDivisor(numerator, denominator);
        denominator /= gcd;
        intPart = BigInteger.DivRem(numerator / gcd, denominator, out numerator);
        result.Append(intPart.ToString(CultureInfo.InvariantCulture));
    } while (!numerator.IsZero);
}

I've written most of the constants in the code in terms of valueBits and fractionBits (defined in the first lines of the method), in order to make it as straightforward as possible to adapt this method for doubles. To do this:

  • Change valueBits to sizeof(double) * 8
  • Change fractionBits to 52
  • Change all uints to ulongs (including converting 1U to 1UL)
  • Call BitConverter.DoubleToUInt64Bits instead of BitConverter.SingleToUInt32Bits

Making this code culture-aware is left as an exercise for the reader :-)
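The digit loop in WriteRational can be illustrated without BigInteger when the denominator is small. The following is a C sketch, not part of the answer above, and the function name write_pow2_rational is hypothetical. It prints num / 2^k exactly using 64-bit integers, so it assumes k <= 59. That covers 0.1f = 13421773 / 2^27, but not the full float range (subnormals need 2^149), which is why the answer uses BigInteger.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes num / 2^k to buf in decimal, exactly, by long division.
   Assumes k <= 59 so that rem * 10 cannot overflow 64 bits.
   buf needs room for up to 20 integer digits, '.', k fraction digits and '\0'. */
void write_pow2_rational(uint64_t num, unsigned k, char *buf) {
    assert(k <= 59);
    uint64_t den = (uint64_t)1 << k;
    /* integer part */
    int n = sprintf(buf, "%llu", (unsigned long long)(num / den));
    uint64_t rem = num % den;
    if (rem != 0) {
        buf[n++] = '.';
        while (rem != 0) {
            rem *= 10;                          /* shift in the next decimal digit */
            buf[n++] = (char)('0' + rem / den); /* always 0..9 because rem < den */
            rem %= den;
        }
    }
    buf[n] = '\0';
}
```

For example, `write_pow2_rational(13421773, 27, buf)` should produce `0.100000001490116119384765625`. The loop terminates after at most k digits because 10^k is divisible by 2^k, which is the same reason the answer's WriteRational never runs forever for float inputs.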


Yeah, this is a very fun challenge in C# (or .NET). IMHO, the simplest solution would be to multiply the float/double by some huge number and then convert the floating-point result to a BigInteger. For example, here we try to calculate the result of 1e+51*0.1:

using System.Numerics;
class HelloWorld {
  static void Main() {
    // Ideally, 1e+51*0.1 should be 1 followed by 50 zeros, but =>
    System.Console.WriteLine(new BigInteger(1e+51*0.1));
    // Outputs 100000000000000007629769841091887003294964970946560
  }
}

Because 0.1 is represented only approximately in binary floating point, and the product 1e+51*0.1 is itself rounded to the nearest double, we get this weird result and not 1 followed by 50 zeros.
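A variation on this idea that avoids the extra rounding is to scale by a power of two rather than a power of ten: in binary floating point, multiplying by 2 only changes the exponent, so each step is exact as long as nothing overflows. The C sketch below (C rather than C#; the name scaled_significand is hypothetical) doubles a float until it becomes an integer, which recovers the exact fraction m / 2^k. It assumes 0 <= f < 2^24, so every conversion to uint32_t is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Returns m and sets *k such that f == m / 2^k exactly.
   Assumes f is finite and 0 <= f < 2^24. Any float with a fractional
   part is below 2^23, so doubling it keeps it below 2^24. */
uint32_t scaled_significand(float f, int *k) {
    assert(f >= 0.0f && f < 16777216.0f);
    *k = 0;
    while ((float)(uint32_t)f != f) {  /* f still has a fractional part */
        f *= 2.0f;                     /* exact: only the exponent changes */
        (*k)++;
    }
    return (uint32_t)f;
}
```

For 0.1f this should return 13421773 with k = 27, i.e. 0.1f == 13421773 / 134217728, the same fraction worked out in the comments under the question. Turning that fraction into decimal digits in general still needs big-integer arithmetic, so this removes the rounding problem but not the need for BigInteger.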

Agnius Vasiliauskas

Oops, this answer relates to C, not C#.

Leaving it up as it may provide C# insight as the languages have similarities concerning this.


How do I print the exact value stored in a float?

// Print exact value with a hexadecimal significand.
printf("%a\n", some_float);
// e.g. 0x1.99999ap-4 for 0.1f

To print the value of a float in decimal with enough significant digits to distinguish it from every other float:

int digits_after_the_decimal_point = FLT_DECIMAL_DIG - 1;  // e.g. 9 -1
printf("%.*e\n", digits_after_the_decimal_point, some_float);
// e.g. 1.00000001e-01 for 0.1f
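One way to check that FLT_DECIMAL_DIG significant digits (9 for an IEEE-754 float) are enough to identify a float is to format it and parse the text back. The C sketch below is an illustration rather than part of this answer; the helper name round_trips is hypothetical, and it assumes C11's <float.h> plus a C library whose printf and strtof round correctly.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats f with FLT_DECIMAL_DIG significant digits ("%.*e" with
   FLT_DECIMAL_DIG - 1 digits after the point), parses the text back
   with strtof, and reports whether the original float was recovered. */
int round_trips(float f) {
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%.*e", FLT_DECIMAL_DIG - 1, f);
    assert(n > 0 && (size_t)n < sizeof buf);  /* output was not truncated */
    return strtof(buf, NULL) == f;
}
```

For example, round_trips(0.1f) should be true even though the nine digits 1.00000001e-01 are not the exact value, which has 27 significant digits. Round-tripping identifies the float; it does not print it exactly.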

To print the value in decimal with all decimal places is hard, and rarely needed. Code could use a greater precision, but past a certain point (e.g. 20 significant digits) printf() may print incorrect lower digits. This inexactness is allowed in C and IEEE 754:

int big_value = 19; // More may be a problem.
printf("%.*e\n", big_value, some_float);
// e.g. 1.0000000149011611938e-01 for 0.1f
// for FLT_TRUE_MIN and big_value = 50, not quite right
// e.g. 1.40129846432481707092372958328991613128026200000000e-45

To print the value in decimal with all decimal places for every float, write a helper function. Example output:

// Using custom code
// -FLT_TRUE_MIN 
-0.00000000000000000000000000000000000000000000140129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125
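The custom code behind that output isn't shown. A minimal C sketch of such a helper follows; the name exact_float_decimal is hypothetical, and this is not necessarily how the output above was produced. It uses the fact that a finite float equals m * 2^e for integers m and e: when e < 0 the exact value is m * 5^(-e) / 10^(-e), so only repeated multiplication of a decimal digit array is needed, with no division. It assumes float is IEEE-754 binary32 and does not handle infinities or NaNs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiplies a little-endian array of decimal digits by a small factor. */
static void digits_mul(unsigned char *d, int *len, unsigned factor) {
    unsigned carry = 0;
    for (int i = 0; i < *len; i++) {
        unsigned v = d[i] * factor + carry;
        d[i] = (unsigned char)(v % 10);
        carry = v / 10;
    }
    while (carry != 0) {
        d[(*len)++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
}

/* Writes the exact decimal value of a finite float to buf (at least 160 chars). */
void exact_float_decimal(float f, char *buf) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* assumes 32-bit IEEE-754 float */
    unsigned exp_field = (bits >> 23) & 0xFFu;
    uint32_t m = bits & 0x7FFFFFu;
    assert(exp_field != 0xFFu);              /* infinities and NaNs not handled */
    int e = -149;                            /* zero and subnormals: m * 2^-149 */
    if (exp_field != 0) {
        m |= 0x800000u;                      /* implicit leading 1 */
        e = (int)exp_field - 150;
    }

    unsigned char d[200];                    /* m * 5^149 needs at most 112 digits */
    int len = 0;
    do { d[len++] = (unsigned char)(m % 10); m /= 10; } while (m != 0);

    /* e >= 0: multiply by 2^e. e < 0: multiply by 5^-e, then place the
       decimal point -e digits from the right. */
    int frac_digits = e < 0 ? -e : 0;
    int steps = e < 0 ? -e : e;
    unsigned factor = e < 0 ? 5u : 2u;
    for (int i = 0; i < steps; i++) digits_mul(d, &len, factor);
    while (len <= frac_digits) d[len++] = 0; /* guarantee an integer digit */

    int lo = 0;                              /* skip trailing zeros of the fraction */
    while (lo < frac_digits && d[lo] == 0) lo++;

    char *p = buf;
    if (bits >> 31) *p++ = '-';
    for (int i = len - 1; i >= frac_digits; i--) *p++ = (char)('0' + d[i]);
    if (lo < frac_digits) {
        *p++ = '.';
        for (int i = frac_digits - 1; i >= lo; i--) *p++ = (char)('0' + d[i]);
    }
    *p = '\0';
}
```

For example, after `char buf[160]; exact_float_decimal(0.1f, buf);`, buf should hold 0.100000001490116119384765625, and for -FLT_TRUE_MIN it should produce the 149 decimal places shown above.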
chux - Reinstate Monica

For .NET Framework, use the format string G. It does not give the exact value, but it shows enough digits to reveal floating-point errors.

> (0.3d).ToString("G70")
0.29999999999999999
> (0.1d+0.2d).ToString("G70")
0.30000000000000004

Downvoted... Fine, I found dmath, a math library for it.

> new Deveel.Math.BigDecimal(0.3d).ToString()
0.299999999999999988897769753748434595763683319091796875
SE12938683