I say "accurate" because IEEE-754 doesn't accurately represent decimal numbers, which seems to be the crux of the matter.
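
As a minimal illustration of that point, the sketch below compares a sum of decimal fractions and prints a value with 17 significant digits. The class and method names are made up for this example; only `double.ToString` and the `"G17"` format specifier come from .NET itself.

```csharp
using System;
using System.Globalization;

public static class BinaryFloatDemo
{
    // Compares a + b with the expected result using plain double equality.
    public static bool SumEquals(double a, double b, double expected) => a + b == expected;

    // Formats the value with 17 significant digits, which is enough to
    // expose the binary approximation hiding behind a short decimal literal.
    public static string Digits17(double value) =>
        value.ToString("G17", CultureInfo.InvariantCulture);
}
```

Here `BinaryFloatDemo.SumEquals(0.1, 0.2, 0.3)` is `false`, and `BinaryFloatDemo.Digits17(0.1)` gives `0.10000000000000001`, because none of 0.1, 0.2 or 0.3 has an exact binary representation.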

All decimal numbers can be represented in scientific notation, as illustrated below. I have included a column for the unscaled value, which is the decimal value with all significant digits shifted to the left of the decimal point; for example:

| General | Scientific | Unscaled |
|---|---|---|
| 1.01 | 1.01e+0 | 101 |
| 0.025 | 2.5e-2 | 25 |
| 123 | 1.23e+2 | 123 |
| 12345678 | 1.2345678e+7 | 12345678 |
| 123.456 | 1.23456e+2 | 123456 |
| 12345.678901234567 | 1.2345678901234567e+4 | 12345678901234567 |
| System.Double.MaxValue | 1.7976931348623157e+308 | 17976931348623157 * (10 ^ 292) |
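
For comparison, `System.Decimal` stores its value as exactly this kind of pair: an unscaled integer plus a power-of-ten scale. The sketch below reads both back through `decimal.GetBits`. The `DecimalParts.Split` helper is a hypothetical name introduced here, and it applies to `decimal`, not to `double`.

```csharp
using System;
using System.Numerics;

public static class DecimalParts
{
    // Splits a decimal into (unscaled integer, scale) so that
    // value == Unscaled * 10^(-Scale).
    public static (BigInteger Unscaled, int Scale) Split(decimal value)
    {
        // bits[0..2] hold the 96-bit integer (low, mid, high);
        // bits[3] holds the scale in bits 16-23 and the sign in bit 31.
        int[] bits = decimal.GetBits(value);

        BigInteger unscaled = new BigInteger((uint)bits[2]);
        unscaled = (unscaled << 32) | (uint)bits[1];
        unscaled = (unscaled << 32) | (uint)bits[0];

        int scale = (bits[3] >> 16) & 0xFF;
        bool negative = bits[3] < 0;

        return (negative ? -unscaled : unscaled, scale);
    }
}
```

For example, `DecimalParts.Split(123.456m)` yields `(123456, 3)` and `DecimalParts.Split(0.025m)` yields `(25, 3)`. Note that the scale counts digits after the decimal point, so it is not the same number as the scientific-notation exponent in the table.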

The following code utilises .NET 7.0 generic math (`IBinaryFloatingPointIeee754<T>` and `BigInteger` both live in `System.Numerics`) to obtain the exponent and mantissa of an IEEE-754 value. Note that these functions return the decimal (base 10) exponent and mantissa, not their binary (base 2) representation:

GetExponent

private static T GetExponent<T>(T value) where T : IBinaryFloatingPointIeee754<T>
{
    if (T.IsNaN(value) || T.IsInfinity(value) || T.IsZero(value)) return T.Zero;

    T absValue = T.Abs(value);
    T log10 = T.Log10(absValue);
    return T.Floor(log10);
}

GetMantissa

private static T GetMantissa<T>(T value) where T : IBinaryFloatingPointIeee754<T>
{
    T ten = T.CreateChecked(10);
    T exponent = GetExponent(value);
    T factor = T.Pow(ten, exponent);
    return value / factor;
}

Given the table above, these functions produce the following values:

| Scientific | Mantissa | Exponent |
|---|---|---|
| 1.01e+0 | 1.01 | 0 |
| 2.5e-2 | 2.5 | -2 |
| 1.23e+2 | 1.23 | 2 |
| 1.2345678e+7 | 1.2345678 | 7 |
| 1.23456e+2 | 1.23456 | 2 |
| 1.2345678901234567e+4 | 1.2345678901234567 | 4 |
| 1.7976931348623157e+308 | 1.7976931348623157 | 308 |

So far, so good! Now I want to multiply the mantissa by 10, until all significant digits are to the left of the decimal point. The following function obtains the unscaled mantissa:

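private static BigInteger GetUnscaledMantissa<T>(T value) where T : IBinaryFloatingPointIeee754<T>
{
    // Zero, NaN and infinity have no digits to scale. Returning early (zero is
    // an arbitrary choice here) also stops the loops below from never ending.
    if (T.IsNaN(value) || T.IsInfinity(value) || T.IsZero(value)) return BigInteger.Zero;

    T ten = T.CreateChecked(10);
    T mantissa = GetMantissa(value);
    T factor = T.One;

    // While the scaled mantissa still has a fractional part...
    while (mantissa * factor % T.One != T.Zero)
    {
        // ...multiply factor by 10.
        factor *= ten;
    }

    BigInteger result = BigInteger.CreateChecked(mantissa * factor);

    // Trim any trailing zeros, which sometimes occur.
    while (result % 10 == 0) result /= 10;

    return result;
}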

Let's take a look at the results:

| Mantissa | Unscaled |
|---|---|
| 1.01 | 101 |
| 2.5 | 25 |
| 1.23 | 123 |
| 1.2345678 | 12345678 |
| 1.23456 | 123456 |
| 1.2345678901234567 | **12345678901234566** |
| 1.7976931348623157 | **17976931348623158** |

Generally speaking, the GetUnscaledMantissa function returns the correct value; however, the last two rows are outliers: they are off by one in the final digit. It seems that in some cases the value is rounded up or down.

Question

Whilst I understand that this is just the nature of IEEE-754 binary floating-point numbers, how could I modify the GetUnscaledMantissa function so that it accurately returns the unscaled mantissa in all, or at least most, cases?

(Note, I know that this is possible, it's just not trivial)

Update

Given the extended conversation on this topic, it seems that there is some confusion as to what I am trying to achieve, so hopefully the following goes some way to setting the record straight.

Forget IEEE-754 for the time being! Let's just focus on some pure maths.

The following numbers are shown in scientific notation alongside their equivalent full form:

Table A

Scientific Full
1.00001e+50 100001000000000000000000000000000000000000000000000
-3.345233391e+45 -3345233391000000000000000000000000000000000000
1.7976931348623157E+308 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

These values in scientific notation can be verified in WolframAlpha; the maths checks out!

Now, the confusion seems to be around what happens when you represent the same numbers with IEEE-754.

The following numbers are again shown in scientific notation alongside their full form, except that this time the full number is based on the IEEE-754 calculation:

Table B

Scientific Full
1.00001e+50 100000999999999993488106414884841393099338118332416
-3.345233391e+45 -3345233391000000093465768949128568879123005440
1.7976931348623157E+308 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368
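
Full forms of this kind can be reproduced with a sketch like the following, which relies on the `BigInteger(double)` constructor keeping every bit of an integral double. The `ExactValue` class and its method names are hypothetical helpers for this illustration.

```csharp
using System;
using System.Numerics;

public static class ExactValue
{
    // The integer that an integral double actually stores. The constructor
    // truncates any fractional part, so this is only meant for whole numbers.
    public static BigInteger OfIntegralDouble(double value) => new BigInteger(value);

    // The "pure maths" value: the given digits followed by trailing zeros.
    public static BigInteger Mathematical(string digits, int trailingZeros) =>
        BigInteger.Parse(digits) * BigInteger.Pow(10, trailingZeros);
}
```

`ExactValue.OfIntegralDouble(1.00001e50)` and `ExactValue.Mathematical("100001", 45)` are both 51-digit integers, but they are not equal: the first is the nearest value a double can hold, the second is the decimal value that was written in the source code.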

What I want GetUnscaledMantissa to return is the mathematically correct unscaled number (in Table A), not the IEEE-754 "correct" unscaled number (in Table B).
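
For comparison, here is a sketch of the string-based route suggested in the comments. It assumes .NET Core 3.0 or later, where the `"R"` format returns the shortest string that round-trips to the same double, and it does go through string formatting, so it is a reference point rather than the approach asked for. `ShortestDigits.UnscaledMantissa` is a hypothetical name, and zero, NaN and infinity are mapped to zero as an arbitrary choice.

```csharp
using System;
using System.Globalization;
using System.Numerics;

public static class ShortestDigits
{
    public static BigInteger UnscaledMantissa(double value)
    {
        // No meaningful digits for these inputs; zero is an arbitrary choice.
        if (!double.IsFinite(value) || value == 0) return BigInteger.Zero;

        // Shortest round-trippable text, e.g. "0.025" or "1.7976931348623157E+308".
        string text = Math.Abs(value).ToString("R", CultureInfo.InvariantCulture);

        // Keep only the significand, dropping any "E+308" style suffix.
        int e = text.IndexOf('E');
        string significand = e >= 0 ? text.Substring(0, e) : text;

        // Remove the decimal point, then leading and trailing zeros.
        string digits = significand.Replace(".", "").TrimStart('0').TrimEnd('0');

        BigInteger result = BigInteger.Parse(digits, CultureInfo.InvariantCulture);
        return value < 0 ? -result : result;
    }
}
```

With this sketch, `ShortestDigits.UnscaledMantissa(double.MaxValue)` returns `17976931348623157` and `ShortestDigits.UnscaledMantissa(1.00001e50)` returns `100001`, which are the significant digits of the Table A values.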

Matthew Layton
  • Can't you just parse `number.ToString("E")` – Klaus Gütter May 08 '23 at 09:11
  • @KlausGütter, I could, but I'd like to avoid string parsing, if at all possible. – Matthew Layton May 08 '23 at 09:17
  • A double has 15 ~ 16 significant digits, `12345.678901234567` is already inaccurate. – shingo May 08 '23 at 09:47
  • @shingo Doesn't seem to be an issue: `Console.WriteLine(12345.678901234567);` prints the number exactly as specified, and my `GetMantissa` function also obtains it as `1.2345678901234567`. – Matthew Layton May 08 '23 at 09:51
  • The decimal number you see is just converted (summed up) from the binary number; `1.2345678901234567` and `12.345678901234567` are totally different in binary representation. You can see the last `7` because you are lucky; try `Console.WriteLine(12345678901234567.0);` and the last digit becomes `8`. – shingo May 08 '23 at 10:03
  • @shingo The last digit becomes 8 when it's printed, but `GetMantissa(12345678901234567.0)` returns `1.2345678901234567` which is correct. Also `GetExponent(12345678901234567.0)` returns `16`, which aligns with your earlier comment. – Matthew Layton May 08 '23 at 10:21
  • In fact your mantissa is already not accurate, since for max double it should be `1.79769313486231570814527423732e+308` (probably this is still not accurate); you lost precision doing the math operations to get it – Selvin May 08 '23 at 10:25
  • As decimal numbers cannot be represented exactly in IEEE in general, you may have to be a bit more specific in your requirements. E.g. given a number x represented in IEEE-754, find a pair of integers (m, e) such that m*10^e is (1) represented by the same IEEE as x and (2) among all such pairs, m has the least number of digits. – Klaus Gütter May 08 '23 at 10:28
  • @Selvin The mantissa is accurate in that the question states that it obtains the mantissa and exponent in base 10, **not** in base 2. I specifically don't want the binary representations because from a mathematical perspective, they do not produce the correct value; for example; `double.MaxValue`, which is 1.7976931348623157E+308 produces a 309 digit long, seemingly random sequence, but 1.7976931348623157E+308 in maths is 17976931348623157 with 292 trailing zeros. – Matthew Layton May 08 '23 at 10:59
  • @Selvin See my related question: https://stackoverflow.com/questions/76143839/understanding-ieee-754-64-bit-fixed-point-representation-in-c-sharp-and-java – Matthew Layton May 08 '23 at 11:02
  • `GetMantissa(12345678901234566.0), GetMantissa(12345678901234567.0), GetMantissa(12345678901234568.0)` return the same result. – shingo May 08 '23 at 11:04
  • `T.Pow(ten, exponent)` and `value / factor` already makes it not accurate – Selvin May 08 '23 at 11:06
  • @Selvin in what way? i.e. the table of results from `GetExponent` and `GetMantissa` are producing the values I want. – Matthew Layton May 08 '23 at 11:07
  • `value` is `T` and `factor` is `T`, where `T` is an IEEE 754 number, and dividing IEEE 754 numbers already adds inaccuracy – Selvin May 08 '23 at 11:10
  • What I want to express is that the number of digits outside the range is not credible. If you print `1.2345678901234566` in C#, you will see `1.2345678901234567`, what's the unscaled number of `1.2345678901234566`? This may be a false proposition because you cannot even represent `1.2345678901234566` in the code. – shingo May 08 '23 at 11:24
  • @shingo hence the reason I stated "or at least more/most cases". Java's `BigDecimal` also suffers some edge cases in this respect: https://pl.kotl.in/Ahk0HeQIu FYI I deleted my last comment with the code example, because the example only accounted for one of the provided numbers, instead of all three. – Matthew Layton May 08 '23 at 11:30
  • So the case is that the unscaled number can keep up to 15 digits, `GetMantissa (1.2345678901234567)=123456789012345`. Is this acceptable? The example you showed is wrong: it's not an edge case of `BigDecimal`, it's still the problem of double (the same as the one in this question), because the numbers you passed to `BigDecimal.valueOf` are already inaccurate. https://pl.kotl.in/RWdgJo1hF if I change the parameters to strings, `BigDecimal` works. – shingo May 08 '23 at 11:48
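
The point made in the comments about `12345678901234566.0`, `12345678901234567.0` and `12345678901234568.0` can be checked with a small sketch like this one. `DoubleSpacing` and its methods are hypothetical helpers; `Math.BitIncrement` and `BitConverter.DoubleToInt64Bits` are existing .NET APIs.

```csharp
using System;

public static class DoubleSpacing
{
    // Distance from a finite double to the next larger representable double.
    public static double GapAbove(double value) => Math.BitIncrement(value) - value;

    // True when both arguments are stored as the very same bit pattern.
    public static bool SameBits(double a, double b) =>
        BitConverter.DoubleToInt64Bits(a) == BitConverter.DoubleToInt64Bits(b);
}
```

`DoubleSpacing.GapAbove(12345678901234566.0)` is `2`, so no double exists between `...566` and `...568`, and `DoubleSpacing.SameBits(12345678901234567.0, 12345678901234568.0)` is `true`: the literal ending in `7` is rounded to its even neighbour before any of the functions in the question see it.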
