39

First of all, this is not a floating point newbie question. I know results of floating point arithmetic (not to mention transcendental functions) usually cannot be represented exactly, and that most terminating decimals cannot be represented exactly as binary floating point numbers.

That said, each possible floating point value corresponds exactly to a dyadic rational (a rational number p/q where q is a power of 2), which in turn has an exact decimal representation.

My question is: How do you find this exact decimal representation efficiently? sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation. I know one algorithm I've used, but it's very slow, O(e^2) where e is the exponent. Here's an outline:

  1. Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating point loop that first multiplies the value by a power of two to put it in the range 1<=x<10, then pulls off a digit at a time by casting to int, subtracting, and multiplying by 10.
  2. Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Every ~3 multiplications will add an extra digit to the left. Every single division will add an extra digit to the right.
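
A minimal sketch of this outline in C (illustrative only: it assumes a positive, finite IEEE 754 double, and `MAXDIG`, `dbl`, and `hlv` are invented names):

    #include <stdio.h>
    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    #define MAXDIG 2200        /* ~309 integer + ~1074 fraction digits fit */

    static char dig[MAXDIG];   /* decimal digits, most significant first */
    static int ndig, point;    /* digit count; digits before the '.' */

    static void dbl(void)      /* multiply the digit string by 2 */
    {
        int carry = 0;
        for (int i = ndig - 1; i >= 0; i--) {
            int d = dig[i] * 2 + carry;
            dig[i] = d % 10;
            carry = d / 10;
        }
        if (carry) {           /* grew one digit on the left */
            memmove(dig + 1, dig, ndig);
            dig[0] = carry;
            ndig++; point++;
        }
    }

    static void hlv(void)      /* divide the digit string by 2 */
    {
        int rem = 0;
        for (int i = 0; i < ndig; i++) {
            int d = rem * 10 + dig[i];
            dig[i] = d / 2;
            rem = d % 2;
        }
        if (rem) dig[ndig++] = 5;   /* exact: a trailing 5 appears */
    }

    static void print_exact(double x)   /* assumes x > 0 */
    {
        int e, n, i;
        double m = frexp(x, &e);                 /* x = m * 2^e, 0.5 <= m < 1 */
        uint64_t mant = (uint64_t)ldexp(m, 53);  /* 53-bit mantissa as integer */
        char tmp[32];
        e -= 53;                                 /* now x = mant * 2^e */

        n = snprintf(tmp, sizeof tmp, "%llu", (unsigned long long)mant);
        for (ndig = 0; ndig < n; ndig++) dig[ndig] = tmp[ndig] - '0';
        point = ndig;

        while (e > 0) { dbl(); e--; }            /* step 2: apply the exponent */
        while (e < 0) { hlv(); e++; }

        i = 0;                                   /* strip leading zeros, keep one */
        while (i < point - 1 && dig[i] == 0) i++;
        for (; i < ndig; i++) {
            if (i == point) putchar('.');
            putchar('0' + dig[i]);
        }
        putchar('\n');
    }

    int main(void)
    {
        print_exact(0.1);  /* 0.10000000000000000555111512312578270211815834045... */
        return 0;
    }

This shows exactly where the O(e^2) cost comes from: each `dbl`/`hlv` pass touches every decimal digit generated so far.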

Is this really the best possible? I doubt it, but I'm not a floating-point expert and I can't find a way to do the base-10 computations on the floating point representation of the number without running into a possibility of inexact results (multiplying or dividing by anything but a power of 2 is a lossy operation on floating point numbers unless you know you have free bits to work with).

R.. GitHub STOP HELPING ICE
  • In the end, I simply replaced my old base-10 code with base-1e9 and repeated multiplication/division by 2 with mult by 2^29 and div by 2^9 for most of the iterations followed by mult/div by 2 for the tail. The resulting code prints the smallest 80-bit `long double` in fairly negligible time, so I'm happy enough. – R.. GitHub STOP HELPING ICE Jul 30 '10 at 23:59
  • Jon Skeet has a [DoubleConverter class](http://yoda.arachsys.com/csharp/DoubleConverter.cs) that can print the exact decimal representations. It's written in C# but you can convert it to C: http://stackoverflow.com/questions/4732680/how-to-output-a-double-value-with-all-the-decimal-places-in-debugger-or-console – phuclv Jun 21 '14 at 11:02

10 Answers

38

This question has a bureaucratic part and an algorithmic part. A floating point number is stored internally as (2^e × m), where e is an exponent (itself in binary) and m is a mantissa. The bureaucratic part of the question is how to access this data, but R. seems more interested in the algorithmic part of the question, namely, converting (2^e × m) to a fraction (a/b) in decimal form. The answer to the bureaucratic question in several languages is `frexp` (which is an interesting detail that I didn't know before today).

It is true that at first glance, it takes O(e^2) work just to write 2^e in decimal, and more time still for the mantissa. But, thanks to the magic of the Schönhage–Strassen fast multiplication algorithm, you can do it in Õ(e) time, where the tilde means “up to log factors”. If you view Schönhage–Strassen as magic, then it’s not that hard to think of what to do. If e is even, you can recursively compute 2^(e/2), and then square it using fast multiplication. On the other hand if e is odd, you can recursively compute 2^(e−1) and then double it. You have to be careful to check that there is a version of Schönhage–Strassen in base 10. Although it is not widely documented, it can be done in any base.
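
Here is that even/odd recursion sketched with GMP's `mpz` type (my illustration, not from the original answer; note that GMP does its arithmetic in base 2, and in practice you would just call `mpz_ui_pow_ui`):

    #include <gmp.h>
    #include <stdio.h>

    /* 2^e by the recursion described above: square for even e, double for odd */
    static void pow2(mpz_t r, unsigned long e)
    {
        if (e == 0) { mpz_set_ui(r, 1); return; }
        if (e % 2 == 0) {
            pow2(r, e / 2);
            mpz_mul(r, r, r);      /* one fast multiplication per halving of e */
        } else {
            pow2(r, e - 1);
            mpz_mul_ui(r, r, 2);   /* cheap doubling */
        }
    }

    int main(void)
    {
        mpz_t r;
        mpz_init(r);
        pow2(r, 1000);
        gmp_printf("%Zd\n", r);    /* decimal output via GMP's radix conversion */
        mpz_clear(r);
        return 0;
    }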

Converting a very long mantissa from binary to base 10 is not exactly the same question, but it has a similar answer. You can divide the mantissa into two halves, m = a × 2^k + b. Then recursively convert a and b to base 10, convert 2^k to base 10, and do another fast multiplication to compute m in base 10.

The abstract result behind all of this is that you can convert integers from one base to another in Õ(N) time.

If the question is about standard 64-bit floating point numbers, then it’s too small for the fancy Schönhage–Strassen algorithm. In this range you can instead save work with various tricks. One approach is to store all 2048 values of 2^e in a lookup table, and then work in the mantissa with asymmetric multiplication (in between long multiplication and short multiplication). Another trick is to work in base 10000 (or a higher power of 10, depending on architecture) instead of base 10. But, as R. points out in the comments, 128-bit floating point numbers already allow large enough exponents to call into question both lookup tables and standard long multiplication. As a practical matter, long multiplication is the fastest up to a handful of digits, then in a significant medium range one can use Karatsuba multiplication or Toom–Cook multiplication, and then after that a variation of Schönhage–Strassen is best not just in theory but also in practice.
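
As a sketch of the larger-base trick (this is the base-1e9 variant the question's author settled on in the comments above; the function name is mine): with limbs in base 10^9, one pass over the array multiplies by 2^29, since a limb times 2^29 still fits a 64-bit intermediate.

    #include <stdint.h>

    #define BASE 1000000000u   /* 10^9: nine decimal digits per 32-bit limb */

    /* Multiply a little-endian base-10^9 limb array by 2^29 in one pass.
       The caller must leave room for growth; returns the new limb count. */
    static int mul_pow2_29(uint32_t *limb, int n)
    {
        uint64_t carry = 0;
        for (int i = 0; i < n; i++) {
            uint64_t t = ((uint64_t)limb[i] << 29) + carry;
            limb[i] = (uint32_t)(t % BASE);
            carry   = t / BASE;
        }
        if (carry)                  /* carry < 2^29 < 10^9: at most one new limb */
            limb[n++] = (uint32_t)carry;
        return n;
    }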

Actually, the big integer package GMP already has Õ(N)-time radix conversion, as well as good heuristics for choosing the multiplication algorithm. The only difference between their solution and mine is that, instead of doing any big arithmetic in base 10, they compute large powers of 10 in base 2. In this solution they also need fast division, but that can be obtained from fast multiplication in any of several ways.
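
Here is a sketch of that divide-and-conquer conversion using GMP's public `mpz` functions (illustrative only, not GMP's internal code; a real implementation would cache the powers of 10 rather than recomputing them at every level):

    #include <gmp.h>
    #include <stdio.h>

    /* Write n as exactly `ndigits` decimal characters (zero-padded). */
    static void to_dec(const mpz_t n, size_t ndigits, char *out)
    {
        if (ndigits <= 9) {                   /* small base case */
            unsigned long v = mpz_get_ui(n);
            for (size_t i = ndigits; i-- > 0; v /= 10)
                out[i] = '0' + v % 10;
            return;
        }
        size_t lo = ndigits / 2, hi = ndigits - lo;
        mpz_t p, q, r;
        mpz_inits(p, q, r, NULL);
        mpz_ui_pow_ui(p, 10, lo);             /* 10^lo, computed in binary */
        mpz_tdiv_qr(q, r, n, p);              /* n = q * 10^lo + r; fast division */
        to_dec(q, hi, out);                   /* high half */
        to_dec(r, lo, out + hi);              /* low half, zero-padded */
        mpz_clears(p, q, r, NULL);
    }

    int main(void)
    {
        mpz_t n;
        mpz_init(n);
        mpz_ui_pow_ui(n, 2, 1000);            /* example: 2^1000, 302 digits */
        size_t nd = mpz_sizeinbase(n, 10);    /* exact or one too large */
        char buf[400];
        to_dec(n, nd, buf);
        buf[nd] = '\0';
        printf("%s\n", buf + (buf[0] == '0')); /* drop a possible leading pad */
        mpz_clear(n);
        return 0;
    }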

Greg Kuperberg
  • Thanks for the link and the first answer with any theoretical content! It looks like [Toom-Cook](http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication) might actually be the preferable algorithm for non-astronomical-exponents. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 19:03
  • Very interesting. Could you explain how using base 10000 speeds things up? – Steven Sudit Jul 09 '10 at 19:04
  • Steven: Using base 10000 speeds things up because it's 4 times faster than base 10 due to them both fitting in a machine word. – Gabe Jul 09 '10 at 19:08
  • Don't get too excited: "As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320-640 bits." -http://en.wikipedia.org/wiki/Karatsuba_algorithm – Steven Sudit Jul 09 '10 at 19:13
  • @Gabe: That makes sense. Base 10000 can be trivially converted to decimal, but speeds up the intermediate steps by lowering the number of digits. I'm not sure how this compares to the BCD-based approaches, but it's certainly viable. – Steven Sudit Jul 09 '10 at 19:14
  • R.: Well, the adjective "astronomical" is a bit flat because there are various sizes of numbers for various purposes. It is true that in a medium range you or a library might use Karatsuba or Toom-Cook, but actually both of them are similar to Schonhage-Strassen. – Greg Kuperberg Jul 09 '10 at 19:21
  • R: Even Toom-Cook and Karatsuba are too big for piddly little 64-bit floats. If you just wanted a faster implementation you would simply use a bigger base than 10 (1e2 for bytes, 1e4 for words, and 1e9 for ints) and do larger multiplies (e.g. do 1/3 as many multiplies by 8 or 1/16 as many multiplies by 65536). – Gabe Jul 09 '10 at 19:23
  • Steven: See for instance http://stackoverflow.com/questions/1469529/#1875697 . For multiplication you wouldn't just want one digit to fit in a machine word, but also the product of two digits. – Greg Kuperberg Jul 09 '10 at 19:24
  • @Gabe, are you sure? A "64-bit" float involves ~1076-digit (decimal) arithmetic. An "80-bit" float involves ~16448-digit arithmetic. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 19:28
  • R.: Actually the max 64-bit float in IEEE 754 has 309 decimal digits and a max 128-bit float has 4933 decimal digits. Remember, there will be fewer decimal digits than binary digits. But you're right that that is getting to the range of accelerated multiplication algorithms. – Greg Kuperberg Jul 09 '10 at 19:44
  • You're thinking of cases where the exponent is positive. If it's negative, every time you decrement the exponent further you get an extra decimal place on the right (holding a '5'), but it takes several exponent decrements to clear a decimal place on the left (e.g. 5->2->1->0). I overestimated, but it seems you need roughly binary_exp*2/3 decimal digits, so ~700 digits for IEEE 754. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 19:54
  • Oh, I missed a detail in the question. I thought you wanted a decimal representation of the numerator and denominator of the fraction. For the exact decimal of the fraction itself, I agree with your estimate. – Greg Kuperberg Jul 09 '10 at 19:58
  • R: Assuming you use 64-bit machine words (base 1e19), each double-precision float will only require 17 (or 37 for negative exponents) words at most to represent, and you could then multiply by 2^64 at a time, so you would only have to do 16 multiplies. Since that's just worst case, most conversions would require very few operations, so Karatsuba or Toom-3 wouldn't give you an advantage. – Gabe Jul 09 '10 at 20:08
  • I was thinking 32-bit machine words which are still pretty much the norm. I could try making the code use `size_t` or another type that's essentially-guaranteed to be machine-word-sized instead of `uint32_t` with compile-time computations to determine the base that gets used, but it might be a bit of a pain. – R.. GitHub STOP HELPING ICE Jul 10 '10 at 09:08
  • @Greg: If I understand correctly, the "320-640 bits" does not apply to the "compact" floating point representation here but rather to the full expansion (whether decimal or otherwise). – Steven Sudit Jul 12 '10 at 02:24
  • Steven: I think that that's right. However, it applies, because the question posed was about converting a floating point binary to a full decimal. Actually Karatsuba in base 10 would become competitive sooner, because the advice about the threshold assumes a fast arithmetic unit in the CPU. – Greg Kuperberg Jul 12 '10 at 02:34
  • Schönhage-Strassen is vast overkill. The numbers of bits involved are far too small to ever need it. For 80 bit or 128 bit binary floats, you may need to use Karatsuba. In my (fast) BigInteger implementation, it only gets faster than plain base-case multiplication at approx. 3200 bits, and above that, Toom-Cook (e.g. 3-way) at approx. 8200 bits. Schönhage-Strassen is for numbers with vastly more bits. Due to its large overhead, it is ssslllooowww for smaller numbers. I'd rather rely on GMP or similar. They choose the right algorithm depending on the number of bits. – Rudy Velthuis Oct 18 '18 at 22:11
18

I see you’ve accepted an answer already, but here are a couple of open source implementations of this conversion you might want to look at:

  1. David Gay’s dtoa() function in dtoa.c: https://www.netlib.org/fp/dtoa.c.

  2. The function ___printf_fp() in the /stdio-common/printf_fp.c file in Glibc (https://ftp.gnu.org/gnu/glibc/glibc-2.11.2.tar.gz, for example).

Both will print as many digits as you ask for in a %f-style printf, as I’ve written about on my blog.
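
For example, Gay's dtoa() can be called like this (a sketch; the prototype below matches the one in netlib's dtoa.c, but check your copy — mode 0 asks for the shortest round-tripping digit string, and other modes give as many digits as you want):

    #include <stdio.h>

    /* prototype as declared in netlib's dtoa.c */
    char *dtoa(double d, int mode, int ndigits,
               int *decpt, int *sign, char **rve);

    int main(void)
    {
        int decpt, sign;
        char *end;
        char *digits = dtoa(0.1, 0, 0, &decpt, &sign, &end);
        /* for 0.1: digits = "1", decpt = 0, i.e. 0.1 = 0.1 * 10^0 */
        printf("%s0.%s * 10^%d\n", sign ? "-" : "", digits, decpt);
        return 0;
    }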

Rick Regan
  • Great answer! This is the type of thing I was looking for. I'll check out those sources. – R.. GitHub STOP HELPING ICE Jul 10 '10 at 18:26
  • Your blog is fantastic. I'd seen a few posts on it earlier, but wasn't aware that the author exists here too :) – devnull Jun 27 '14 at 13:13
  • ISTM that David M. Gay's implementation is a de facto (but not official) standard implementation. Several languages have adapted it to their needs as well. I am actually trying to make the Delphi and C++Builder people at Embarcadero adopt it too. -- Oh wait, you are the guy from Exploring Binary? Good job! Love your site. – Rudy Velthuis Oct 18 '18 at 22:19
7

There's been a lot of work on printing floating-point numbers. The gold standard is to print out a decimal equivalent of minimal length such that when the decimal equivalent is read back in, you get the same floating-point number that you started with, no matter what the rounding mode is during readback. You can read about the algorithm in the excellent paper by Burger and Dybvig.
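
That round-trip criterion is easy to illustrate by brute force, if not efficiently (a sketch, under the default round-to-nearest mode; Burger and Dybvig's algorithm computes the same digits directly):

    #include <stdio.h>
    #include <stdlib.h>

    /* Find the fewest significant digits that survive a strtod round trip. */
    static void shortest(double x, char *buf, size_t n)
    {
        for (int prec = 1; prec <= 17; prec++) {  /* 17 always round-trips */
            snprintf(buf, n, "%.*g", prec, x);
            if (strtod(buf, NULL) == x)
                return;
        }
    }

    int main(void)
    {
        char buf[64];
        shortest(1.0 / 3.0, buf, sizeof buf);
        printf("%s\n", buf);   /* 0.3333333333333333 on a typical libc */
        return 0;
    }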

Norman Ramsey
3

Although it's C# and your question is tagged with C, Jon Skeet has code to convert a double to its exact representation as a string: http://www.yoda.arachsys.com/csharp/DoubleConverter.cs

From a quick glance, it does not appear to be too hard to port to C, and even easier to write in C++.

Upon further reflection, it appears that Jon's algorithm is also O(e^2), as it also loops over the exponent. However, that means the algorithm is O(log(n)^2) (where n is the floating-point number), and I'm not sure you can convert from base 2 to base 10 in better than log-squared time.
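
For reference, the bit-unpacking step such a C port would start from looks like this (a sketch assuming IEEE 754 binary64):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        double x = 0.1;
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);    /* safe type-punning */

        int      sign = (int)(bits >> 63);
        int      expo = (int)((bits >> 52) & 0x7FF);
        uint64_t frac = bits & 0xFFFFFFFFFFFFFULL;    /* low 52 bits */

        if (expo == 0)   /* subnormal: no implicit leading 1, exponent -1022 */
            printf("%c0.%013llx * 2^%d\n", sign ? '-' : '+',
                   (unsigned long long)frac, -1022);
        else             /* normal: implicit leading 1 */
            printf("%c1.%013llx * 2^%d\n", sign ? '-' : '+',
                   (unsigned long long)frac, expo - 1023);
        return 0;
    }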

Gabe
  • Interesting. Looks like he took that BCD approach, or close to it. – Steven Sudit Jul 09 '10 at 18:14
  • That is the same method he mentioned in the question. – Larry Wang Jul 09 '10 at 18:15
  • @Kaestur: Yes, but the code shows how to handle the fringe cases, such as subnormals. It's worth looking at. – Steven Sudit Jul 09 '10 at 18:23
  • If you're considering the theoretical big-O (and bignum stuff), then conversion from base 2 to base 10 probably can't be done in less than log-squared time. But if your numbers fit in machine words, it's log time, which is much better. The question is whether you can do the same thing for floating point numbers using the machine's floating point arithmetic. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:27
  • My implementation used the ugly loop (rather than bit fiddling) to extract the mantissa, so it didn't care if the floating point value was subnormal to begin with. `for (e=0; x<1; x*=2, e--);` brought it into normal range in a few iterations. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:31
2

Well, being no floating-point expert myself, I'd defer to a well-tested open source library.

The GNU MPFR is a good one.

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics.
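
For instance, a double converts into a 53-bit MPFR number exactly, and mpfr_printf rounds its output correctly, so asking for enough digits yields the exact expansion followed by zeros (a sketch, not the only way to do it):

    #include <stdio.h>
    #include <mpfr.h>

    int main(void)
    {
        mpfr_t x;
        mpfr_init2(x, 53);               /* same precision as a double */
        mpfr_set_d(x, 0.1, MPFR_RNDN);   /* exact: copies the double's value */
        mpfr_printf("%.60Rf\n", x);      /* 0.1000...5625 plus trailing zeros */
        mpfr_clear(x);
        return 0;
    }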

Byron Whitlock
1

sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation.

You can ask for more significant digits than the default:

printf("%.100g\n", 0.1);

prints 0.1000000000000000055511151231257827021181583404541015625.

dan04
  • Your system's printf happens to do the polite (but not specified by any standard) thing and compute as many digits as requested. Most just chop everything off after computing sufficiently many digits to uniquely determine the float. See the links in Rick Regan's answer. – R.. GitHub STOP HELPING ICE Jul 13 '10 at 06:42
  • this works in gcc (GNU Compiler Collection) and tcc (Tiny C Compiler) – barlop Nov 13 '15 at 18:36
  • @barlop whether this works or not depends on the implementation of the standard library (e.g. glibc), and not the compiler. – kikones34 Sep 23 '21 at 12:58
  • @kikones34 though I suppose that a particular compiler uses particular implementations of standard libraries. So it does depend on the compiler because the compiler depends on whatever implementations of standard libraries it uses. – barlop Oct 01 '21 at 07:29
0

There are 3 ways:

  1. printing numbers in bin or hex

    This is the most precise way. I prefer hex because it reads more like base 10; for example, F.8h = 15.5, with no precision loss.

  2. printing in dec but only the relevant digits

    By this I mean printing only the digits that the binary precision of your number actually determines, representing the stored value as closely as possible.

    The number of integer digits is easy and precise (no precision loss):

    // n10 - base 10 integer digits
    // n2  - base 2 integer digits
    n10=log10(2^n2)
    n10=log2(2^n2)/log2(10)
    n10=n2/log2(10)
    n10=ceil(n2*0.30102999566398119521373889472449)
    // if first digit is 0 and n10 > 1 then n10--
    

    The number of fractional digits is trickier; empirically I found this:

    // n10 - base 10 fract. digits
    // n02 - base 2 fract. digits >= 8
    n10=0; if (n02==8) n10=1;
    else if (n02==9) n10=2;
    else if (n02> 9)
        {
        n10=((n02-9)%10);
             if (n10>=6) n10=2;
        else if (n10>=1) n10=1;
        n10+=2+(((n02-9)/10)*3);
        }
    

    If you make a dependency table n02 <-> n10, you will see that the constant 0.30102999566398119521373889472449 is still present, but it starts from 8 bits, because fewer bits cannot represent 0.1 with satisfactory precision (0.85 - 1.15). Because of the negative exponents of base 2, the dependency is not linear; instead it follows a pattern. This code works for small bit counts (<=52), but at larger bit counts there can be error, because the pattern used does not fit log10(2), i.e. 1/log2(10), exactly.

    For larger bit counts I use this:

    n10=7.810+(9.6366363636363636363636*((n02>>5)-1.0));
    

    But beware: that formula is 32-bit aligned, and bigger bit counts also add error to it.

    P.S. Further analysis of the binary representations of decimal numbers

    0.1
    0.01
    0.001
    0.0001
    ...
    

    may reveal the exact repeating pattern, which would lead to the exact number of relevant digits for any bit count.

    for clarity:

    8 bin digits -> 1 dec digits
    9 bin digits -> 2 dec digits
    10 bin digits -> 3 dec digits
    11 bin digits -> 3 dec digits
    12 bin digits -> 3 dec digits
    13 bin digits -> 3 dec digits
    14 bin digits -> 3 dec digits
    15 bin digits -> 4 dec digits
    16 bin digits -> 4 dec digits
    17 bin digits -> 4 dec digits
    18 bin digits -> 4 dec digits
    19 bin digits -> 5 dec digits
    20 bin digits -> 6 dec digits
    21 bin digits -> 6 dec digits
    22 bin digits -> 6 dec digits
    23 bin digits -> 6 dec digits
    24 bin digits -> 6 dec digits
    25 bin digits -> 7 dec digits
    26 bin digits -> 7 dec digits
    27 bin digits -> 7 dec digits
    28 bin digits -> 7 dec digits
    29 bin digits -> 8 dec digits
    30 bin digits -> 9 dec digits
    31 bin digits -> 9 dec digits
    32 bin digits -> 9 dec digits
    33 bin digits -> 9 dec digits
    34 bin digits -> 9 dec digits
    35 bin digits -> 10 dec digits
    36 bin digits -> 10 dec digits
    37 bin digits -> 10 dec digits
    38 bin digits -> 10 dec digits
    39 bin digits -> 11 dec digits
    40 bin digits -> 12 dec digits
    41 bin digits -> 12 dec digits
    42 bin digits -> 12 dec digits
    43 bin digits -> 12 dec digits
    44 bin digits -> 12 dec digits
    45 bin digits -> 13 dec digits
    46 bin digits -> 13 dec digits
    47 bin digits -> 13 dec digits
    48 bin digits -> 13 dec digits
    49 bin digits -> 14 dec digits
    50 bin digits -> 15 dec digits
    51 bin digits -> 15 dec digits
    52 bin digits -> 15 dec digits
    53 bin digits -> 15 dec digits
    54 bin digits -> 15 dec digits
    55 bin digits -> 16 dec digits
    56 bin digits -> 16 dec digits
    57 bin digits -> 16 dec digits
    58 bin digits -> 16 dec digits
    59 bin digits -> 17 dec digits
    60 bin digits -> 18 dec digits
    61 bin digits -> 18 dec digits
    62 bin digits -> 18 dec digits
    63 bin digits -> 18 dec digits
    64 bin digits -> 18 dec digits
    

    And at last, do not forget to round the cut-off digits! That means that if the digit after the last relevant digit is >= 5, the last relevant digit should be incremented by 1; and if it is already 9, you must carry into the previous digit, and so on...

  3. print exact value

    To print the exact value of a fractional binary number, just print n fractional decimal digits, where n is the number of fractional bits: the represented value is a sum of negative powers of two, and since the decimal expansion of the LSB has exactly n fractional digits, the sum cannot have more. The material above (bullet #2) is relevant for storing a decimal number into a float (or for printing just the relevant decimals).

    Negative powers of two, exact values:

    2^- 1 = 0.5
    2^- 2 = 0.25
    2^- 3 = 0.125
    2^- 4 = 0.0625
    2^- 5 = 0.03125
    2^- 6 = 0.015625
    2^- 7 = 0.0078125
    2^- 8 = 0.00390625
    2^- 9 = 0.001953125
    2^-10 = 0.0009765625
    2^-11 = 0.00048828125
    2^-12 = 0.000244140625
    2^-13 = 0.0001220703125
    2^-14 = 0.00006103515625
    2^-15 = 0.000030517578125
    2^-16 = 0.0000152587890625
    2^-17 = 0.00000762939453125
    2^-18 = 0.000003814697265625
    2^-19 = 0.0000019073486328125
    2^-20 = 0.00000095367431640625
    

    Now negative powers of 10, printed exact-value style, for 64-bit doubles:

    10^+ -1 = 0.1000000000000000055511151231257827021181583404541015625
            = 0.0001100110011001100110011001100110011001100110011001101b
    10^+ -2 = 0.01000000000000000020816681711721685132943093776702880859375
            = 0.00000010100011110101110000101000111101011100001010001111011b
    10^+ -3 = 0.001000000000000000020816681711721685132943093776702880859375
            = 0.000000000100000110001001001101110100101111000110101001111111b
    10^+ -4 = 0.000100000000000000004792173602385929598312941379845142364501953125
            = 0.000000000000011010001101101110001011101011000111000100001100101101b
    10^+ -5 = 0.000010000000000000000818030539140313095458623138256371021270751953125
            = 0.000000000000000010100111110001011010110001000111000110110100011110001b
    10^+ -6 = 0.000000999999999999999954748111825886258685613938723690807819366455078125
            = 0.000000000000000000010000110001101111011110100000101101011110110110001101b
    10^+ -7 = 0.0000000999999999999999954748111825886258685613938723690807819366455078125
            = 0.0000000000000000000000011010110101111111001010011010101111001010111101001b
    10^+ -8 = 0.000000010000000000000000209225608301284726753266340892878361046314239501953125
            = 0.000000000000000000000000001010101111001100011101110001000110000100011000011101b
    10^+ -9 = 0.0000000010000000000000000622815914577798564188970686927859787829220294952392578125
            = 0.0000000000000000000000000000010001001011100000101111101000001001101101011010010101b
    10^+-10 = 0.00000000010000000000000000364321973154977415791655470655996396089904010295867919921875
            = 0.00000000000000000000000000000000011011011111001101111111011001110101111011110110111011b
    10^+-11 = 0.00000000000999999999999999939496969281939810930172340963650867706746794283390045166015625
            = 0.00000000000000000000000000000000000010101111111010111111111100001011110010110010010010101b
    10^+-12 = 0.00000000000099999999999999997988664762925561536725284350612952266601496376097202301025390625
            = 0.00000000000000000000000000000000000000010001100101111001100110000001001011011110101000010001b
    10^+-13 = 0.00000000000010000000000000000303737455634003709136034716842278413651001756079494953155517578125
            = 0.00000000000000000000000000000000000000000001110000100101110000100110100001001001011101101000001b
    10^+-14 = 0.000000000000009999999999999999988193093545598986971343290729163921781719182035885751247406005859375
            = 0.000000000000000000000000000000000000000000000010110100001001001101110000110101000010010101110011011b
    10^+-15 = 0.00000000000000100000000000000007770539987666107923830718560119501514549256171449087560176849365234375
            = 0.00000000000000000000000000000000000000000000000001001000000011101011111001111011100111010101100001011b
    10^+-16 = 0.00000000000000009999999999999999790977867240346035618411149408467364363417573258630000054836273193359375
            = 0.00000000000000000000000000000000000000000000000000000111001101001010110010100101111101100010001001101111b
    10^+-17 = 0.0000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
            = 0.0000000000000000000000000000000000000000000000000000000010111000011101111010101000110010001101101010010010111b
    10^+-18 = 0.00000000000000000100000000000000007154242405462192450852805618492324772617063644020163337700068950653076171875
            = 0.00000000000000000000000000000000000000000000000000000000000100100111001001011101110100011101001001000011101011b
    10^+-19 = 0.000000000000000000099999999999999997524592683526013185572915905567688179926555402943222361500374972820281982421875
            = 0.000000000000000000000000000000000000000000000000000000000000000111011000001111001001010011111011011011010010101011b
    10^+-20 = 0.00000000000000000000999999999999999945153271454209571651729503702787392447107715776066783064379706047475337982177734375
            = 0.00000000000000000000000000000000000000000000000000000000000000000010111100111001010000100001100100100100100001000100011b
    

    Now negative powers of 10, printed relevant-decimal-digits-only style (I am more used to this), for 64-bit doubles:

    10^+ -1 = 0.1
    10^+ -2 = 0.01
    10^+ -3 = 0.001
    10^+ -4 = 0.0001
    10^+ -5 = 0.00001
    10^+ -6 = 0.000001
    10^+ -7 = 0.0000001
    10^+ -8 = 0.00000001
    10^+ -9 = 0.000000001
    10^+-10 = 0.0000000001
    10^+-11 = 0.00000000001
    10^+-12 = 0.000000000001
    10^+-13 = 0.0000000000001
    10^+-14 = 0.00000000000001
    10^+-15 = 0.000000000000001
    10^+-16 = 0.0000000000000001
    10^+-17 = 0.00000000000000001
    10^+-18 = 0.000000000000000001
    10^+-19 = 0.0000000000000000001
    10^+-20 = 0.00000000000000000001
    

    hope it helps :)

Spektre
  • This answer is very interesting (so please don't delete it, it might be helpful to somebody who comes along with a slightly different problem), but it doesn't answer this question. This question is about printing the exact value, not printing sufficiently many digits to recover the original value by rounding. – R.. GitHub STOP HELPING ICE Aug 23 '13 at 12:34
  • binary fractions cannot be converted to decimal fractions without precision loss (in a finite digit count), so if you want to print the exact value then only point 1 is relevant (print numbers in hex/bin or any base decomposable into powers of 2). I was thinking you wanted to print the exact decimal value storable in floating point (at the given mantissa precision), not the exact value stored in the floating point number as a decimal. sorry ... still, point 1 answers your question (you did not specify the decadic system); for example 1.6A09E667F3BCC908B2FB1366h is sqrt(2) in hex – Spektre Aug 23 '13 at 14:26
  • Yes they can. For example, a binary fraction of 0.01 is decimal 0.25, and a binary fraction of 0.001 is decimal 0.125. In general, the number of decimal places to the right of the decimal point is equal to the number of binary places to the right of the binary point. – R.. GitHub STOP HELPING ICE Aug 23 '13 at 14:53
  • Silly me ... I was thinking of it backwards again :) that came from the base10 -> base2 transform ... in printing it is base2 -> base10, and that's easy: the number of decimal digits is exactly the same as the number of fractional binary digits; see my answer for the edit – Spektre Aug 23 '13 at 15:24
  • btw I forgot to mention: to eliminate precision loss during the bin -> dec conversion, I create a hex string (a simple shift+and of the mantissa in a loop) and then convert this hex string to a dec string (then reformat and print). My conversion code is here (no use of bignums or FPU): http://stackoverflow.com/a/18231860/2521214 – Spektre Aug 25 '13 at 10:52
  • @R..GitHubSTOPHELPINGICE this [32 bit float print on integer math only](https://stackoverflow.com/a/59861545/2521214) might interest you – Spektre May 05 '20 at 10:19
0

If you want more exact results, why not use fixed point math instead? Conversions are quick. Error is known and can be worked around. Not an exact answer to your question, but a different idea for you.

Michael Dorgan
  • Wouldn't be a bad idea if I were using this in a specific application, but the problem domain is specifically solving this (rather painful) floating point to exact decimal conversion. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:03
0

Off the top of my head, why not break the exponent down into a sum of binary exponents first, so that all your operations are lossless?

I.e.

10^2 = 2^6 + 2^5 + 2^2

Then sum:

mantissa<<6 + mantissa<<5 + mantissa<<2

I'm thinking that breaking it down would be O(n) in the number of digits, the shifting is O(1), and the summing is O(n) in digits...

You would have to have an integer class big enough to store the results, of course...

Let me know - I'm curious about this, it really made me think. :-)

eruciform
  • The exponent is a binary exponent to begin with. And there's definitely no integer type (without some sort of bigint) capable of storing the result. It can be over 1000 digits with a double, and over 16000 digits with a long double. :-) – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:14
  • @r: i guess you could calloc(1000) and then bit-copy things in the right place. but definitely messy. floating point is there for a reason. :-) – eruciform Jul 09 '10 at 18:31
  • this can work only for the integer part of the number, and there are much faster, easier and nicer ways to do it ... look at my answer for log2(10), which is pretty much constant ... so if you want the number of dec integer digits then n(base10) = n(base2)/log2(10). the problem is that this question is all about the fractional part, which can't be decomposed into powers of 2 ... at least I haven't a clue how 10^-n = 2^-a+2^-b+2^-c+... the only way is to round it to the closest match within the given accuracy – Spektre Aug 23 '13 at 11:03
-2

You don't. The closest you can come to that is dumping the bytes.

Steven Sudit
  • I've thought about this some more, and I think I'm wrong. Since base 2 goes into base 10, there shouldn't be any binary values that can only be represented in decimal if we allow repeating digits. As a result, you should in principle be able to convert a float/double to a (potentially very long) string of decimal digits. – Steven Sudit Jul 09 '10 at 17:59
  • Of course you can. I have an implementation that does it in `O(e^2)` time (which hopefully can be improved) and `O(e)` space (which the decimal representation necessarily requires), as I described. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:01
  • To finish answering, yes, the algorithm you described looks like it would work, but an arbitrary precision library (like the one Byron recommended) would make things easy. For something related but I think different, there's also: http://keithbriggs.info/xrc.html – Steven Sudit Jul 09 '10 at 18:02
  • I suspect that implementing the multiplications as shifts will speed things up, but that doesn't necessarily improve the big O. – Steven Sudit Jul 09 '10 at 18:04
  • I think what I just wrote is wrong, because I missed the fact that the doubling is happening to the decimal value. Perhaps the way to handle this is to keep the output in a format like BCD until you're done. – Steven Sudit Jul 09 '10 at 18:07
  • You can definitely store several decimal digits in each array position; in fact it should work to use 32bit integers to store values up to 1 billion as long as you handle the carry/remainder right. The big-O doesn't improve but the constant is slashed by a factor of 9. At the end, normal integer printing of each 32bit cell can be performed. – R.. GitHub STOP HELPING ICE Jul 09 '10 at 18:21
  • Skeet's code takes the compromise of 1 digit per byte, as opposed to the 2 that BCD would bring. While he loses compactness, he significantly simplifies the implementation of operations such as shifting and multiplying, so it might be worth it. It's better than the usual BigNumber solution of using a character for each, because you avoid having to mask and unmask the 48. – Steven Sudit Jul 09 '10 at 18:26
  • I was reading http://en.wikipedia.org/wiki/Binary-coded_decimal, which gives the algorithm for adding BCD values. You could perform a doubling simply by adding the value to itself. Essentially, you do a binary add and then adjust it back to BCD. This could be faster than digit-by-digit adding. – Steven Sudit Jul 09 '10 at 18:34
  • This is a link to an explanation of the algorithm used by the old ASM instruction, DAA: http://faydoc.tripod.com/cpu/daa.htm. At first glance, it looks like it could be scaled up, with a table implementation that implements carry by returning values twice as wide as the input. – Steven Sudit Jul 09 '10 at 18:45
  • I think I'm out of ideas, although I've done enough to repent for my original mistake. I do want to point out that, even though this prints the precise decimal equivalent of the float, the float itself may not have been initialized with precise equivalents of the decimal, and even if it was, then operations on it may well have generated repeating digits and other artifacts. If you really want a precise answer, something like xrc is the way to go. Good luck. – Steven Sudit Jul 09 '10 at 19:01
  • This answer was downvoted sometime after it was originally written as part of a pattern of arbitrary downvotes. – Steven Sudit Jul 22 '10 at 19:28