printing the integral part of a floating point number

Question

I am trying to figure out how to print floating point numbers without using library functions. Printing the decimal part of a floating point number turned out to be quite easy. Printing the integral part is harder:

static const int base = 2;
static const char hex[] = "0123456789abcdef";

void print_integral_part(float value)
{
    assert(value >= 0);
    char a[129]; // worst case is 128 digits for base 2 plus NUL
    char * p = a + 128;
    *p = 0;
    do
    {
        int digit = fmod(value, base);
        value /= base;
        assert(p > a);
        *--p = hex[digit];
    } while (value >= 1);
    printf("%s", p);
}

Printing the integral part of FLT_MAX works flawlessly with base 2 and base 16:

11111111111111111111111100000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000 (base 2)

ffffff00000000000000000000000000 (base 16)

However, printing in base 10 results in errors after the first 7 digits:

340282368002860660002286082464244022240 (my own function)
340282346638528859811704183484516925440 (printf)

I assume this is a result of the division by 10. It gets better if I use double instead of float:

340282346638528986604286022844204804240 (my own function)
340282346638528859811704183484516925440 (printf)

(If you don't believe printf, enter 2^128-2^104 into Wolfram Alpha. It is correct.)

Now, how does printf manage to print the correct result? Does it use some bigint facilities internally? Or is there some floating point trick I am missing?

Why don't you take a look at some `printf()` implementation? — wilx, Jun 06 '12 at 10:26
Printing floating point values accurately and efficiently is hard, except in bases that are powers of 2 (for binary floating point, bases that are the product of a power of 2 and a power of 5 for decimal floating point representations). In base 10, it becomes moderately easy if you use big integers (`m*2^(-p) == (m*5^p)*10^(-p)`). Without using big integers, you can first get a binary representation and convert that to decimal with a [double dabble](http://en.wikipedia.org/wiki/Double_dabble) for the integer part and something similar for the fractional part. But that's not very efficient. — Daniel Fischer, Jun 06 '12 at 14:32
You might find my articles on this subject interesting -- start here: http://www.exploringbinary.com/quick-and-dirty-floating-point-to-decimal-conversion/ — Rick Regan, Jun 06 '12 at 14:43
@Rick Regan : It exactly prints 340282368002860660002286082464244022240 as in the question. — PermanentGuest, Jun 06 '12 at 15:18
@Unni: I haven't tried it, but it's no surprise; it's pretty much the same algorithm. — Rick Regan, Jun 06 '12 at 18:33
divide by 10 will cause incorrect result as binary floating point cannot represent decimal exactly like Agent_L's answer below. Multiply by 10 is better. You should look at the question [here](http://stackoverflow.com/q/17632150/995714) — phuclv, Oct 03 '13 at 02:46

score 2 · Answer 1 · edited Apr 13 '21 at 16:44

2

I believe the problem lies in value /= base;. Do not forget that 10 is not a finite fraction in binary system and thus this calculation is never correct. I also assume some error will occur in fmod due to the same reason.

printf will first compute the integral part and then convert it to decimal (if I get the way you printf the integral part correctly).

edited Apr 13 '21 at 16:44

phuclv

37,963
15
156
475

answered Jun 06 '12 at 10:34

Ivaylo Strandjev

69,226
18
123
176

I think this is partially correct. Division by 10 will give the correct answer if the original number itself is divisible by 10. However, this is unlikely to be the case. Also, there are only 254 bits of precision which is about 7 decimal digits, so after seven divisions by 10, all bets are off. – JeremyP Jun 06 '12 at 11:05
@JeremyP I think that if you write "double p = 10.0; and then compute p /= 10.0; p will be close to 1.0 but not precisely(unless the compiler does some optimization). – Ivaylo Strandjev Jun 06 '12 at 11:07
@JeremyP 254 bits of precision must be a typo, right? float has 24 bits of precision. – fredoverflow Jun 06 '12 at 11:59
@FredOverflow Ha ha, yes 254 bits of precision can't really be described as "only". Unfortunately, I can't fix it now. – JeremyP Jun 06 '12 at 12:55
@izomorphius: 10.0 or, say, 100.0 are exactly representable in floating point, and so dividing by 10.0 will give an exact result of 1.0 or 10.0 respectively. Jeremy: if I want to correct a comment, I simply copy its text, delete the comment, write a new one and correct the errors. – Rudy Velthuis Jun 06 '12 at 14:00
1

@RudyVelthuis yes 10.0 is exactly representable but 3/10 is not. Unless FLT_MAX is divisable by 5 the result of the division FLT_MAX/10 will not be exactly representable. – Ivaylo Strandjev Jun 07 '12 at 08:03
@izomorphius: You said that 10.0/10.0 will be close to 1.0. No, it will be **exactly** 1.0. That is all I tried to rectify. FWIW, the fact that you can display a value with many digits does not mean that these digits are significant. The 24 bits of a single only allow for a significance of 10log(2)*24 decimal digits, so ~7 digits. The rest is not significant. – Rudy Velthuis Jun 07 '12 at 11:15

score 2 · Answer 2 · edited May 23 '17 at 12:10

/Edit: Read Unni's answer first. This results come from http://codepad.org/TLqQzLO3.

void print_integral_part(float value)
{
    printf("input : %f\n", value);
    char a[129]; // worst case is 128 digits for base 2 plus NUL
    char * p = a + 128;
    *p = 0;
    do
    {
        int digit = fmod(value, base);
        value /= base;
        printf("interm: %f\n", value);
        *--p = hex[digit];
    } while (value >= 1);
    printf("result: %s\n", p);
}

print_integral_part(3.40282347e+38F);

to see how messed up your value gets by the value /= base operation:

input : 340282346638528859811704183484516925440.000000
interm: 34028234663852885981170418348451692544.000000
interm: 3402823466385288480057879763104038912.000000
interm: 340282359315034876851393457419190272.000000
interm: 34028234346940236846450271659753472.000000
interm: 3402823335658820218996583884128256.000000
interm: 340282327376181848531187106054144.000000
interm: 34028232737618183051678859657216.000000
interm: 3402823225404785588136713388032.000000
interm: 340282334629736780292710989824.000000
interm: 34028231951816403862828351488.000000
interm: 3402823242405304929106264064.000000
interm: 340282336046446683592065024.000000
interm: 34028232866774907300610048.000000
interm: 3402823378911210969759744.000000
interm: 340282332126513595416576.000000
interm: 34028233212651357863936.000000
interm: 3402823276229139890176.000000
interm: 340282333252413489152.000000
interm: 34028234732616232960.000000
interm: 3402823561222553600.000000
interm: 340282356122255360.000000
interm: 34028235612225536.000000
interm: 3402823561222553.500000
interm: 340282366859673.625000
interm: 34028237357056.000000
interm: 3402823735705.600098
interm: 340282363084.799988
interm: 34028237619.200001
interm: 3402823680.000000
interm: 340282368.000000
interm: 34028236.800000
interm: 3402823.600000
interm: 340282.350000
interm: 34028.234375
interm: 3402.823438
interm: 340.282349
interm: 34.028235
interm: 3.402824
interm: 0.340282
result: 340282368002860660002286082464244022240

When in doubt, throw more printfs at it ;)

score 2 · Answer 3 · answered Jun 06 '12 at 11:17

2

According to IEEE single precision float implementation, only 24 bits of data is stored at any time in a float variable. This means only maximum 7 decimal digits are stored in the floating number.

Rest of the hugeness of the number is stored in the exponent. FLT_MAX is initialized as 3.402823466e+38F. So, after the 10th precision, which digit should get printed is not defined anywhere.

From Visual C++ 2010 compiler, I get this output 340282346638528860000000000000000000000.000000, which is the only vaild output.

So, initially we have these many valid digits 3402823466 So after the 1st division we have only 0402823466 So, the system need to get rid of the left 0 and introduce a new digit at the right. In ideal integer division, it is 0. Because you are doing floating division (value /= base;) , system is getting some other digit to fill in that location.

So, in my opinion, the printf could be assigning the above available significant digits to an integer and working with this.

answered Jun 06 '12 at 11:17

PermanentGuest

5,213
2
27
36

Again, FLT_MAX = 2^128-2^104 = 340282346638528859811704183484516925440 and not any other number. – fredoverflow Jun 06 '12 at 11:54
However digits you have in writing, it simply doesn't fit inside a 23 bit mantissa, unfortunately... so whatever present after first 7 digits can be used. Rest everything is discarded by the system. – PermanentGuest Jun 06 '12 at 11:59
Let's take a look at a simpler value like 2^100. It only requires 1 bit of precision, and the exact value is `1267650600228229401496703205376`. All these decimal digits fit perfectly well into a float. [Here is the proof](http://ideone.com/f21mH). If Visual C++ yields a different output, it is broken. – fredoverflow Jun 06 '12 at 12:04
Adding another comment since previous one can't be edited. "How many ever digits you have in writing/resulting from computation, they simply don't fit inside a 23 bit mantissa, unfortunately... so whatever only first 7 digits can be used. Rest everything is discarded by the system." – PermanentGuest Jun 06 '12 at 12:05
Visual C++ gives 1267650600228229400000000000000.000000. I don't know the explanation of ideaone. But, I would be very curios to know in 23 bits how this much information is stored. – PermanentGuest Jun 06 '12 at 12:14
All 23 bits are zero, because 2^100 only requires 1 bit, and that bit is always implicitly 1 in normalized floating point values. – fredoverflow Jun 06 '12 at 12:15
By the way, did you use `printf` in Visual C++ or `std::cout`? – fredoverflow Jun 06 '12 at 12:16
printf(), I just copy pasted your code. This is getting interesting. – PermanentGuest Jun 06 '12 at 12:19
What does `std::cout << std::fixed << f << std::endl;` print? – fredoverflow Jun 06 '12 at 12:26
Same : i.e, "1267650600228229400000000000000.000000" – PermanentGuest Jun 06 '12 at 12:39
What do you think? ;) I debugged it and it goes to same sprintf_s(), finally. Unfortunately I can't debug into the CRT implementation of floating->char* conversion – PermanentGuest Jun 06 '12 at 12:47
The shortest decimal string representation of a number that gets parsed to `2^100` as a `double` (when the parsing is correct, and `double` is IEEE754 64-bit format), is `1.2676506002282294e30` in scientific notation, and that is - except for the `%f` formatting - exactly what Visual C++ prints. Note that for `printf` at least, `float`s are promoted to `double` when passed. gcc, as used by ideone, prints the exact value (since it's an integer, it would round to six places after the decimal point if necessary). – Daniel Fischer Jun 06 '12 at 13:26

wilx · Accepted Answer · 2016-12-17T17:45:22.067

It appears that the work horse for the float to string conversion is the dtoa() function. See dtoa.c in newlib for how they do it.

Now, how does printf manage to print the correct result?

I think it is close to magic. At least the source looks like some kind of dark incantation.

Does it use some bigint facilities internally?

Yes, search for _Bigint in the linked source file.

Or is there some floating point trick I am missing?

Likely.

score 0 · Answer 5 · answered Jun 07 '12 at 18:48

Let's explain this one more time. After the integer part has been printed (exactly) without any rounding other than chop towards 0 it's time for the decimal bits.

Start with a string of bytes (say 100 for starters) containing binary zeros. If the first bit to the right of the decimal point in the fp value is set that means that 0.5 (2^-1 or 1/(2^1)is a component of the fraction. So add 5 to the first byte. If the next bit is set 0.25 (2^-2 or 1/(2^2)) is part of the fraction add 5 to the second byte and add 2 to the first (oh, don't forget the carry, they happen - lower school math). The next bit set means 0.125 so add 5 to the third byte, 2 to the second and 1 to the first. And so on:

      value          string of binary 0s
start 0              0000000000000000000 ...
bit 1 0.5            5000000000000000000 ...
bit 2 0.25           7500000000000000000 ...
bit 3 0.125          8750000000000000000 ...
bit 4 0.0625         9375000000000000000 ...
bit 5 0.03125        9687500000000000000 ...
bit 6 0.015625       9843750000000000000 ...
bit 7 0.0078125      9921875000000000000 ...
bit 8 0.00390625     9960937500000000000 ...
bit 9 0.001953125    9980468750000000000 ...
...

I did this by hand so I may have missed something but to implement this in code is trivial.

So for all those SO "can't get an exact result using float" people who don't know what they're talking about here is proof that floating point fraction values are perfectly exact. Excruciatingly exact. But binary.

For those who take the time to get their heads around how this works, better precision is well within reach. As for the others ... well I guess they'll keep on not browsing the fora for the answer to a question which has been answered numerous times previously, honestly believe they have discovered "broken floating point" (or whatever thay call it) and post a new variant of the same question every day.

"Close to magic," "dark incantation" - that's hilarious!

score 0 · Answer 6 · answered Aug 12 '18 at 05:25

0

This program will work for you.

#include<stdio.h>
int main()
{
    float num;
    int z;
    scanf("%f",&num);
    z=(int)num;
    printf("the integral part of the floating point number is %d",z);
}

answered Aug 12 '18 at 05:25

ismailmushraf

1
1

no this doesn't work for large floats. Try `FLT_MAX` and see that it's not `340282346638528859811704183484516925440` as expected by the OP – phuclv Apr 13 '21 at 16:40

printing the integral part of a floating point number

6 Answers6

Linked