Floating-point numbers are usually represented in the IEEE floating-point format (let's assume this for argument's sake, even though it is not mandated by the language standard).
See: IEEE
If you read the article, you will notice that anything after the point is represented by a negative power of 2 (after shifting and all that).
0.5   => 1 bit one place below zero    => 2^-1
0.25  => 1 bit two places below zero   => 2^-2
0.125 => 1 bit three places below zero => 2^-3
etc.
So the fractional part of your number needs to be the sum of these bits. Assuming you have no integer part, it needs to fit in 52 consecutive bits (effectively 53, because the leading 1 bit is implicit and not stored).
So let's look at 8.296.
The integer part (8) is 1000, so we have used 4 bits (3 because of the implicit leading bit). That leaves 49 bits to represent the 0.296.
Bits below zero   Value          Amount left    Bit set or not
-                 -              0.296          -
1                 0.5            0.296          0
2                 0.25           0.046          1
3                 0.125          0.046          0
4                 0.0625         0.046          0
5                 0.03125        0.01475        1
6                 0.015625       0.01475        0
7                 0.0078125      0.0069375      1
8                 0.00390625     0.00303125     1
9                 0.001953125    0.001078125    1
10                0.0009765625   0.0001015625   1
etc. // these numbers could be wrong, I did them by hand.
We have another 39 bits to go.
8.296 = 10000100101111<another 39 bits>
Then add an exponent to shift it so that only one bit is above zero.
If I run the same loop and also print out the difference from zero (the fractional part that is left over), I get:
NewVal: 82.96 diff from zero: 0.95999999999999375
NewVal: 829.6 diff from zero: 0.59999999999993747
NewVal: 8296 diff from zero: 0.99999999999937472
NewVal: 82960 diff from zero: 0.99999999999374722
NewVal: 8.296e+05 diff from zero: 0.99999999993747224
NewVal: 8.296e+06 diff from zero: 0.99999999937472239
NewVal: 8.296e+07 diff from zero: 0.99999999374995241
NewVal: 8.296e+08 diff from zero: 0.99999993748497218
Notice how the fractional part gets closer to a whole number (0.9999...) but then starts moving away again. This is because we are losing precision as bits fall off the end.
Code:
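#include <cmath>
#include <iomanip>
#include <iostream>

// Multiply n by 10 until it has no fractional part, printing each step.
long double multiplyTillInt(long double n)
{
    // std::trunc is used instead of a cast to int: the cast overflows
    // (undefined behaviour) once n no longer fits in an int.
    while (std::trunc(n) != n) {
        n *= 10;
        std::cout << "NewVal: "
                  << std::setw(19) << std::setprecision(5) << n
                  << " diff from zero: "
                  << std::setw(19) << std::setprecision(17) << (n - std::trunc(n)) << "\n";
    }
    return n;
}

int main()
{
    multiplyTillInt(8.296);
}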