I have a requirement to represent 256-bit whole numbers. Right now I'm using __uint128_t[2]. Check out the code below.
#include <stdio.h>

int main(void)
{
    __uint128_t a = 0xffffffffffffffffULL;  /* low 64 bits all set */
    a <<= 64;                               /* move them into the high half */
    a += 0xffffffffffffffffULL;             /* a is now 2^128 - 1 */
    double b = a;
    printf("Output - %lf\n", b);
    return 0;
}
The output is Output - 340282366920938463463374607431768211456.000000, which is just 1 greater than the correct value, 340282366920938463463374607431768211455.
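As a sanity check, the small program below (just a sketch, assuming GCC or Clang, since __uint128_t is a compiler extension) confirms that the converted double is exactly 2^128:

#include <stdio.h>
#include <math.h>

int main(void)
{
    __uint128_t a = 0xffffffffffffffffULL;
    a <<= 64;
    a += 0xffffffffffffffffULL;              /* a == 2^128 - 1 */

    double b = a;
    /* ldexp(1.0, 128) is exactly 2^128, which a double can hold */
    printf("b == 2^128 ? %s\n", b == ldexp(1.0, 128) ? "yes" : "no");
    return 0;
}

This prints yes, i.e. b holds exactly 2^128.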
Now I will change the value of a.
#include <stdio.h>

int main(void)
{
    __uint128_t a = 0xffffffffffffffffULL;  /* low 64 bits all set */
    a <<= 64;                               /* a is now 2^128 - 2^64 */
    double b = a;
    printf("Output - %lf\n", b);
    return 0;
}
The output is again Output - 340282366920938463463374607431768211456.000000, which is well off the correct value, 340282366920938463444927863358058659840.
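One more observation before my questions. This sketch (assuming IEEE-754 doubles and the standard <math.h> functions) prints the spacing between consecutive doubles just below 2^128:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double p128  = ldexp(1.0, 128);       /* exactly 2^128 */
    double below = nextafter(p128, 0.0);  /* largest double below 2^128 */

    printf("2^128            = %.0f\n", p128);
    printf("next double down = %.0f\n", below);
    printf("spacing          = %.0f\n", p128 - below);
    return 0;
}

The spacing it prints is 2^75, which I assume is related to what I am seeing.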
1 - What is happening here? Why is the first result just 1 off from the correct value?
2 - There is a lot of confusion online about the maximum whole number that can be represented in the double datatype. What is the maximum? Please give the largest positive whole number, either as a number of bits or as the value itself.
3 - Follow-up to question 2 - Let's say the answer above is 48 bits. The reason I am using double is to divide two 48-bit numbers and get the first digit after the decimal point. Can the fact that I only need one digit after the decimal point increase that maximum value? (See the sketch below for what I mean.)
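To make question 3 concrete, here is a minimal sketch of the kind of division I mean (the operand values are made up; they just need to fit in 48 bits):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* both values fit in 48 bits, and both are below 2^53,
       so they convert to double exactly */
    uint64_t x = 250000000000001ULL;     /* numerator */
    uint64_t y = 100000000000000ULL;     /* denominator */

    double q = (double)x / (double)y;
    printf("quotient to one decimal place: %.1f\n", q);
    return 0;
}

All I need from q is the 2.5, nothing beyond the first digit after the decimal point.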
Edit - I did read this link before posting. What is 1.8*10^308 here, and what is 2^53 here? They are both described as the biggest possible integer that can be represented in a double without loss of precision.
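For reference, this sketch just prints the two numbers side by side (assuming a standard C environment with <float.h> and <math.h>):

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    double max_finite = DBL_MAX;         /* about 1.8 * 10^308 */
    double limit      = ldexp(1.0, 53);  /* 2^53 = 9007199254740992 */

    printf("DBL_MAX            = %e\n", max_finite);
    printf("2^53               = %.0f\n", limit);
    printf("2^53 + 1 as double = %.0f\n", limit + 1.0);
    return 0;
}

The last line prints 9007199254740992 again, which only adds to my confusion about which of the two numbers is the real limit.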