Why can double store bigger numbers than unsigned long long?

Question

I don't quite get why double can store bigger numbers than unsigned long long, since both of them are 8 bytes long, i.e. 64 bits.

In unsigned long long, all 64 bits are used to store the value, whereas double has 1 bit for the sign, 11 for the exponent and 52 for the mantissa. Even if the 52 bits used for the mantissa were used to store whole numbers without a floating point, it still has 63 bits ...

BUT LLONG_MAX is significantly smaller than DBL_MAX ...

Why?

Interestingly I would assume that both formats can distinguish about the same number of numbers. Generally, a few bit patterns will be invalid for floats, giving them a little disadvantage. — Peter - Reinstate Monica, May 05 '15 at 12:20
@Peter, I think (from memory) it's a bit more than a little disadvantage. There's a large chunk missing because of the representations of infinities and NaNs. Since NaN is defined as exponent of all 1-bits and _any_ non-zero fraction, a large number of bit patterns map to it - these reduce the pool of bit patterns available for real numbers considerably. A double with its 52-bit fraction means that 2^52-1 patterns are lost to NaN - that's about four quadrillion missing numbers :-) Though that's only about 1% of the total space, I think. — paxdiablo, May 05 '15 at 12:34
@RogerRowland How do you gain in size and lose in precision. I need much more detail ! How is it implemented ? How is it stored ? — denis631, May 05 '15 at 12:38
@paxdiablo Well, 2^52 is only 1/4000 of 2^64! I call that little :-) — Peter - Reinstate Monica, May 05 '15 at 12:39
denis, you can follow the link in my answer, or google for "what every computer scientist should know about floating point". — paxdiablo, May 05 '15 at 12:46
@paxdiablo I want to understand HOW it is possible to store in 11 bits of exponent a number, which is bigger than the LLONG_MAX. — denis631, May 05 '15 at 13:03
Yes, and I'm pretty certain I told you how to find out, the article I linked to shows how the 11-bit exponent scales the fractional part to massively increase the range (at the cost of precision). If you read the post and _still_ don't understand, you should ask another question, detailing the bits you're having trouble with. I've updated the answer with some information on scaling which may help you out but you probably need to bite the bullet and investigate how it really works. — paxdiablo, May 05 '15 at 13:46
Obligatory link: [What Every Computer Scientist Should Know About Floating Point Arithmetic](https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numerics/Double/paper.pdf) (PDF file). — John Bode, May 05 '15 at 14:55

Damon · Accepted Answer · 2015-05-05T14:06:40.693

The reason is that unsigned long long will store exact integers whereas double stores a mantissa (with limited 52-bit precision) and an exponent.

This allows double to store very large numbers (around 10³⁰⁸) but not exactly. You have about 15 (almost 16) valid decimal digits in a double, and the rest of the 308 possible decimals are zeroes (actually undefined, but you can assume "zero" for better understanding).
An unsigned long long only has 19 digits (its maximum has 20, but not every 20-digit number fits), but every single one of them is exactly defined.

EDIT:
In reply to below comment "how does this exactly work", you have 1 bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. The mantissa has an implied "1" bit at the beginning, which is not stored, so effectively you have 53 mantissa bits. 2⁵³ is 9.007E15, so you have 15, almost 16 decimal digits to work with.
The exponent is stored with a bias of 1023 rather than a sign bit. For normal numbers it ranges from -1022 to +1023 and is used to scale (binary shift left or right) the mantissa (2¹⁰²³ is around 9 × 10³⁰⁷, hence the limits on range), so very small and very large numbers are equally possible with this format.
But, of course, all numbers that you can represent only have as much precision as will fit into the mantissa.

All in all, floating point numbers are not very intuitive, since "easy" decimal numbers are not necessarily representable as floating point numbers at all. This is due to the fact that the mantissa is binary. For example, it is possible (and easy) to represent any integer up to 2⁵³ (about 9 × 10¹⁵), or numbers like 0.5 or 0.25 or 0.125, with perfect precision.
On the other hand, it is also possible to represent a number like 10²⁵⁰, but only approximately. In fact, you will find that 10²⁵⁰ and 10²⁵⁰+1 are the same number (wait, what???). That is because although you can easily have 250 digits, you do not have that many significant digits (read "significant" as "known" or "defined").
Representing something seemingly simple like 0.3 is also only possible approximately, even though 0.3 isn't even a "big" number. You can't write 0.3 as a finite binary fraction, and no matter what binary exponent you attach to it, you will not find any binary number that results in exactly 0.3 (but you can get very close).

Some "special values" are reserved for "infinity" (both positive and negative) as well as "not a number", so you have very slightly less than the total theoretical range.

unsigned long long, on the other hand, does not interpret the bit pattern in any way. All numbers that you can represent are simply the exact number that is represented by the bit pattern. Every digit of every number is exactly defined, no scaling happens.

@denis631 just a quick demonstration of the statement about the 15-16 valid decimal digits of `double` vs. 19 exact digits of `unsigned long long`: http://ideone.com/8igfpt — axiac, May 05 '15 at 12:47
I disagree with "unsigned long long ... does not interpret the bit pattern in any way." **Every** system that maps a sequence of zeros and ones to a real number involves interpretation. The binary positional system just happens to be particularly familiar to programmers. — Patricia Shanahan, May 05 '15 at 16:08
@PatriciaShanahan: When you said "real number" I assume you meant "actual number" rather than a [Real number](https://en.wikipedia.org/wiki/Real_number) because a long is an [integral type](https://msdn.microsoft.com/en-us/library/c6bf8dw1(v=vs.71).aspx). If so, you are mistaken in your assertion. Binary notation and decimal notation are exactly equivalent systems in terms of their representational capacity. Conversion from one to the other is lossless in either direction. — kmote, Apr 27 '17 at 22:39
@kmote I meant "real number" in the sense of any limit of a Cauchy sequence of rational numbers. There are many ways one could represent finite subsets of the real numbers as sequences of e.g. 64 zeroes and ones, with different interpretations. Interpreting the zeros and ones as binary digits in a positional system is just one of them. The only thing that is special about it is that it is very commonly used in programming. — Patricia Shanahan, Apr 27 '17 at 22:53

paxdiablo · Answer 2 · 2016-01-12T13:32:13.400

IEEE754 floating point values can store a larger range of numbers simply because they sacrifice precision.

By that, I mean that a 64-bit integral type can represent every single value in its range but a 64-bit double cannot.

For example, trying to store 0.1 into a double won't actually give you 0.1, it'll give you something like:

0.100000001490116119384765625

(that's actually the nearest single precision value but the same effect will apply for double precision).

But, if the question is "how do you get a larger range with fewer bits available to you?", it's simply that some of those bits are used to scale the value.

Classic example, let's say you have four decimal digits to store a value. With an integer, you can represent the numbers 0000 through 9999 inclusive. The precision within that range is perfect, you can represent every integral value.

However, let's go floating point and use the last digit as a scale so that the digits 1234 actually represent the number 123 x 10⁴.

So now your range is from 0 (represented by 0000 through 0009) through 999,000,000,000 (represented by 9999 being 999 x 10⁹).

But you cannot represent every number within that range. For example, 123,456 cannot be represented, the closest you can get is with the digits 1233 which give you 123,000. And, in fact, where the integer values had a precision of four digits, now you only have three.

That's basically how IEEE754 works, sacrificing precision for range.

score 6 · Answer 3 · edited Jun 20 '20 at 09:12

Disclaimer

This is an attempt to provide an easy to understand explanation about how the floating point encoding works. It is a simplification and it does not cover any of the technical aspects of the real IEEE 754 floating point standard (normalization, signed zero, infinities, NaNs, rounding etc). However, the idea presented here is correct.

Understanding how the floating point numbers work is severely impeded by the fact that computers work with numbers in base 2 while the humans don't easily handle them. I'll try to explain how the floating point numbers work using base 10.

Let's construct a floating point number representation using signs and base 10 digits (i.e. the usual digits from 0 to 9 we are using on a daily basis).

Let's say we have 10 square cells and each cell can hold either a sign (+ or -) or a decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8 or 9).

We can use the 10 cells to store signed integer numbers. One cell for the sign and 9 cells for the digits of the value:

sign -+   +-------- 9 decimal digits -----+
      v   v                               v
    +---+---+---+---+---+---+---+---+---+---+
    | + | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 0 |
    +---+---+---+---+---+---+---+---+---+---+

This is how value 1500 is represented as an integer.

We can also use them to store floating point numbers. For example, 7 digits for mantissa and 3 digits for exponent:

  +------ sign digits --------+
  v                           v
+---+---+---+---+---+---+---+---+---+---+
| + | 0 | 0 | 0 | 1 | 5 | 0 | + | 0 | 1 |
+---+---+---+---+---+---+---+---+---+---+
|<-------- Mantissa ------->|<-- Exp -->|

This is one of the possible representations of 1500 as floating point value (using our 10 decimal digits representation).

The value of mantissa (M) is +150, the value of exponent (E) is +1. The value represented above is:

V = M * 10^E = 150 * 10^1 = 1500

The ranges

The integer representation can store signed values between -(10^9-1) (-999,999,999) and +(10^9-1) (+999,999,999). More, it can represent each and every integer value between these limits. Even more, there is a single representation for each value and it is exact.

The floating point representation can store signed values for mantissa (M) between -999,999 and +999,999 and for exponent (E) between -99 and +99.

It can store values between -999,999*10^99 and +999,999*10^99. These numbers have 105 digits, much more than the 9 digits of the biggest numbers represented as integers above.

The loss of precision

Let's remark that for integer values, M stores the sign and the first 6 digits of the value (or less) and E is the number of digits that did not fit into M.

V = M * 10^E

Let's try to represent V = +987,654,321 using our floating point encoding.

Because M is limited to +999,999 it can only store +987,654 and E will be +3 (the last 3 digits of V cannot fit in M).

Putting them together:

+987,654 * 10^(+3) = +987,654,000

This is not our original value of V but the best approximation we can get using this representation.

Let's remark that all the numbers between (and including) +987,654,000 and +987,654,999 are approximated using the same value (M=+987,654, E=+3). Also there is no way to store decimal digits for numbers greater than +999,999.

As a general rule, for numbers bigger than the maximum value of M (+999,999), this method produces the same representation for all values between M*10^E and (M+1)*10^E-1 (integer or real values, it doesn't matter).

Conclusion

For large values (larger than the maximum value of M), the floating point representation has gaps between the numbers it can represent. These gaps become bigger and bigger as the value of E increases.

The entire idea of the "floating point" is to store a dozen or so of the most representative digits (the beginning of the number) and the magnitude of the number.

Let's take the speed of light as an example. Its value is about 300,000 km/s. Being so massive, for most practical purposes you don't care if it's 300,000.001 km/s or 300,000.326 km/s.

In fact, it is not even that big, a better approximation is 299,792.458 km/s.

The floating point numbers extract the important characteristics of the speed of light: its magnitude is of hundreds of thousands of km/s (E=5) and its value is 3 (hundreds of thousands of km/s).

speed of light = 3*10^5 km/s

Our floating point representation can approximate it by: 299,792 km/s (M=299,792, E=0).

John Bode · Answer 4 · 2015-05-05T16:21:30.477

What kind of magic is happening ???

The same kind of magic that allows you to represent the 101-digit number

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

as

1.0 * 10¹⁰⁰

It's just instead of base 10 you're doing it in base 2:

0.57149369564113749110789177415267 * 2³³³.

This notation allows you to represent very large (or very small) values in a compact manner. Instead of storing every digit, you store the significand (a.k.a. the mantissa or fraction) and the exponent. This way, numbers that are hundreds of decimal digits long can be represented in a format that takes up only 64 bits.

It is the exponent that allows floating-point numbers to represent such a large range of values. The exponent value 1023 only requires 10 bits to store, but 2¹⁰²³ is a 308-digit number.

The tradeoff is that not every value can be represented exactly. With a 64-bit integer, every value between 0 and 2⁶⁴-1 (or -2⁶³ to 2⁶³-1) has an exact representation. That is not true of floating-point numbers for several reasons. First of all, you only have so many bits, giving you only so many digits of precision. For example, if you only have 3 significant digits, then you cannot represent values between 0.123 and 0.124, or 1.23 and 1.24, or 123 and 124, or 1230000 and 1240000. As you approach the edge of your range, the gap between representable values gets larger.

Secondly, just like there are values that cannot be represented in a finite number of digits (1/3 gives the non-terminating sequence 0.33333...₁₀), there are values that cannot be represented in a finite number of bits (1/10 gives the non-terminating sequence 0.000110011001...₂).

score 2 · Answer 5 · answered May 05 '15 at 13:37

Perhaps you feel that "storing a number in N bits" is something fundamental, whereas there are various ways of doing it. In fact, it is more accurate to say we represent a number in N bits, as the meaning depends on what convention we adopt. We can, in principle, adopt any convention we like for which numbers different N-bit patterns represent. There is the binary convention, as used for unsigned long long and other integer types, and the mantissa+exponent convention as used for double, but we could also define an (absurd) convention of our own, in which, for example, all bits zero means any enormous number you care to specify. In practice we usually use conventions which allow us to combine (add, multiply, etc.) numbers efficiently using the hardware on which we run our programmes.

That said, your question has to be answered by comparing the largest binary N-bit number with the largest number of the form 2^exponent * mantissa, where exponent and mantissa are E- and M-bit binary numbers (with an implicit 1 at the start of the mantissa). That is 2^(2^E-1) * (2^(M+1) - 1), which is typically indeed far greater than 2^N - 1.

score 0 · Answer 6 · answered May 05 '15 at 13:22

A small example illustrating Damon's and paxdiablo's explanations:

#include <stdio.h>

int main(void) {
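    double d = 2LL<<52;         /* 2^53: from here on, double cannot hold every integer */
    long long ll = 2LL<<52;
    printf("d:%.0f  ll:%lld\n", d, ll);
    d++; ll++;                  /* 2^53 + 1 is not representable, so d rounds back to 2^53 */
    printf("d:%.0f  ll:%lld\n", d, ll);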
}

Output:

d:9007199254740992  ll:9007199254740992
d:9007199254740992  ll:9007199254740993

Both variables would have been incremented the same way with a shift of 51 or less.
