295

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision?

In other words, what would the following code fragment return:

UInt64 i = 0;
Double d = 0;

while (i == d)
{
        i += 1; 
        d += 1;
}
Console.WriteLine("Largest Integer: {0}", i-1);
phuclv
Franck Freiburger
  • 1
    "no-floating" → [fixed point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) – phuclv Aug 15 '22 at 15:36
  • Re "*What is the biggest "no-floating" integer*", All numbers are represented using a floating point except +0, -0, the really really tiny subnormals (which use a fixed point), the infinities and the NaNs. This is obviously not what you meant to ask. You seem to be asking what's the largest integer where it and every integer smaller than it can be exactly represented by a double. – ikegami Aug 06 '23 at 17:56

11 Answers

673

The biggest/largest integer that can be stored in a double without losing precision is the same as the largest possible value of a double. That is, DBL_MAX or approximately 1.8 × 10^308 (if your double is an IEEE 754 64-bit double). It's an integer. It's represented exactly. What more do you want?

Go on, ask me what the largest integer is, such that it and all smaller integers can be stored in IEEE 64-bit doubles without losing precision. An IEEE 64-bit double has 52 bits of mantissa, so it's 2^53 (and −2^53 on the negative side):

  • 2^53 + 1 cannot be stored, because the 1 at the start and the 1 at the end have too many zeros in between.
  • Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one.
  • 2^53 obviously can be stored, since it's a small power of 2.

Or another way of looking at it: once the bias has been taken off the exponent, and ignoring the sign bit as irrelevant to the question, the value stored by a double is a power of 2, plus a 52-bit integer multiplied by 2^(exponent − 52). So with exponent 52 you can store all values from 2^52 through to 2^53 − 1. Then with exponent 53, the next number you can store after 2^53 is 2^53 + 1 × 2^(53 − 52). So loss of precision first occurs with 2^53 + 1.

ikegami
Steve Jessop
  • 176
    +1 Good job noticing that the question did not really mean what the asker probably intended and providing both answers ("technically correct" and "probably expected"). – Pascal Cuoq Dec 04 '09 at 18:32
  • 82
    Or "messing about" and "trying to help" as I tend to call them :-) – Steve Jessop Dec 04 '09 at 18:34
  • I have just added more precision to the description of my question. I was talking about the biggest "no-floating" integer. – Franck Freiburger Dec 04 '09 at 18:37
  • I bow to your superior analysis of the question. +1, nice one! – Carl Smotricz Dec 04 '09 at 18:39
  • @Soubok, you have to define what "no-floating" is first, because it sure isn't a term with a standard meaning! – Pavel Minaev Dec 04 '09 at 18:46
  • @Carl: *Everybody* bows to Steve "Who the hell is Jon Skeet?" Jessop :) – Dan Moulding Dec 04 '09 at 18:47
  • 9
    I bow to Tony the Pony, and no other. – Steve Jessop Dec 04 '09 at 18:51
  • @Pavel Minaev: Yes I know, but if I ask such a question, unfortunately this is because I don't know all standards :) – Franck Freiburger Dec 04 '09 at 19:46
  • 13
    You don't mean "all smaller integers", you mean all integers of equal or lesser magnitude. Because there are a lot of negative integers below 2^53 that cannot be represented exactly in a double. – Southern Hospitality Dec 05 '09 at 08:54
  • 16
    I do mean smaller, and that's exactly what I mean when I say smaller :-) -1,000,000 is less than 1, but it is not smaller. – Steve Jessop Dec 05 '09 at 13:23
  • @SteveJessop, can you explain the first sentence? why is the biggest integer that can be stored in a double without losing precision is the same as the largest possible value of a double? – Pacerier Sep 21 '13 at 19:14
  • 2
    @Pacerier: It's an integer, and its representation as `double` is exact, and it's the largest integer with that property. Hence it answers the title of this question, "biggest integer that can be stored in a double". I don't think I can explain any further, there's only so much mileage in explaining a joke. – Steve Jessop Sep 23 '13 at 18:35
  • 1
    @SteveJessop "_Anything less than 2^53 can be stored, with 52 bits explicitly stored in the mantissa, and then the exponent in effect giving you another one_" I couldn't understand this correctly; are you talking about the implicit/hidden bit because I cannot imagine how the exponent gives the 53rd bit. Please clarify. – legends2k Jun 19 '14 at 14:34
  • @legends2k: The exponent tells you the position of the implicit/hidden bit. That is, the exponent in effect gives you the additional bit of precision. – Steve Jessop Jun 19 '14 at 15:05
  • @SteveJessop But isn't the implicit bit always at the beginning? For instance, _1.XXX_ is the significand for [minifloats](http://en.wikipedia.org/wiki/Minifloat) which have 1 sign, 4 exponent and 3 significand bits. – legends2k Jun 19 '14 at 15:13
  • @legends2k: the implicit bit doesn't actually exist in the object representation of the float, that's why it's called "implicit". So yes it's "at the beginning" of the value. What in the object representation tells you where the beginning of the value is? The exponent. – Steve Jessop Jun 19 '14 at 15:33
  • 3
    Extra bonus for being a smartass. – Mad Physicist Jul 31 '15 at 14:10
  • 1
    You can encode everything smaller than 2^53 because the *exponent value* can go past 2^53, floating the point all the way to the right of the mantissa. Saying that the exponent gives an "extra bit" is confusing. If we had just 5 bits of exponent for example you wouldn't be able to encode integers between 2^33 and 2^53 even with the 52 bits of mantissa + 1 implicit. – Joan Charmant Aug 20 '16 at 14:16
  • What's the related number on the negative side? `-(2^53)`? – Levi Morrison Nov 30 '17 at 20:25
  • @LeviMorrison: yes, by the same arguments but with the sign bit set. – Steve Jessop Jan 10 '18 at 00:24
  • What is the exact value of 2^53? Google calculator just says `9.0071993e+15` but I need the exact value. – Aaron Franke Jan 16 '20 at 01:41
  • @Aaron: um, the exact value is 9007199254740992. As a 64-bit float, this is the number with sign 0, significand bits all zero, and exponent +53. The bit pattern is `0x4340000000000000`. A converter like http://weitz.de/ieee/ might help you see why (others are available, that's just the first one I found). – Steve Jessop May 25 '20 at 22:05
  • @SteveJessop: I want to convert a `long long` (signed 64-bit integer) to a `double`. I want to show an error message if the number in the `long long` is too large or too small to be represented by a `double`. What boundaries should I check for? In your answer you mention 2^53 as the upper limit but what about negative numbers? What is the smallest integer I can store in a `double`? – Andreas Jul 15 '20 at 19:05
  • 1
    @Andreas: IEEE floats have a separate sign bit, so they're completely symmetrical about 0. `-2^53` is just `2^53` with the sign bit flipped. `-2^53 - 1` is not representable for the same reason that `2^53 + 1` is not representable. – Steve Jessop Jul 15 '20 at 19:30
  • Having said that, following the smartass theme of my answer to the original question: the smallest integer that `double` can represent is 0. -1 squllion is less than 0, but it isn't smaller, it's a "large negative number" ;-) – Steve Jessop Jul 15 '20 at 19:33
  • In any case you could convince yourself by actually doing the computation: `(double)-9007199254740992LL == ((double)-9007199254740992LL) - 1` is true. `(double)-9007199254740992LL == ((double)-9007199254740992LL) + 1` is false. So, `-2^53` is the point where `double` loses precision. – Steve Jessop Jul 15 '20 at 19:38
  • @SteveJessop: But when I do this: `long long x = 9007199254740992; double y = (double) x; printf("%.14g\n", y);` it prints `9.007199254741e+015` which is `9007199254741000` so 8 more than my initial `long long`. Why is that? – Andreas Jul 15 '20 at 20:26
  • @Andreas: you asked for 14 digits in the format code `%.14g`. You only got 13, but that's because to 14 places it's 9.007199254741**0**e+015, and the trailing 0 isn't printed. Try `%.16g`. – Steve Jessop Jul 15 '20 at 20:30
  • 2
    It's interesting that nobody has mentioned this question's connection with Javascript. Javascript doesn't have an integer type, everything is a float (double) so this answer gives you the range of integers in Javascript. – Mark Ransom Mar 28 '21 at 03:06
103

9007199254740992 (that's 9,007,199,254,740,992 or 2^53) with no guarantees :)

Program

#include <math.h>
#include <stdio.h>

int main(void) {
  double dbl = 0; /* I started with 9007199254000000, a little less than 2^53 */
  while (dbl + 1 != dbl) dbl++;
  printf("%.0f\n", dbl - 1);
  printf("%.0f\n", dbl);
  printf("%.0f\n", dbl + 1);
  return 0;
}

Result

9007199254740991
9007199254740992
9007199254740992
phuclv
pmg
  • 8
    Assuming it will be 'close' but less than a 2^N, then a faster test is `double dbl = 1; while (dbl + 1 != dbl) dbl *= 2; while (dbl == --dbl);` which yields the same result – Seph Mar 06 '12 at 10:21
  • 4
    @Seph what the...? No? `while (dbl == --dbl)` will loop forever or not at all. :) (in this case, not at all, since it is a 2^N). You'll have to approach it from below. It will indeed also result in one less than the expected result (since the one check in the while loop decrements dbl). And it depends on order of execution, if the decrement is done before or after evaluating the left side (which is undefined as far as I know). If it's the former, it'll always be true and loop forever. – falstro Oct 25 '16 at 14:53
  • 17
    Maybe indicate that 2^53=9,007,199,254,740,992 somewhere. – Xonatron Oct 24 '17 at 15:40
  • 2
    It's hard to argue with this! Nice experiment – MattM Jun 13 '18 at 20:06
  • A weakness to using `while (dbl + 1 != dbl) dbl++;` in that `dbl + 1 != dbl` may evaluate using `long double` math - consider `FLT_EVAL_METHOD == 2`. This could end in an infinite loop. – chux - Reinstate Monica Sep 25 '18 at 19:27
  • FWIW, the value you quote minus one (in terms of the IEEE 754 double precision representation) has an exponent of 1075 - 1023 = 52 and a mantissa with all (52) one bits after the decimal point. The next value (900...2) then has all (52) zeros after the decimal point in the mantissa and an exponent of 1076 - 1023 = 53. – Andre Holzner Mar 12 '19 at 16:50
  • When compiling this example for 32-bit targets, the condition `dbl + 1 != dbl` stays true indefinitely. This is because on 64-bit `gcc` uses `sse` by default, which is not the case on 32-bit. The example behaves the same on 32-bit when passing `-mfpmath=sse` to `gcc`. – Daniel Da Cunha Sep 10 '19 at 10:15
  • In hexadecimal, this value is `0x20000000000000` – Aaron Franke Jan 16 '20 at 03:54
  • fwiw `float` ends at 16777216 – hanshenrik Aug 10 '21 at 23:35
30

The largest integer that can be represented in IEEE 754 double (64-bit) is the same as the largest value that the type can represent, since that value is itself an integer.

This is represented as 0x7FEFFFFFFFFFFFFF, which is made up of:

  • The sign bit 0 (positive) rather than 1 (negative)
  • The maximum exponent 0x7FE (2046 which represents 1023 after the bias is subtracted) rather than 0x7FF (2047 which indicates a NaN or infinity).
  • The maximum mantissa 0xFFFFFFFFFFFFF which is 52 bits all 1.

In binary, the value is the implicit 1 followed by another 52 ones from the mantissa, then 971 zeros (1023 - 52 = 971) from the exponent.

The exact decimal value is:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368

This is approximately 1.8 × 10^308.

Simon Biber
  • 2
    What about the largest value that it can represent with all values between it and zero contiguously representable? – Aaron Franke Jan 16 '20 at 01:41
  • @AaronFranke The question didn't ask about contiguous representation, but the answer to that different question has been included in most other answers here, or even wrongly given as the actual answer. It's 2⁵³ (2 to the power of 53). – Simon Biber Apr 29 '20 at 12:57
  • @AaronFranke : no amount of mantissa in the entire universe is sufficient to "represent all values" between zero and what that `x` is, unless you have figured out how to finitely express transcendental numbers – RARE Kpop Manifesto Aug 13 '22 at 16:26
  • @RAREKpopManifesto The question is specifically about integers, so in this context "values" refers to integers. – Aaron Franke Aug 14 '22 at 16:11
29

Wikipedia has this to say in the same context with a link to IEEE 754:

On a typical computer system, a 'double precision' (64-bit) binary floating-point number has a coefficient of 53 bits (one of which is implied), an exponent of 11 bits, and one sign bit.

2^53 is just over 9 * 10^15.

Carl Smotricz
  • @Steve Jessop more or less, that is indeed what I am saying. I have also encountered hardware systems that don't have a FPU that still need to be IEEE-compliant, so that "typical system" stuff doesn't really help me if I come back to here 8 months later and need the same info for my 68K-based microcontroller (assuming it doesn't have a FPU... I can't remember). – San Jacinto Dec 04 '09 at 18:39
  • 16
    @San Jacinto - "This is useless" is unduly harsh. The answer is quite useful, just not as useful as it would have been if it included the comment that typical computer systems do indeed use the IEEE 754 representation. – Stephen C. Steel Dec 04 '09 at 18:47
  • @Stephen C. Steel, actually you are correct. Under my scenario, coming back to this at a later time and looking for the IEEE max, it is impossibly ambiguous as to what a 'typical system' is, but there is still merit in the answer besides this complaint. – San Jacinto Dec 04 '09 at 18:50
8

You need to look at the size of the mantissa. An IEEE 754 64 bit floating point number (which has 52 bits, plus 1 implied) can exactly represent integers with an absolute value of less than or equal to 2^53.

Dolphin
4

It is true that, for 64-bit IEEE754 double, all integers up to 9007199254740992 == 2^53 can be exactly represented.

However, it is also worth mentioning that all representable numbers beyond 4503599627370496 == 2^52 are integers. Beyond 2^52 it becomes meaningless to test whether or not they are integers, because they are all implicitly rounded to a nearby representable value.

In the range 2^51 to 2^52, the only non-integer values are the midpoints ending with ".5", meaning any integer test after a calculation must be expected to yield at least 50% false answers.

Below 2^51 we also have ".25" and ".75", so comparing a number with its rounded counterpart in order to determine if it may be integer or not starts making some sense.

TLDR: If you want to test whether a calculated result may be an integer, avoid numbers larger than 2251799813685248 == 2^51

Jan Heldal
2

1.7976931348623157 × 10^308

http://en.wikipedia.org/wiki/Double_precision_floating-point_format

Jay
  • 2
    this answer would be much better with a citation. – San Jacinto Dec 04 '09 at 18:14
  • 2
    @Carl well, if the integer has zeros beyond to the left, then it is precisely stored. – Wilhelm Dec 04 '09 at 18:27
  • 4
    @all you downvoters: 1.7976931348623157 × 10^308 **is** an exact integer. Do you all need to attend remedial math classes or something?? – Dan Moulding Dec 04 '09 at 18:43
  • 7
    We're down to semantics here in the discussion of this hopelessly sunk answer. True, that number can be represented exactly and thereby fulfills the letter of the question. But we all know it's a tiny island of exactitude in an ocean of near misses, and most of us correctly interpolated the question to mean "the largest number beyond which precision goes down the drain." Ah, isn't it wonderful that CompSci is an exact science? :) – Carl Smotricz Dec 04 '09 at 18:59
  • This is the correct answer to what was asked. The value DBL_MAX, which is the largest value which the IEEE double can represent IS an exact integer: in binary representation it is 53 ones followed by 971 zeros (and of course its exact expression in decimal notation is 308 digits long). However, the next smaller exact integer in the IEEE representation is 52 ones followed by 972 zeroes (i.e. a gap of 2^971). What the OP probably wanted was the upper limit of integer values that can be represented without gaps, which is 2^53 (as noted in other answers). – Stephen C. Steel Dec 04 '09 at 19:05
  • @Carl: But, given the question that was asked, is this answer *wrong*? No, it certainly isn't. And nobody really *knows* what the OP *meant* except, the OP. So why downvote an otherwise correct answer? Because you feel Jamie's talents of mind-reading aren't up to par? – Dan Moulding Dec 04 '09 at 19:16
  • Is the number given above the exact value of DBL_MAX? I'm not certain it isn't, but the Wikipedia article linked to indicates that it is approximate, and it certainly would be some coincidence for a number determined by the limits of a base-2 representation, to be divisible by quite that many powers of 10. Not that I downvoted this, just in my answer I indicated that my approximation was indeed an approximation. – Steve Jessop Dec 04 '09 at 19:58
  • @StephenC.Steel I know I'm 5yrs late to the party, but Double.MAX_VALUE is 1.797... x 10^308 = (((1 << 54)-1) * (1 << 970)), which is 53 ones followed by only 969 zeroes. The highest the exponent can get while still representing a number is 1022, minus the 52 explicit and the one implicit 1 leaves us with 969. – masterxilo Jun 02 '14 at 18:03
  • 3
    @DanMoulding 1.7976931348623157 × 10^308 is an exact integer, but I am pretty sure this particular integer cannot be stored exactly in a double. – Pascal Cuoq Sep 26 '14 at 22:15
  • 2
    Note (just in case someone links to this specific answer): the actual number is provided in Simon Biber's answer. – Alexey Romanov Jul 06 '18 at 14:08
  • This answer might be true, but pretty misleading, as chances are high that whoever is interested in the answer, has the question in mind: "what is the _range_ of integers between 0 and Max I can safely store, without losing information". And 10^308 distinct values certainly couldn't be represented on 64 bits. – vmatyi Dec 17 '21 at 15:38
  • @DanMoulding Most of us can tell what the OP meant, partly because of some of the wording, but mainly because the most simple (a.k.a., "literal") interpretation of the question would have much more limited use. If you were asking for the number given in this answer, you would probably be aware how strange the question is, and take care to emphasize that it is actually what you want. Furthermore, the answer should, at the very least, point out that it is answering the literal and likely unintended question, if not actually providing both answers to help all the people who end up here. – cesoid Mar 03 '22 at 20:59
  • @vmatyi : when you phrase it like that, you can ***technically*** represent all int from `0` to `2^54`, by leveraging the sign-bit as a way of signifying to ur code that this num is an odd when beyond `2^53`. e.g. `{ _= 25 + int( 4^3^3 / 2 ) , _*= (-1)^(_ == ( _+1 )), (+_<-_ )*-( (_ % 1e6) % 10 - 1 ) }'` :::: prints `9007199254741017 9007199254741017 0` 4 platforms w/ big-int support, otherwise print `9007199254741016 -9007199254741016 7` ….. that said, this is only 4 single instance representation -any additional math upon it would lose prec again – RARE Kpop Manifesto Aug 13 '22 at 16:53
0

As others have noted, I will assume that the OP asked for the largest floating-point value such that all whole numbers less than it are precisely representable.

You can use FLT_MANT_DIG and DBL_MANT_DIG defined in float.h to not rely on the explicit values (e.g., 53):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("%d, %.1f\n", FLT_MANT_DIG, (float)(1LL << FLT_MANT_DIG));
    printf("%d, %.1lf\n", DBL_MANT_DIG, (double)(1LL << DBL_MANT_DIG)); /* 1LL: plain long may be only 32 bits */
}

outputs:

24, 16777216.0
53, 9007199254740992.0
Jay Lee
0

Doubles, the "Simple" Explanation

The largest "double" number (double precision floating point number) is typically a 64-bit or 8-byte number expressed as:

1.79E308
or
1.79 x 10 (to the power of) 308

As you can guess, 10 to the power of 308 is a GIGANTIC NUMBER: a 1 followed by 308 zeros!

On the other end of the scale, double precision floating point 64-bit numbers support tiny tiny decimal numbers of fractions using the "dot" notation, the smallest being:

4.94E-324
or
4.94 x 10 (to the power of) -324

Anything multiplied times 10 to the power of a negative power is a tiny tiny decimal, like 0.0000000000000000000000000000000000494 and even smaller.

But what confuses people is they will hear computer nerds and math people say, "but that number only has about 15–17 significant decimal digits". It turns out that the values described above are the all-time MAXIMUM and MINIMUM values the computer can store and present from memory. But they lose accuracy and the ability to represent every number LONG BEFORE they get that big. So most programmers AVOID the maximum double number possible, and try and stick within a known, much smaller range.

But why? And what is the best maximum double number to use? I could not find the answer reading dozens of bad explanations on math sites online. So this SIMPLE explanation may help you below. It helped me!!

DOUBLE NUMBER FACTS and FLAWS

JavaScript (which also uses the 64-bit double precision storage system for numbers in computers) uses double precision floating point numbers for storing all known numerical values. It thus uses the same MAX and MIN ranges shown above. But most languages use a typed numerical system with ranges to avoid accuracy problems. The double and float number storage systems, however, seem to all share the same flaw of losing numerical precision as they get larger and smaller. I will explain why as it affects the idea of "maximum" values...

To address this, JavaScript has what is called a Number.MAX_SAFE_INTEGER value, which is 9007199254740991. This is the largest integer it can use safely, but is NOT the largest number that can be stored. It is safe because it guarantees any integer equal to or less than that value can be viewed, calculated, stored, etc. Beyond that range, there are "missing" numbers. The reason is because double precision numbers AFTER 9007199254740991 use an additional number to multiply them to larger and larger values, including the true max number of 1.79E308. That new number is called an exponent.

THE EVIL EXPONENT

It happens to be the fact that this max value of 9007199254740991 is also the max number you can store in the 53 bits of computer memory used in the 64-bit storage system. This 9007199254740991 number stored in the 53-bits in memory is the largest value possible that can be stored directly in the mantissa section of memory of a typical double precision floating point number used by JavaScript.

9007199254740991, by-the-way, is in a format we call Base10 or decimal, the number Humans use. But it is also stored in computer memory as 53-bits as this value...

11111111111111111111111111111111111111111111111111111

This is the maximum number of bits computers can actually use to store the integer part of double precision numbers in the 64-bit number memory system.

To get to the even LARGER max number possible (1.79E308), JavaScript has to use an extra trick called the exponent to multiply it to larger and larger values. So there is an 11-bit exponent number next to the 53-bit mantissa value in computer memory above that allows the number to grow much larger and much smaller, creating the final range of numbers doubles are expected to represent. (Also, there is a single bit for positive and negative numbers, as well.)

After the computer reaches this limit of max Integer value (around ~9 quadrillion) and filling up the mantissa section of memory with 53 bits, JavaScript uses a new 11-bit storage area for the exponent which allows much larger integers to grow (up to 10 to the power of 308!) and much smaller decimals to get smaller (10 to the power of -324!). Thus, this exponent number allows for a full range of large and small decimals to be created with the floating radix or decimal point to move up and down the number, creating the complex fractional or decimal values you expect to see. Again, this exponent is another number stored in 11 bits, and itself has a max raw value of 2047.

You will notice 9007199254740991 is a max integer, but does not explain the larger MAX value possible in storage or the MINIMUM decimal number, or even how decimal fractions get created and stored. How does this computer bit value create all that?

The answer is again, through the exponent!

It turns out that the exponent 11-bit value is divided itself into a positive and negative value so that it can create large integers but also small decimal numbers.

To do so, it has its own positive and negative range, created by subtracting a bias of 1023 from the stored 11-bit value to get exponents from +1023 down to -1022 (with the remaining bit patterns reserved for zeros, subnormals, infinities and NaNs). To then get the FINAL DOUBLE NUMBER, the mantissa (9007199254740991) is scaled by the exponent (plus the single sign bit added) to get the final value! This allows the exponent to multiply the mantissa value to even larger integer ranges beyond 9 quadrillion, but also go the opposite way with the decimal to very tiny fractions.

However, the +1023 to -1022 number stored in the exponent is not multiplied directly with the mantissa to get the double, but used to raise the number 2 to the power of the exponent. The exponent is a decimal number, but it is not applied as a decimal exponent like 10 to the power of 1023. It is applied in a Base2 system again and creates a value of 2 to the power of (the exponent number).

That value generated is then multiplied to the mantissa to get the MAX and MIN number allowed to be stored in JavaScript, as well as all the larger and smaller values within the range. It uses "2" rather than 10 for precision purposes, so with each increase in the exponent value, it only doubles the mantissa value. This reduces the loss of numbers. But this exponent multiplier also means it will lose an increasing range of numbers in doubles as it grows, to the point where as you reach the MAX stored exponent and mantissa possible, very large swaths of numbers disappear from the final calculated number, and so certain numbers are now not possible in math calculations!

That is why most use the SAFE max integer ranges (9007199254740991 or less), as most know very large and small numbers in JavaScript are highly inaccurate! Also note that 2 to the power of -1023 gets the MIN number or small decimal fractions you associate with a typical "float". The exponent is thus used to translate the mantissa integer to very large and small numbers up to the Maximum and Minimum ranges it can store.

Notice that 2 to the power of 1023 translates to a decimal exponent of roughly 10 to the power of 308 for max values. That allows you to see the number in Human values, or Base10 numerical format of the binary calculation. Often math experts do not explain that all these values are the same number just in different bases or formats.

THE TRUE MAX FOR DOUBLES IS INFINITY

Finally, what happens when integers reach the MAX number possible, or the smallest decimal fraction possible?

It turns out, double precision floating point numbers have reserved bit patterns in the 64-bit exponent and mantissa fields to store four other possible numbers (as well as the NaNs):

  1. +Infinity
  2. -Infinity
  3. +0
  4. -0

For example, +0 in double numbers stored in 64-bit memory is a large row of empty bits in computer memory. Below is what happens after you go beyond the smallest decimal possible (4.94E-324) using a Double precision floating point number. It becomes +0 after it runs out of memory! The computer will return +0, but stores 0 bits in memory. Below is the FULL 64-bit storage design in bits for a double in computer memory. The first bit controls +(0) or -(1) for positive or negative numbers, the 11-bit exponent is next (all zeros is the special encoding reserved for zeros and subnormals), and the large block of 52 stored bits for the mantissa or significand, which represents 0. So +0 is represented by all zeroes!

0 00000000000 0000000000000000000000000000000000000000000000000000

If the double reaches its positive max or min, or its negative max or min, many languages will always return one of those values in some form. However, some return NaN, or overflow, exceptions, etc. How that is handled is a different discussion. But often these four values are your TRUE min and max values for double. By returning these special values, you at least have a representation of the max and min in doubles that explains the last forms of the double type that cannot be stored or explained rationally.

SUMMARY

So the MAXIMUM and MINIMUM ranges for positive and negative Doubles are as follows:

MAXIMUM TO MINIMUM POSITIVE VALUE RANGE
1.79E308 to 4.94E-324 (+Infinity to +0 for out of range)

MAXIMUM TO MINIMUM NEGATIVE VALUE RANGE
-4.94E-324 to -1.79E308 (-0 to -Infinity for out of range)

But the SAFE and ACCURATE MAX and MIN range is really:
9007199254740991 (max) to -9007199254740991 (min)

So you can see with +-Infinity and +-0 added, Doubles have extra max and min ranges to help you when you exceed the max and mins.

As mentioned above, when you go from the largest positive value to the smallest positive decimal value or fraction, the bits zero out and you get 0. Past 4.94E-324 the double cannot store any decimal fraction value smaller, so it collapses to +0 in the bit registry. The same event happens for tiny negative decimals, which collapse past their value to -0. As you know -0 = +0, so though not the same values stored in memory, in applications they often are coerced to 0. But be aware many applications do deliver signed zeros!

The opposite happens to the large values...past 1.79E308 they turn into +Infinity and -Infinity for the negative version. This is what creates all the weird number ranges in languages like JavaScript. Double precision numbers have weird returns!

Note that the MINIMUM SAFE RANGE for decimals/fractions is not shown above as it varies based on the precision needed in the fraction. When you combine the integer with the fractional part, the decimal place accuracy drops away quickly as it goes smaller. There are many discussions and debates about this online. No one ever has an answer. The list below might help. You might need to change these ranges listed to much smaller values if you want guaranteed precision. As you can see, if you want to support up to 9-decimal place accuracy in floats, you will need to limit MAX values in the mantissa to these values. Precision means how many decimal places you need with accuracy. Unsafe means past these values, the number will lose precision and have missing numbers:

            Precision   Unsafe 
            1           562,949,953,421,312
            2           703,687,441,770,664
            3           87,960,930,220,208
            4           5,497,558,130,888
            5           68,719,476,736
            6           8,589,934,592
            7           536,870,912
            8           67,108,864
            9           8,388,608

It took me a while to understand the TRUE limits of double-precision floating-point numbers and computers. I created this simple explanation above after reading so much MASS CONFUSION from math experts online who are great at creating numbers but terrible at explaining anything! I hope I helped you on your coding journey - Peace :)

Stokely
  • 12,444
  • 2
  • 35
  • 23
  • I'm sorry you've encountered so many poor explanations of floating-point, but I'm afraid your understanding is still imperfect, because this answer still reflects a number of pretty serious misunderstandings. I don't have time today to explain them all, but I encourage you to read the Wikipedia article on [exponential notation](https://en.wikipedia.org/wiki/Exponential_notation), which is the basis of floating point. In particular, the exponent field isn't "evil"; it's absolutely fundamental to the whole scheme! And it applies all the time, not just after we get past 9007199254740991. – Steve Summit Apr 07 '23 at 20:03
-1

Consider your compiler, which may not follow the current IEEE 754 double-type specification. Here is a revised snippet to try in VB6 or in Excel VBA. It exits the loop at 999,999,999,999,999, which is only about 1/9 of the expected value. This doesn't test every number, so there may be a lower value at which an increment by 1 fails to increment the sum. You can also try the following line in the debug window: Print Format(1E15# + 1#,"#,###")

    Microsoft VB6, Microsoft Excel 2013 VBA (Both obsolete) 
    Sub TestDbl()
    Dim dSum    As Double      'Double Precision Sum
    Dim vSum    As Variant     'Decimal Precision Sum
    Dim vSumL   As Variant     'Last valid comparison
   
    Dim dStep   As Double
    Dim vStep   As Variant
   
    dStep = 2# ^ 49#           'Starting step
    vStep = CDec(dStep)
   
    dSum = dStep               'Starting Sums
    vSum = vStep
    vSumL = vSum
   
   
    Debug.Print Format(dSum, "###,###,###,###,###,###,###"); " "; _
                Format(vSum, "###,###,###,###,###,###,###"); " "; _
                vStep; " "; Now()
    Do
       dSum = dSum + dStep     'Increment Sums
       vSum = CDec(vSum + vStep)
                              
       If dSum <> vSum Then
                              'Print bad steps
          Debug.Print Format(dSum, "###,###,###,###,###,###,###"); " "; _
                      Format(vSum, "###,###,###,###,###,###,###"); " "; _ 
                      vStep; " "; Now()
                              'Go back 2 steps
          vSum = CDec(vSumL - vStep)
          dSum = CDbl(vSum)
                              'Exit if Step is 1
          If dStep < 2 Then Exit Do
                              'Adjust Step, if <1 make 1
          vStep = CDec(Int(vStep / 4))
          If vStep < 2 Then vStep = CDec(1)
          dStep = CDbl(vStep)
       End If                  'End check for matching sums
       vSumL = vSum            'Last Valid reading
       DoEvents
    Loop                       'Take another step
                               'Last Valid step
    Debug.Print Format(dSum, "###,###,###,###,###,###,###"); " "; _
                Format(vSum, "###,###,###,###,###,###,###"); " ";  _
                vStep; " "; Now()
   
    End Sub
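On a conforming IEEE 754 implementation the same search can run in a fraction of a second using a doubling-plus-bisection scan instead of a fixed-step walk. A Python sketch of that idea (Python's `float` is an IEEE 754 double, so here the search finds 2^53 rather than the VB6 result above):

```python
# Instead of stepping one by one (the question's loop needs 2**53 iterations),
# double up to an n where the double can no longer distinguish n from n + 1,
# then bisect down to the first such n.
def first_indistinct():
    lo, hi = 1, 2
    while float(hi) + 1 != float(hi):   # find an upper bound by doubling
        lo, hi = hi, hi * 2
    while lo + 1 < hi:                  # bisect: predicate false at lo, true at hi
        mid = (lo + hi) // 2
        if float(mid) + 1 == float(mid):
            hi = mid
        else:
            lo = mid
    return hi

print(first_indistinct())               # prints 9007199254740992 (2**53),
                                        # the value the question's loop prints
```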
Oscar
  • 1
  • 1
  • No, it doesn't depend on the compiler. The question asked about "IEEE 754 double type", which is a standard, well-specified, language- and compiler-independent specification. – Steve Summit Apr 07 '23 at 19:49
  • You are correct. I updated the answer and changed code to a more efficient version. – Oscar Apr 08 '23 at 14:12
  • The original poster asked about IEEE754 double type, but had a code snippet that indicated a loop and asked when the loop would exit. – Oscar Apr 08 '23 at 17:06
-3

UPDATE 1 :

Just realized 5^1074 is NOT the true upper limit of what you can get for free out of IEEE 754 double-precision floating point, because I only counted denormalized exponents and forgot that the mantissa itself can fit another 22 powers of 5 (5^22 = 2,384,185,791,015,625, which is less than 2^53). So, to the best of my understanding, the largest power of 5 one can get for free out of the double-precision format is:

largest power of 5 :

  • 5 ^ 1096

largest odd number :

  • 5 ^ 1074 x 9007199254740991

  • 5 ^ 1074 x ( 2 ^ 53 - 1 )
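Both claims can be cross-checked exactly with arbitrary-precision integers. A sketch in Python (conversions from a `float` to `Fraction` or `Decimal` are exact, and `math.ldexp` builds the value without rounding):

```python
import math
from fractions import Fraction
from decimal import Decimal, getcontext

# 5**22 fits in the significand, so 5**22 * 2**-1074 is an
# exactly-representable (subnormal) double.
x = math.ldexp(5 ** 22, -1074)
assert Fraction(x) == Fraction(5 ** 22, 2 ** 1074)

# Its decimal expansion carries exactly the digits of 5**1096, since
#   5**22 / 2**1074 = 5**22 * 5**1074 / 10**1074 = 5**1096 / 10**1074
getcontext().prec = 800                 # 5**1096 has 767 digits
assert int(Decimal(x).scaleb(1074)) == 5 ** 1096
```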

mawk 'BEGIN { OFS = "\f\r\t";

 CONVFMT = "IEEE754 :: 4-byte word :: %.16lX"; 
   
 print "", 
 sprintf("%.*g", __=(_+=_+=_^=_<_)^++_+_*(_+_),
                ___=_=((_+_)/_)^-__),   (_ ""),
                        \
 sprintf("%.*g",__,_=_*((_+=(_^=!_)+(_+=_))*_\
                           )^(_+=_++)), (_ ""),
                           \
 sprintf("%.*g",__,_=___*=  \
        (_+=_+=_^=_<_)^--_^_/--_-+--_), (_ "") }'
  • 4.940656458412465441765687928682213723650598026143247644255856825006755072702087518652998363616359923797965646954457177309266567103559397963987747960107818781263007131903114045278458171678489821036887186360569987307230500063874091535649843873124733972731696151400317153853980741262385655911710266585566867681870395603106249319452715914924553293054565444011274801297099995419319894090804165633245247571478690147267801593552386115501348035264934720193790268107107491703332226844753335720832431936092382893458368060106011506169809753078342277318329247904982524730776375927247874656084778203734469699533647017972677717585125660551199131504891101451037862738167250955837389733598993664809941164205702637090279242767544565229087538682506419718265533447265625e-324

      — IEEE754 :: 4-byte word :: 0000000000000001
    
    494065645841246544176568792......682506419718265533447265625 } 751 dgts :      
      5^1,074    
    
  • 1.1779442926436580280698985883431944188238616052015418158187524855152976686244219586021896275559329804892458073984282439492384355315111632261247033977765604928166883306272301781841416768261169960586755720044541328685833215865788678015827760393916926318959465387821953663477851727634395732669139543975751084522891987808004020022041120326339133484493650064495265010111570347355174765803347028811562651566216206901711944564705815590623254860079132843479610128658074120767908637153514231969910697784644086106916351461663273587631725676246505444808791274797874748064938487833137213363849587926231550453981511635715193075144590522172925785791614297511667878003519179715722536405560955202126362715257889359212587458533154881546706053453699158950485070818103849887847900390625e-308

      — IEEE754 :: 4-byte word :: 000878678326EAC9
    
    117794429264365802806989858......070818103849887847900390625 } 767 dgts :
      5^1,096
    
  • 4.4501477170144022721148195934182639518696390927032912960468522194496444440421538910330590478162701758282983178260792422137401728773891892910553144148156412434867599762821265346585071045737627442980259622449029037796981144446145705102663115100318287949527959668236039986479250965780342141637013812613333119898765515451440315261253813266652951306000184917766328660755595837392240989947807556594098101021612198814605258742579179000071675999344145086087205681577915435923018910334964869420614052182892431445797605163650903606514140377217442262561590244668525767372446430075513332450079650686719491377688478005309963967709758965844137894433796621993967316936280457084866613206797017728916080020698679408551343728867675409720757232455434770912461317493580281734466552734375e-308

      — IEEE754 :: 4-byte word :: 001FFFFFFFFFFFFF
    
    445014771701440227211481959......317493580281734466552734375 } 767 dgts :
          5^1,074 x 6361 x 69431 x 20394401
          ( 6361 x 69431 x 20394401 = 2^53 - 1 )
    

and here's a quick awk code snippet to print out every positive power of 2 up to 2^1023, every positive power of 5 up to 5^1096, and their common zeroth power, optimized for use both with and without a bigint library :

{m,g,n}awk' BEGIN {

 CONVFMT = "%." ((_+=_+=_^=_<_)*_+--_*_++)(!++_) "g"
    OFMT = "%." (_*_) "g"

 if (((_+=_+_)^_%(_+_))==(_)) {
    print __=_=\
            int((___=_+=_+=_*=++_)^!_)
     OFS = ORS
    while (--___) {
        print int(__+=__), int(_+=_+(_+=_))
    }
    __=((_+=_+=_^=!(__=_))^--_+_*_) substr("",_=__)
    do {
        print _+=_+(_+=_) } while (--__)
    exit
 } else { _=_<_ }

    __=((___=_+=_+=++_)^++_+_*(_+_--))
      _=_^(-(_^_--))*--_^(_++^_^--_-__)
  _____=-log(_<_)
    __^=_<_
   ___=-___+--___^___

 while (--___) {
     print ____(_*(__+=__+(__+=__))) }
 do {
     print ____(_) } while ((_+=_)<_____)
 }

 function ____(__,_) {
     return (_^=_<_)<=+__ \
     ?              sprintf( "%.f", __) \
     : substr("", _=sprintf("%.*g", (_+=++_)^_*(_+_),__),
         gsub("^[+-]*[0][.][0]*|[.]|[Ee][+-]?[[:digit:]]+$","",_))_
 }'

=============================

depends on how flexible you are with the definition of "represented" and "representable" -

Despite what typical literature says, the integer that's actually "largest" in IEEE 754 double precision, without any bigint library or external function call, with a completely full mantissa, that is computable, storable, and printable is actually :

9,007,199,254,740,991 * 5 ^ 1074 (~2546.750773909... bits)

  4450147717014402272114819593418263951869639092703291
  2960468522194496444440421538910330590478162701758282
  9831782607924221374017287738918929105531441481564124
  3486759976282126534658507104573762744298025962244902
  9037796981144446145705102663115100318287949527959668
  2360399864792509657803421416370138126133331198987655
  1545144031526125381326665295130600018491776632866075
  5595837392240989947807556594098101021612198814605258
  7425791790000716759993441450860872056815779154359230
  1891033496486942061405218289243144579760516365090360
  6514140377217442262561590244668525767372446430075513
  3324500796506867194913776884780053099639677097589658
  4413789443379662199396731693628045708486661320679701
  7728916080020698679408551343728867675409720757232455
  434770912461317493580281734466552734375

I used xxhash to compare this with gnu-bc and confirmed it's indeed identical and no precision lost. There's nothing "denormalized" about this number at all, despite the exponent range being labeled as such.

Try it on your own system if you don't believe me. (I got this printout via off-the-shelf mawk) - and you can get to it fairly easily too :

  1. one(1) exponentiation/power (^ aka **) op,
  2. one(1) multiplication (*) op,
  3. one (1) sprintf() call, and
  4. either one(1) of — substr() or regex-gsub() to perform the cleanup necessary
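If you'd rather not wrangle awk, the identity behind that number checks out in a few lines of Python as well; the bit pattern 001FFFFFFFFFFFFF quoted above is the double with an all-ones significand and the smallest normal exponent:

```python
import math
from fractions import Fraction
from decimal import Decimal, getcontext

# The all-ones-significand double at the smallest normal exponent:
# (2**53 - 1) * 2**-1074, built exactly with ldexp.
x = math.ldexp(2 ** 53 - 1, -1074)
assert Fraction(x) == Fraction(2 ** 53 - 1, 2 ** 1074)

# Its decimal digits are exactly (2**53 - 1) * 5**1074,
# because 1 / 2**1074 = 5**1074 / 10**1074.
getcontext().prec = 800                 # the product has 767 digits
assert int(Decimal(x).scaleb(1074)) == (2 ** 53 - 1) * 5 ** 1074
```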

Just like the 1.79…E308 number frequently mentioned,

  • both are mantissa limited
  • both are exponent limited
  • both have ridiculously large ULPs (unit in last place)
  • and both are exactly 1 step from "overwhelming" the floating point unit with either an overflow or underflow to give you back a usable answer

Negate the binary exponents of the workflow, and you can have the ops done entirely in this space, then just invert it once more at tail end of workflow to get back to the side what we typically consider "larger",

but keep in mind that in the inverted 
exponent realm, there's no "gradual overflow"

— The 4Chan Teller

RARE Kpop Manifesto
  • 2,453
  • 3
  • 11
  • Why are you talking about powers of 5? On its face, this answer makes no sense. It looks like you're making some assumptions, can you explain those? – Steve Summit Apr 07 '23 at 20:33
  • @SteveSummit : the question was only what the biggest integer that could be stored, but it never said anything about how it must be downstream computable, nor that you must interpret the exponents exactly how `IEEE 754` tells you to. All negative powers of 2 are positive powers of 5, and all negative powers of 5 are positive powers of 2, decimal place shifted thats all. – RARE Kpop Manifesto Apr 08 '23 at 02:57
  • it's up to you regarding how to interpret what's being stored - it's the same reason why 4 bytes could be interpreted in so many ways - big endian, little endian, unsigned 32, signed 32, pair of signed 16, pair of unsigned 16, 4 signed chars, 4 unsigned 8-bit chars, a single precision floating point, RGB channels plus alpha, 4 characters of an `ASCII` string, or 1 `ASCII` plus 3 bytes in `UTF-8` etc - it's still the same 4 bytes, and how you like to interpret them is entirely up to u. so who says you can't do the same just because they label it `double precision floating point` ? – RARE Kpop Manifesto Apr 08 '23 at 03:11
  • - the base case would be reading it exactly as is - the BAU definition of the sign mantissa and exponent combo, where you would end up with it a large magnitude negative power of 2 to yield a tiny number, or you can interpret the exact same floating point as as a large magnitude positive power power of 5 that needs some decimal place shifting – RARE Kpop Manifesto Apr 08 '23 at 03:11
  • @SteveSummit : when starting with a large negative power of `2`, each decimal shifted towards the `radix` point cancels out one power of 2, replacing it with a positive power of `5`, so when it finally reaches exactly the integer level, now it's the same `sign and mantissa` against a positive power of `5`. once you're past 22 net positive powers of 5 it no longer can remain an odd integer if using 64-bit doubles (since `2^53` lies exactly between `5^22` and `5^23`)....... – RARE Kpop Manifesto Apr 08 '23 at 03:27
  • …... - so you can (1) spend the time manually doing arbitrary precision math, (2) you can use a bignum library of sorts, or (3) you can get it basically for free since any chip that implements `754 doubles` already performed the harder part in hardware (and likely very few cycles). I never understand why there are widespread notions that string ops are like the untouchable 3rd rail when in fact there are plenty of times their under-utilized advantages are hiding in plain sight. – RARE Kpop Manifesto Apr 08 '23 at 03:30
  • All negative powers of 2 are positive powers of 0.5, it's true. But I think you have to be a bit more clear about how the decimal point shifting is supposed to work. (I do take your point, though, about how there are many ways to interpret a bit pattern.) – Steve Summit Apr 08 '23 at 22:39
  • Bottom line, IEEE 754 doubles contain a significand and an exponent, interpreted as significand × 2^exp. It looks like you want to reinterpret these as significand × 5^(-exp). More precisely, while IEEE754 says that the bit pattern `0x001` in the exponent field corresponds to -1023, and `0x7fe` corresponds to +1023, you want to basically flip these around, and change the base from 2 to 5 while you're at it. – Steve Summit Apr 08 '23 at 22:39
  • You can do that if you want, if you don't care "anything about how it must be downstream computable". But I don't think you can do any meaningful arithmetic on numbers interpreted in that way, so I'm not sure how you "get it basically for free since any chip [...] already performs the harder part in hardware". – Steve Summit Apr 08 '23 at 22:39
  • @SteveSummit : "meaningful arithmetic" is still a point of view - run this ::: `for __ in 1; do echo 'b = 2; e = 971; m = 99983333; x = 1114111; ( b^e ) * m * x' | bc; echo $'2 971 99983333 1114111\n5 1069 99983333 1114111 ' | mawk 'function ______(_,__,___,____) { return (_=+_)*_==_^_ ? sprintf("%.f", _^__*___*____) : substr("",__ = sprintf("%+.*f",__, --_^-_^_--*--_^(_++^_^--_-__)*___*____), sub("^[+-0]+","",__))__ } BEGIN { CONVFMT = "%.250g" } ($++NF = ______($1, $2, $3, $4))^_' OFS='\n'; echo 'b = 5; e = 1069; m = 99983333; x = 1114111; ( b^e ) * m * x' | bc; done`. matches `bc` output – RARE Kpop Manifesto Apr 09 '23 at 09:00
  • @SteveSummit : and that's multiplying `2 prime integers` and scale it by either one of 2 `prime integer bases` against their corresponding `prime integer exponents`, including a power of 5 so large that it was operating in the `denormalized` range of `floating point`, and still getting a result digit for digit matching that of `bc`, and you wanna tell me that' s not "meaningful arithmetic" ? – RARE Kpop Manifesto Apr 09 '23 at 09:02
  • @SteveSummit : `for ____ in $$; do __='9899993'; ___='-889996669'; echo 'b = 2; e = 971; m = '"$__"'; x = '"$___"'; ( b^e ) * m * x' | bc; echo "2 971 $__ $___\n5 1069 $__ $___" | mawk 'function ______(_,__,___,____) { return (_=+_)*_==_^_ ? sprintf("%.f", _^__*___*____) : substr("",__ = sprintf("%+.*f",__, --_^-_^_--*--_^(_++^_^--_-__)*___*____), sub("^[+-0]+",substr("-",_^(__!~"^-")),__))__ } BEGIN { CONVFMT = "%.250g" } ($++NF = ______($1, $2, $3, $4))^_' OFS='\n'; echo 'b = 5; e = 1069; m = '"$__"'; x = '"$___"'; ( b^e ) * m * x' | bc; done` now i've match sign matching as well – RARE Kpop Manifesto Apr 09 '23 at 09:15
  • @SteveSummit : for the record the last one yielded `-13930217711156395187………6904795169830322265625`, or `-1.3930217711156395188e+763` in scientific notation – RARE Kpop Manifesto Apr 09 '23 at 09:17
  • Thank you for proving me wrong! Unfortunately I still have no clear idea how your method works. awk programs with variables named `_`, `__`, and `___` really don't make for the clearest explanation! Why not show some intermediate results? I gather you're actually computing fractions like 0.22232…58624, and stripping the decimal point off to get 22232…58624, but I don't see how you're constructing the alternate versions of 971, 99983333, and 1114111 to do math on to get the intermediate fractional results. – Steve Summit Apr 11 '23 at 13:08
  • @SteveSummit : define "alternate versions" ? every number there `2`, `5`, `971`, `1069`, `9899993`, and `889996669` is prime, so what kind of construction do you want to see with them ? – RARE Kpop Manifesto Apr 11 '23 at 13:21
  • @SteveSummit : as for intermediate fractional results, it's the output of `sprintf( )`. because `gnu-gawk` in some invocation modes give u a hard time when trying to go denormalized, so I split the negative exponents, to `2^-512`, or even cleaner, `4^-4^4`, times `2^(512-exponent)` for the rest. – RARE Kpop Manifesto Apr 11 '23 at 13:24
  • Well, I may have to take back what said about "proving me wrong". You're not doing any kind of general-purpose arithmetic on numbers bigger than what an IEEE754 double can hold. You're computing `a*b^e`, where `b` is 5. (Or 2.) That's cute, but it's not that interesting, especially when you hide it behind a bunch of pointless obfuscations like `(_=+_)*_==_^_` when you really just meant `b == 2`. – Steve Summit Apr 12 '23 at 13:39
  • @SteveSummit : define "interesting". The largest integer traditionally recognized (`9007199254740991`) can't allow you to subtract `0.5` from it and retain that information to full precision - is subtracting `0.5` not considered "interesting" in your book ? I always personally prefer dynamically generating necessary operands and offsets on the fly - do you remember how many security holes that existed stemming purely from neglecting to update the same constant to its new value across all pieces of code ? – RARE Kpop Manifesto Apr 12 '23 at 18:58
  • @SteveSummit : you're most likely not gonna believe this statement (and that's okay, since very few do anyway), but I actually write code with those `_` `__` even for myself - ever since i started going with this styling approach my own skills has seen a noticeable improvement, stemming from the fact it helps you think like a Turing machine. When you think like a Turing Machine, the resulting codes would naturally be efficient for a Turing Machine (a computer) - and that's without needing to write assembly or hand code `SIMD` vectorizations – RARE Kpop Manifesto Apr 12 '23 at 19:14
  • @SteveSummit : labeling (of anything) leads to subconscious bias, stereotyping, and judgment based on predisposed notions conforming to expectations one associates with the label. same with gender bias, same with racial bias, same with coding. Even using single letters of `a` `b` `x` `y` `e` in variable names are a form of labeling - a mental barrier one placed upon themselves. Removal of any possibility for labeling with a single bit alphabet also removes that mental barrier, so one could see the flip sides of many coins one has previously missed due to that same mental barrier being in place – RARE Kpop Manifesto Apr 12 '23 at 19:22
  • Re "*it's up to you regarding how to interpret what's being stored*", If you invent your own format, you can have it support arbitrarily large numbers. For example, "In my format, the bit pattern 00..00 means 10^999999999999999." – ikegami Aug 06 '23 at 17:46
  • @ikegami : goood… now go find me a `SAN network` to store all those zeros. I didn't invent any new format - this is the same good ole `IEEE 754 double-precision binary FP` format just about everyone has. people are just overlooking what already exists for free - you can waste bigint time calculating `9007199254740881 * 5^1074`, or double-prec floating point ops give you all the digits ENTIRELY for free. It's the flip side of the exact same coin, that's all. – RARE Kpop Manifesto Aug 12 '23 at 15:37