Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard. Its best-known formats are single-precision binary32 (aka float) and double-precision binary64 (aka double).
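As a rough illustration of the two formats, the sketch below splits a value into its sign, biased exponent, and fraction fields. The helper names decode_binary32 and decode_binary64 are made up for this example; the only library used is Python's standard struct module, and the field widths (1/8/23 bits and 1/11/52 bits) are those of binary32 and binary64.

```python
import struct

def decode_binary32(x):
    """Return (sign, biased exponent, fraction) of x after rounding to binary32."""
    # Pack as a big-endian 4-byte float, then reinterpret the bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def decode_binary64(x):
    """Return (sign, biased exponent, fraction) of x as a binary64 value."""
    # Same idea with an 8-byte double and a 64-bit unsigned int.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(decode_binary32(1.0))   # (0, 127, 0)
print(decode_binary64(1.0))   # (0, 1023, 0)
print(decode_binary32(-0.5))  # (1, 126, 0)
```

The exponent bias is 127 for binary32 and 1023 for binary64, which is why 1.0 (exponent 0) shows up as 127 and 1023 respectively.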

IEEE 754 is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic, and the most widely used such standard.

Besides formats, IEEE 754 requires the basic operations (+, -, *, / and sqrt) to produce correctly rounded results (error <= 0.5 ulp in the default round-to-nearest mode). Other functions such as pow and sin are not required to be as accurate; their accuracy is an implementation trade-off between precision and performance.
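The 0.5 ulp bound can be checked with exact rational arithmetic. This is only a sketch: error_in_ulps is a hypothetical helper, it assumes Python floats are binary64 evaluated in round-to-nearest (true on typical platforms), and it needs Python 3.9+ for math.ulp.

```python
from fractions import Fraction
import math

def error_in_ulps(computed, exact):
    """Absolute error of `computed` against the exact rational `exact`,
    measured in units of ulp(computed)."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

a, b = 0.1, 0.2

# Fraction(a) is the exact value stored in the double, not the decimal 1/10.
sum_err = error_in_ulps(a + b, Fraction(a) + Fraction(b))
quot_err = error_in_ulps(1.0 / 3.0, Fraction(1, 3))

print(float(sum_err))   # 0.5: the exact sum is a tie, resolved by round-half-to-even
print(float(quot_err))

# A correctly rounded operation never exceeds half an ulp.
assert sum_err <= Fraction(1, 2)
assert quot_err <= Fraction(1, 2)
```

The same check applied to a library function such as math.sin would need a higher-precision reference value, since the exact result is not rational.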

This is why many CPU instruction sets only include the basic operations (including sqrt).

1447 questions
526
votes
14 answers

What is the difference between float and double?

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are…
VaioIsBorn
  • 7,683
  • 9
  • 31
  • 28
352
votes
12 answers

What is the rationale for all comparisons returning false for IEEE754 NaN values?

Why do comparisons of NaN values behave differently from all other values? That is, all comparisons with the operators ==, <=, >=, <, > where one or both values is NaN returns false, contrary to the behaviour of all other values. I suppose this…
starblue
  • 55,348
  • 14
  • 97
  • 151
305
votes
3 answers

Why does NaN - NaN == 0.0 with the Intel C++ Compiler?

It is well-known that NaNs propagate in arithmetic, but I couldn't find any demonstrations, so I wrote a small test: #include #include int main(int argc, char* argv[]) { float qNaN = std::numeric_limits<float>::quiet_NaN(); …
geometrian
  • 14,775
  • 10
  • 56
  • 132
295
votes
11 answers

Biggest integer that can be stored in a double

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision? In other words, what would the following code fragment return: UInt64 i = 0; Double d = 0; while (i == d) { i += 1; d +=…
Franck Freiburger
  • 26,310
  • 20
  • 70
  • 95
252
votes
9 answers

Float and double datatype in Java

The float data type is a single-precision 32-bit IEEE 754 floating point and the double data type is a double-precision 64-bit IEEE 754 floating point. What does it mean? And when should I use float instead of double or vice-versa?
Leo
  • 5,017
  • 6
  • 32
  • 55
228
votes
2 answers

Which is the first integer that an IEEE 754 float is incapable of representing exactly?

For clarity, if I'm using a language that implements IEEE 754 floats and I declare: float f0 = 0.f; float f1 = 1.f; ...and then print them back out, I'll get 0.0000 and 1.0000 - exactly. But IEEE 754 isn't capable of representing all the numbers…
Floomi
  • 2,503
  • 2
  • 16
  • 10
164
votes
4 answers

Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn't?

I know that most decimals don't have an exact floating point representation (Is floating point math broken?). But I don't see why 4*0.1 is printed nicely as 0.4, but 3*0.1 isn't, when both values actually have ugly decimal representations: >>>…
Aivar
  • 6,814
  • 5
  • 46
  • 78
163
votes
6 answers

Why is NaN not equal to NaN?

The relevant IEEE standard defines a numeric constant NaN (not a number) and prescribes that NaN should compare as not equal to itself. Why is that? All the languages I'm familiar with implement this rule. But it often causes significant problems,…
max
  • 49,282
  • 56
  • 208
  • 355
160
votes
10 answers

Is floating-point math consistent in C#? Can it be?

No, this is not another "Why is (1/3.0)*3 != 1" question. I've been reading about floating-points a lot lately; specifically, how the same calculation might give different results on different architectures or optimization settings. This is a…
BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
152
votes
2 answers

What is the difference between quiet NaN and signaling NaN?

I have read about floating-point and I understand that NaN could result from operations. But I can't understand what these concepts are exactly. What is the difference between them? Which one can be produced during C++ programming? As a programmer,…
JalalJaberi
  • 2,417
  • 8
  • 25
  • 41
132
votes
12 answers

Is it possible to get 0 by subtracting two unequal floating point numbers?

Is it possible to get division by 0 (or infinity) in the following example? public double calculation(double a, double b) { if (a == b) { return 0; } else { return 2 / (a - b); } } In normal cases it…
Thirler
  • 20,239
  • 14
  • 63
  • 92
121
votes
3 answers

What is a subnormal floating point number?

The isnormal() reference page says: Determines if the given floating point number arg is normal, i.e. is neither zero, subnormal, infinite, nor NaN. It's clear what a number being zero, infinite or NaN means. But it also says subnormal. When is a…
BЈовић
  • 62,405
  • 41
  • 173
  • 273
120
votes
3 answers

Type-juggling and (strict) greater/lesser-than comparisons in PHP

PHP is famous for its type-juggling. I must admit it puzzles me, and I'm having a hard time to find out basic logical/fundamental things in comparisons. For example: If $a > $b is true and $b > $c is true, must it mean that $a > $c is always true…
hakre
  • 193,403
  • 52
  • 435
  • 836
97
votes
4 answers

Why does (inf + 0j)*1 evaluate to inf + nanj?

>>> (float('inf')+0j)*1 (inf+nanj) Why? This caused a nasty bug in my code. Why isn't 1 the multiplicative identity, giving (inf + 0j)?
marnix
  • 1,210
  • 8
  • 12
85
votes
7 answers

What range of numbers can be represented in a 16-, 32- and 64-bit IEEE-754 systems?

I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid. The general question is: For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be…
Nate Parsons
  • 14,431
  • 13
  • 51
  • 67