Questions tagged [ieee-754]

IEEE 754 is the most widely used floating-point standard; its best-known formats are single-precision binary32 (aka float) and double-precision binary64 (aka double).

IEEE 754 is the Institute of Electrical and Electronics Engineers standard for floating-point computation, and is the most widely used standard of its kind.

Wikipedia on IEEE 754 (2008)
ieee.org documentation
https://en.wikipedia.org/wiki/Single-precision_floating-point_format aka binary32, usually called float or real4. Nice diagrams of the bit-pattern, and range over which it can represent every integer exactly, and so on.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format usually called double or real8
Algorithm to convert an IEEE 754 double to a string? including the recent Ryū: fast float-to-string conversion
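
As a rough illustration of the bit patterns those format pages describe, the Python sketch below splits a value into its sign, biased-exponent and fraction fields, and checks where each format stops representing every integer exactly. The helper names `float_bits` and `to_binary32` are made up for this example; only the standard `struct` module is assumed.

```python
import struct


def float_bits(x: float, fmt: str = "d") -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) for x.

    fmt="d" encodes x as binary64, fmt="f" as binary32.
    """
    if fmt == "d":
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))
        exp_bits, frac_bits = 11, 52
    else:
        (raw,) = struct.unpack(">I", struct.pack(">f", x))
        exp_bits, frac_bits = 8, 23
    sign = raw >> (exp_bits + frac_bits)
    exponent = (raw >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = raw & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


def to_binary32(x: float) -> float:
    """Round x to the nearest binary32 value (hypothetical helper)."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(float_bits(1.0))         # fields of 1.0 as binary64
print(float_bits(-2.5, "f"))   # fields of -2.5 as binary32

# binary64 has a 53-bit significand, so 2**53 + 1 is the first
# positive integer it cannot hold; it rounds back to 2**53.
print(float(2**53 + 1) == float(2**53))

# binary32 has a 24-bit significand, so the same happens at 2**24 + 1.
print(to_binary32(float(2**24 + 1)) == float(2**24))
```

The field widths (1/11/52 and 1/8/23) are the ones the two format pages give; everything else here is illustrative scaffolding.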

In addition to formats, IEEE 754 defines the basic operations +, -, *, / and sqrt as producing correctly rounded results (error <= 0.5 ulp). Other functions like pow and sin are not required to be as accurate; their accuracy is an implementation trade-off between precision and performance.

This is why many CPU instruction sets only include the basic operations (including sqrt).
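
A small Python sketch of what "correctly rounded" means in practice: it compares each basic operation against exact rational arithmetic and measures the error in ulps. It assumes Python 3.9+ (for `math.ulp`), floats that are binary64, and the default round-to-nearest mode; `rounding_error_in_ulps` is a name invented for this example.

```python
import math
from fractions import Fraction


def rounding_error_in_ulps(exact: Fraction, result: float) -> Fraction:
    """Distance between the exact value and the float result, in ulps."""
    return abs(Fraction(result) - exact) / Fraction(math.ulp(result))


a, b = 0.1, 0.3
fa, fb = Fraction(a), Fraction(b)  # exact values of the stored doubles

# Pair each exact rational result with the hardware float result.
checks = {
    "add": (fa + fb, a + b),
    "sub": (fa - fb, a - b),
    "mul": (fa * fb, a * b),
    "div": (fa / fb, a / b),
}
errors = {name: rounding_error_in_ulps(exact, got)
          for name, (exact, got) in checks.items()}

for name, err in errors.items():
    print(f"{name}: {float(err):.3f} ulp")

# sqrt(2) is irrational, so instead check that 2 lies between the
# squares of the two midpoints surrounding the returned double.
r = math.sqrt(2.0)
half_ulp = Fraction(math.ulp(r)) / 2
lo, hi = Fraction(r) - half_ulp, Fraction(r) + half_ulp
sqrt_ok = lo * lo <= 2 <= hi * hi
print("sqrt within 0.5 ulp:", sqrt_ok)
```

Running the same kind of check on `math.sin` or `math.pow` may or may not stay within 0.5 ulp, depending on the math library in use.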

1447 questions

526

votes

14 answers

What is the difference between float and double?

I've read about the difference between double precision and single precision. However, in most cases, float and double seem to be interchangeable, i.e. using one or the other does not seem to affect the results. Is this really the case? When are…

asked Mar 05 '10 at 12:48

VaioIsBorn

7,683
9
31
28

352

votes

12 answers

What is the rationale for all comparisons returning false for IEEE754 NaN values?

Why do comparisons of NaN values behave differently from all other values? That is, all comparisons with the operators ==, <=, >=, <, > where one or both values is NaN return false, contrary to the behaviour of all other values. I suppose this…

floating-point comparison nan ieee-754 iec10967

asked Oct 14 '09 at 09:19

starblue

55,348
14
97
151

305

votes

3 answers

Why does NaN - NaN == 0.0 with the Intel C++ Compiler?

It is well-known that NaNs propagate in arithmetic, but I couldn't find any demonstrations, so I wrote a small test: #include #include int main(int argc, char* argv[]) { float qNaN = std::numeric_limits<float>::quiet_NaN(); …

c++ c floating-point ieee-754 icc

asked Aug 25 '15 at 05:11

geometrian

14,775
10
56
132

295

votes

11 answers

Biggest integer that can be stored in a double

What is the biggest "no-floating" integer that can be stored in an IEEE 754 double type without losing precision? In other words, what would the following code fragment return: UInt64 i = 0; Double d = 0; while (i == d) { i += 1; d +=…

types floating-point double ieee-754

asked Dec 04 '09 at 18:12

Franck Freiburger

26,310
20
70
95

252

votes

9 answers

Float and double datatype in Java

The float data type is a single-precision 32-bit IEEE 754 floating point and the double data type is a double-precision 64-bit IEEE 754 floating point. What does it mean? And when should I use float instead of double or vice-versa?

java floating-point double ieee-754

asked Dec 22 '14 at 07:11

Leo

5,017
6
32
55

228

votes

2 answers

Which is the first integer that an IEEE 754 float is incapable of representing exactly?

For clarity, if I'm using a language that implements IEEE 754 floats and I declare: float f0 = 0.f; float f1 = 1.f; ...and then print them back out, I'll get 0.0000 and 1.0000 - exactly. But IEEE 754 isn't capable of representing all the numbers…

types floating-point ieee-754

asked Sep 25 '10 at 12:40

Floomi

2,503
2
16
10

164

votes

4 answers

Why does the floating-point value of 4*0.1 look nice in Python 3 but 3*0.1 doesn't?

I know that most decimals don't have an exact floating point representation (Is floating point math broken?). But I don't see why 4*0.1 is printed nicely as 0.4, but 3*0.1 isn't, when both values actually have ugly decimal representations: >>>…

python floating-point rounding floating-accuracy ieee-754

asked Sep 21 '16 at 14:07

Aivar

6,814
5
46
78

163

votes

6 answers

Why is NaN not equal to NaN?

The relevant IEEE standard defines a numeric constant NaN (not a number) and prescribes that NaN should compare as not equal to itself. Why is that? All the languages I'm familiar with implement this rule. But it often causes significant problems,…

floating-point language-agnostic nan ieee-754

asked Apr 05 '12 at 18:43

max

49,282
56
208
355

160

votes

10 answers

Is floating-point math consistent in C#? Can it be?

No, this is not another "Why is (1/3.0)*3 != 1" question. I've been reading about floating-points a lot lately; specifically, how the same calculation might give different results on different architectures or optimization settings. This is a…

c# .net floating-point precision ieee-754

asked Jul 13 '11 at 17:29

BlueRaja - Danny Pflughoeft

84,206
33
197
283

152

votes

2 answers

What is the difference between quiet NaN and signaling NaN?

I have read about floating-point and I understand that NaN could result from operations. But I can't understand what these concepts are exactly. What is the difference between them? Which one can be produced during C++ programming? As a programmer,…

floating-point nan ieee-754

asked Aug 08 '13 at 05:19

JalalJaberi

2,417
8
25
41

132

votes

12 answers

Is it possible to get 0 by subtracting two unequal floating point numbers?

Is it possible to get division by 0 (or infinity) in the following example? public double calculation(double a, double b) { if (a == b) { return 0; } else { return 2 / (a - b); } } In normal cases it…

floating-point double floating-accuracy ieee-754

asked Feb 12 '15 at 09:55

Thirler

20,239
14
63
92

121

votes

3 answers

What is a subnormal floating point number?

The isnormal() reference page says: Determines if the given floating point number arg is normal, i.e. is neither zero, subnormal, infinite, nor NaN. It's clear what a number being zero, infinite or NaN means. But it also says subnormal. When is a…

c++ c++11 floating-point ieee-754

asked Dec 01 '11 at 12:28

BЈовић

62,405
41
173
273

120

votes

3 answers

Type-juggling and (strict) greater/lesser-than comparisons in PHP

PHP is famous for its type-juggling. I must admit it puzzles me, and I'm having a hard time to find out basic logical/fundamental things in comparisons. For example: If $a > $b is true and $b > $c is true, must it mean that $a > $c is always true…

php if-statement comparison logic ieee-754

asked Apr 04 '13 at 14:02

hakre

193,403
52
435
836

votes

4 answers

Why does (inf + 0j)*1 evaluate to inf + nanj?

>>> (float('inf')+0j)*1 (inf+nanj) Why? This caused a nasty bug in my code. Why isn't 1 the multiplicative identity, giving (inf + 0j)?

python nan ieee-754

asked Sep 20 '19 at 16:15

marnix

1,210
8
12

votes

7 answers

What range of numbers can be represented in 16-, 32- and 64-bit IEEE-754 systems?

I know a little bit about how floating-point numbers are represented, but not enough, I'm afraid. The general question is: For a given precision (for my purposes, the number of accurate decimal places in base 10), what range of numbers can be…

floating-point precision numerical ieee-754

asked May 16 '09 at 14:37

Nate Parsons

14,431
13
51
67
