
Does a float have 32 binary digits and a double have 64 binary digits? The documentation was too hard to make sense of.

Do all of the bits translate to significant digits? Or does the location of the decimal point take up some of the bits?

  • @user1774214 Floating point numbers aren't encoded like integers at all. Have a look at the link I gave. You must understand, for example, that the precision isn't uniform. – Denys Séguret Nov 24 '12 at 16:14
  • @dystroy I'm not sure what you mean by “the precision isn't uniform”. It is pretty uniformly 53 and 24 bits of precision, unless you are referring to denormals. – Pascal Cuoq Jun 20 '14 at 08:55
  • 3
    @PascalCuoq there's more precision for smaller numbers. As the exponent changes (or the point floats) around, the mantissa keeps representing the same amount of digits. So if the number is big, the mantissa "can't reach" lower significant digits as much, thus giving less precision. – Victor Basso Mar 23 '15 at 13:54
  • 3
    @Virtuel The precision is 53 bits. **That** is what we call the precision. You appear to be thinking of the *absolute accuracy* or something. – Pascal Cuoq Mar 23 '15 at 13:57
  • @Virtuel You'll find this quote, “p is the precision (the number of digits in the significand)”, defining the word *precision*, on http://en.wikipedia.org/wiki/Floating_point – Pascal Cuoq Mar 23 '15 at 14:02
  • @PascalCuoq: He is probably referring to precision in decimals. The precision with which 1.0000 can be represented by a float is higher than the precision with which 9.0000 can be represented, because the float for 9 needs a larger exponent. In the decimal system the exponent is the same, making this non-intuitive. This is also why the precision of a float is *about* 7 decimal digits. The number of significant decimal digits depends on the magnitude of the first digit(s). – EvertW Jan 29 '19 at 09:52
  • @EvertW You too are confusing precision with range. – user207421 Jun 20 '22 at 06:50

6 Answers


float: 32 bits (4 bytes) where 23 bits are used for the mantissa (about 7 decimal digits). 8 bits are used for the exponent, so a float can "move" the decimal point to the right or to the left using those 8 bits. Doing so avoids storing lots of zeros in the mantissa as in 0.0000003 (3 × 10⁻⁷) or 3000000 (3 × 10⁶). The remaining 1 bit is the sign bit.

double: 64 bits (8 bytes) where 52 bits are used for the mantissa (about 16 decimal digits). 11 bits are used for the exponent and 1 bit is the sign bit.

Since we are using binary (only 0 and 1), the leading bit of the mantissa of a non-zero number is always 1, so it does not need to be stored; both float and double use this implicit-bit trick to gain one extra bit of precision.

Also, since everything is in binary (mantissa and exponents) the conversions to decimal numbers are usually not exact. Numbers like 0.5, 0.25, 0.75, 0.125 are stored exactly, but 0.1 is not. As others have said, if you need to store cents precisely, do not use float or double, use int, long, BigInteger or BigDecimal.
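A minimal sketch of that last point (class and variable names are mine): 0.1 has no finite binary representation, so sums of it drift, while powers of two and decimal BigDecimal arithmetic stay exact.

```java
import java.math.BigDecimal;

public class CentsSketch {
    public static void main(String[] args) {
        // 0.1 and 0.2 are only approximated in binary, so the sum is slightly off:
        System.out.println(0.1 + 0.2);   // prints 0.30000000000000004

        // 0.25 = 1/4 is a power of two, so it is stored exactly:
        System.out.println(0.25 + 0.25); // prints 0.5

        // BigDecimal works in decimal, so cents add up exactly:
        BigDecimal cents = new BigDecimal("0.10").add(new BigDecimal("0.20"));
        System.out.println(cents);       // prints 0.30
    }
}
```

Note that comparing BigDecimals with compareTo rather than equals ignores scale differences (0.30 vs 0.3).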

Sources:

http://en.wikipedia.org/wiki/Floating_point#IEEE_754:_floating_point_in_modern_computers

http://en.wikipedia.org/wiki/Binary64

http://en.wikipedia.org/wiki/Binary32

Régis Jean-Gilles
marcus
  • what do you mean 6 to 9? how can it change? so if I run some code which has 8 decimal digits like 0.000000001 multiple times, i'll get different results? is that what you mean? – Aequitas Oct 28 '15 at 02:58
  • 2
    Some numbers can be represented more exactly in binary than others. You can see the difference in 0.125 (1/8, eight is a power of two) and 0.1 (1/10, ten is not a power of two). The former has more (decimal) digits, but is represented exactly. So it could be that a number with 6 decimal digits has larger rounding errors than another number with 8 digits. – marcus Oct 28 '15 at 12:39
  • @marcus if **float** has **8** bits using which it can move the decimal point and **double** has **11** such bits, then does that imply that a number stored in non-exponential form, for eg. 65.235899... can be stored with 8 places after decimal in float and 11 places in double?? –  Mar 09 '16 at 16:42
  • 12
    15.9 decimal digits for `double` and 7.2 for `float`, which is to say 15 and 7. Some larger numbers can be represented in each case, and none of it applies to fractions, but there is no 'average' about it, and none of your sources says otherwise. – user207421 Mar 09 '16 at 23:49
  • 1
    If you don't like the word average, propose an edit. It was not added by me in the first place, it was edited by someone else... (and I really didn't see the need for that edit). – marcus Dec 15 '16 at 15:14
  • 8
    Interestingly, there is actually one more digit of precision than stored in the mantissa/significand. 23 and 52 bits are stored for float and double, respectively but because the numbers are normalized we can assume a leading 1-bit, then leave it out. This is why the effective precision is 24 and 53 bits, respectively. The precise decimal precisions are calculated log10(2^24) = 7.22 and log10(2^53) = 15.95 – Georgie Nov 08 '18 at 21:58

Short answer: a 32-bit float has about 7 decimal digits of precision and a 64-bit double has about 16 decimal digits of precision.

Long answer:

Floating-point numbers have three components:

  1. A sign bit, to determine if the number is positive or negative.
  2. An exponent, to determine the magnitude of the number.
  3. A fraction, which determines how far between two exponent values the number is. This part is also called the significand, mantissa, or coefficient.

Essentially, this works out to sign * 2^exponent * (1 + fraction). The "size" of the number, its exponent, is irrelevant to us, because it only scales the value of the fraction portion. Knowing that log₁₀(n) gives the number of digits of n,† we can determine the precision of a floating point number with log₁₀(largest_possible_fraction). Because each bit stores 2 possibilities, a binary number of n bits can store a number up to 2ⁿ - 1 (a total of 2ⁿ values, one of which is zero). This gets a bit hairier, because floating point numbers are stored with one less bit of fraction than they can use: zero is represented specially, and every non-zero binary number has a leading 1 bit that can therefore be left implicit.‡

Combining this, the digits of precision for a floating point number is log₁₀(2ⁿ), where n is the number of bits of the floating point number’s fraction. A 32-bit float has 24 bits of fraction for ≈7.22 decimal digits of precision, and a 64-bit double has 53 bits of fraction for ≈15.95 decimal digits of precision.
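The arithmetic above can be sketched directly (class and method names are mine):

```java
public class PrecisionSketch {
    // Decimal digits of precision for n bits of fraction: log10(2^n) = n * log10(2)
    static double decimalDigits(int fractionBits) {
        return fractionBits * Math.log10(2);
    }

    public static void main(String[] args) {
        // 23 stored bits + 1 implicit leading bit = 24 effective bits
        System.out.printf("float:  %.2f%n", decimalDigits(24)); // ~7.22
        // 52 stored bits + 1 implicit leading bit = 53 effective bits
        System.out.printf("double: %.2f%n", decimalDigits(53)); // ~15.95
    }
}
```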

For more on floating point accuracy, you might want to read about the concept of a machine epsilon.


† For n ≥ 1 at least — for other numbers your formula will look more like ⌊log₁₀(|n|)⌋ + 1.

‡ “This rule is variously called the leading bit convention, the implicit bit convention, or the hidden bit convention.” (Wikipedia)

9999years

From the Java specification:

The floating-point types are float and double, which are conceptually associated with the single-precision 32-bit and double-precision 64-bit format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985 (IEEE, New York).

As it's hard to do anything with numbers without understanding IEEE754 basics, here's another link.

It's important to understand that the precision isn't uniform and that this isn't an exact storage of the numbers as is done for integers.

An example :

double a = 0.3 - 0.1;
System.out.println(a);          

prints

0.19999999999999998

If you need arbitrary precision (for example for financial purposes) you may need BigDecimal.
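For contrast, a minimal sketch of the same subtraction done with BigDecimal (class name is mine), which works in decimal and therefore lands on 0.2 exactly:

```java
import java.math.BigDecimal;

public class ExactSubtraction {
    public static void main(String[] args) {
        System.out.println(0.3 - 0.1); // prints 0.19999999999999998

        // BigDecimal built from strings keeps the decimal digits exactly:
        BigDecimal b = new BigDecimal("0.3").subtract(new BigDecimal("0.1"));
        System.out.println(b); // prints 0.2
    }
}
```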

Denys Séguret

A normal math answer.

Understanding that a floating point number is implemented as some bits for the exponent and the rest for the significant digits (in the binary system), one has the following situation:

With a high exponent, say around 10²³, changing the least significant bit produces a large difference between two adjacent distinguishable numbers. Furthermore, because the point "floats" in base 2, many base-10 numbers can only be approximated; 1/5 and 1/10 have infinite binary expansions.
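Java exposes this spacing between adjacent doubles directly via Math.ulp; a small sketch (class name is mine):

```java
public class SpacingSketch {
    public static void main(String[] args) {
        // Distance to the next representable double near 1.0: 2^-52
        System.out.println(Math.ulp(1.0));

        // Near 10^23, adjacent doubles are 2^24 (about 1.7e7) apart,
        // so whole stretches of integers in between are unrepresentable.
        System.out.println(Math.ulp(1e23));
    }
}
```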

So in general: floating point numbers should not be used if you care about exact decimal digits. For monetary amounts with calculations, it is best to use BigDecimal.

For physics, floating point doubles are adequate, floats almost never. Furthermore, the floating point unit of the processor, the FPU, may even use more precision internally.

Joop Eggen

Floating point numbers are encoded using an exponential form, that is something like m * b ^ e, i.e. not like integers at all. The question you ask would be meaningful in the context of fixed point numbers. There are numerous fixed point arithmetic libraries available.

Regarding floating point arithmetic: the number of decimal digits depends on the representation and the number system. For example, there are numbers (e.g. 1/5) which have a finite representation in decimal but not in binary, while every finite binary fraction is also finite in decimal (because the decimal base 10 = 2 × 5 contains the binary base 2 as a factor).

Also it is worth mentioning that above a certain magnitude, adjacent floating point numbers differ by more than one, i.e. value + 1 yields value, since value + 1 cannot be encoded using m * b ^ e, where m, b and e are fixed in length. The same happens for values smaller than 1: the representable values are not evenly spaced.
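A minimal sketch of this "value + 1 yields value" absorption, using the first power of two past which consecutive integers stop being representable (class name is mine):

```java
public class AbsorptionSketch {
    public static void main(String[] args) {
        // 2^24: up to here every integer fits in a float exactly
        float f = 16777216f;
        System.out.println(f + 1f == f); // prints true: 16777217 is not a float

        // The same happens for double at 2^53
        double d = 9007199254740992d;
        System.out.println(d + 1d == d); // prints true
    }
}
```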

Because of this there is no precision of exactly n decimal digits as with fixed point numbers, since not every number with n decimal digits has an IEEE encoding.

There is a nearly obligatory document which you should read, and which explains floating point numbers: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Johan
scravy
  • 2
    +1 for mentioning "What every computer scientist should know about floating point arithmetic". However, it is worth noting that **every** number that has a finite binary fraction representation also has a finite decimal representation. The problem is only going from decimal to binary. – Patricia Shanahan Nov 24 '12 at 18:38

Look at Float.intBitsToFloat and Double.longBitsToDouble, which sort of explain how bits correspond to floating-point numbers. In particular, the bits of a normal float look something like

 s * 2^exp * 1.ABCDEFGHIJKLMNOPQRSTUVW

where A...W are 23 bits (0s and 1s) representing a fraction in binary, s is +/- 1 (represented by a 0 or a 1 respectively), and exp is an 8-bit exponent stored with a bias of 127 (not as a two's-complement signed integer).
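Those fields can be pulled apart with the bit-manipulation methods the answer mentions (class name and example value are mine):

```java
public class BitsSketch {
    public static void main(String[] args) {
        // -6.25 = -1.5625 * 2^2, so sign = 1 and unbiased exponent = 2
        int bits = Float.floatToIntBits(-6.25f);

        int sign = bits >>> 31;          // 1 means negative
        int exp  = (bits >>> 23) & 0xFF; // biased exponent (bias 127)
        int frac = bits & 0x7FFFFF;      // the 23 fraction bits A...W

        System.out.println(sign);        // prints 1
        System.out.println(exp - 127);   // prints 2
        System.out.println(Integer.toBinaryString(frac)); // prints 10010000000000000000000
    }
}
```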

Louis Wasserman