
It's common knowledge that (most) floating-point numbers are not stored precisely when the IEEE-754 format is used. So one shouldn't do this:

0.3 - 0.2 === 0.1; // very wrong

... as it will evaluate to false, unless a specific arbitrary-precision type/class (BigDecimal in Java/Ruby, BCMath in PHP, Math::BigInt/Math::BigFloat in Perl, to name a few) is used instead.
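For illustration, here is a minimal Java sketch of both safe approaches; the epsilon value is an arbitrary choice for this example:

import java.math.BigDecimal;

public class FloatCompare {
    public static void main(String[] args) {
        // Direct comparison fails: 0.3 - 0.2 is not exactly 0.1 in binary.
        System.out.println(0.3 - 0.2 == 0.1); // false

        // Workaround 1: compare within a tolerance (epsilon is arbitrary here).
        double epsilon = 1e-9;
        System.out.println(Math.abs((0.3 - 0.2) - 0.1) < epsilon); // true

        // Workaround 2: arbitrary-precision decimals subtract exactly.
        BigDecimal diff = new BigDecimal("0.3").subtract(new BigDecimal("0.2"));
        System.out.println(diff.compareTo(new BigDecimal("0.1")) == 0); // true
    }
}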

Yet I wonder why, when one tries to print the result of this expression, 0.3 - 0.2, scripting languages (Perl and PHP) give 0.1, but "virtual-machine" ones (Java, JavaScript and Erlang) give something more like 0.09999999999999998 instead?

And why is it also inconsistent in Ruby? Version 1.8.6 (codepad) gives 0.1, while version 1.9.3 (ideone) gives 0.0999...

raina77ow
    Are you sure that's not a printing function issue? Many languages (e.g. C++ with `ostream` classes) by default perform some rounding when converting a FP number to string. – Matteo Italia Dec 29 '12 at 14:12
  • Isn't the difference `double`/`float` rather than `virtual-machine`/`compiled to native code`? – John Dvorak Dec 29 '12 at 14:12
  • @MatteoItalia I don't think it's related to _printing_ only: converting (0.3 - 0.2) to string gives 0.1 both in PHP and Perl. Yet you may be right, that may be an issue of `sprintf` (or some similar function). – raina77ow Dec 29 '12 at 14:14
  • @raina77ow: that's exactly my point... I'm no PHP expert, but I wouldn't be surprised if a conversion to string performs some kind of rounding in the output. ***edit***: QED, see *dev-null-dweller* 's answer. – Matteo Italia Dec 29 '12 at 14:16

5 Answers


As for PHP, the output is related to the `precision` ini setting:

ini_set('precision', 15);
print 0.3 - 0.2; // 0.1

ini_set('precision', 17);
print 0.3 - 0.2; // 0.099999999999999978

This may also be the cause for other languages.
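The same trade-off can be reproduced explicitly in Java, where the number of digits is chosen per call rather than via a global setting (a sketch; the outputs assume IEEE-754 doubles):

public class PrecisionDemo {
    public static void main(String[] args) {
        double d = 0.3 - 0.2;
        // 15 significant digits round the error away, much like precision=15:
        System.out.printf("%.15g%n", d); // 0.100000000000000
        // 17 significant digits expose the stored value, like precision=17:
        System.out.printf("%.17g%n", d); // 0.099999999999999978
        // Java's default conversion picks the shortest round-trip form:
        System.out.println(d); // 0.09999999999999998
    }
}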

dev-null-dweller
    My goodness... conversion to string of a FP number is controlled by an INI setting? What were they smoking? – Matteo Italia Dec 29 '12 at 14:19
    And what's wrong with giving users ability to do it in one place instead of using `round / ceil / floor / number_format` every time you want to display FP number? – dev-null-dweller Dec 29 '12 at 14:41
    Global setting for local problems = useless at least, recipe for disaster at worst. If it's an unknown setting, code will be written assuming blindly the default value, and will break whenever someone changes it. If, instead, it's routinely set to different values on different installations, your code will have to work around the settings whenever it needs something different (`magic_quotes_gpc`, anyone?); so, it only complicates the code. The normal way to approach the problem is to have a fixed default (guaranteed by the language specification) and provide some way to *locally* tweak it. – Matteo Italia Dec 29 '12 at 14:54
  • Well, it's fixed in the engine, and both the ini file and `ini_set` are meant to tweak it for user needs, just as you said. – dev-null-dweller Dec 29 '12 at 15:18
    @Hiroto: and then the library your script is using will stop working, because it expected that almost-unknown setting to be the default. Solution? Before each call to the library you have to restore the setting to the default - or avoid customizing it in first place since it's more pain than benefit. Again, global state for language features is almost always a bad idea, I've been bitten by this stuff enough times. – Matteo Italia Dec 29 '12 at 15:18
  • @Matteo Italia: one might say the whole PHP thing is a coincidence rather than a design :). Nevertheless, it may not be a good idea to rely on *any* implicit conversion of a float to string, if you want a consistent result. They are approximate by design. That's why BigDecimal & Co. exist in the first place. – full.stack.ex Dec 29 '12 at 19:18
  • @MatteoItalia It's there for practicality. Number of decimal places is a common setting all over - [even in calculators](http://www.tvmcalcs.com/calculators/ti83/ti83_page1) (see under "Initial Setup"). – Izkata Dec 29 '12 at 20:10

Floating-point numbers are printed differently because printing is done for different purposes, so different choices are made about how to do it.

Printing a floating-point number is a conversion operation: A value encoded in an internal format is converted to a decimal numeral. However, there are choices about the details of the conversion.

(A) If you are doing precise mathematics and want to see the actual value represented by the internal format, then the conversion must be exact: It must produce a decimal numeral that has exactly the same value as the input. (Each floating-point number represents exactly one number. A floating-point number, as defined in the IEEE 754 standard, does not represent an interval.) At times, this may require producing a very large number of digits.

(B) If you do not need the exact value but do need to convert back and forth between the internal format and decimal, then you need to convert it to a decimal numeral precisely (and accurately) enough to distinguish it from any other result. That is, you must produce enough digits that the result is different from what you would get by converting numbers that are adjacent in the internal format. This may require producing a large number of digits, but not so many as to be unmanageable.

(C) If you only want to give the reader a sense of the number, and do not need to produce the exact value in order for your application to function as desired, then you only need to produce as many digits as are needed for your particular application.
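To make the three choices concrete before weighing them, here is a small Java sketch (the exact expansion shown assumes an IEEE-754 double):

import java.math.BigDecimal;

public class ConversionChoices {
    public static void main(String[] args) {
        double d = 0.1;

        // (A) Exact conversion: new BigDecimal(double) preserves the
        // stored value bit-for-bit.
        System.out.println(new BigDecimal(d));
        // 0.1000000000000000055511151231257827021181583404541015625

        // (B) Round-trip conversion: Double.toString emits just enough
        // digits to distinguish d from its neighboring doubles.
        System.out.println(Double.toString(d)); // 0.1

        // (C) Application-chosen digit count:
        System.out.printf("%.3f%n", d); // 0.100
    }
}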

Which of these should a conversion do?

Different languages have different defaults because they were developed for different purposes, or because it was not expedient during development to do all the work necessary to produce exact results, or for various other reasons.

(A) requires careful code, and some languages or implementations of them do not provide, or do not guarantee to provide, this behavior.

(B) is required by Java, I believe. However, as we saw in a recent question, it can have some unexpected behavior. (65.12 is printed as “65.12” because the latter has enough digits to distinguish it from nearby values, but 65.12-2 is printed as “63.120000000000005” because there is another floating-point value between it and 63.12, so you need the extra digits to distinguish them.)
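That behavior is easy to reproduce (outputs as described in the question referenced above):

public class NearbyDoubles {
    public static void main(String[] args) {
        System.out.println(65.12);     // 65.12
        System.out.println(65.12 - 2); // 63.120000000000005: another double
        // lies between this result and 63.12, so extra digits are needed
    }
}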

(C) is what some languages use by default. It is, in essence, wrong, since no single value for how many digits to print can be suitable for all applications. Indeed, we have seen over decades that it fosters continuing misconceptions about floating-point, largely by concealing the true values involved. It is, however, easy to implement, and hence is attractive to some implementors. Ideally, a language should by default print the correct value of a floating-point number. If fewer digits are to be displayed, the number of digits should be selected only by the application implementor, hopefully including consideration of the appropriate number of digits to produce the desired results.

Worse, some languages, in addition to not displaying the actual value or enough digits to distinguish it, do not even guarantee that the digits produced are correct in some sense (such as being the value you would get by rounding the exact value to the number of digits shown). When programming in an implementation that does not provide a guarantee about this behavior, you are not doing engineering.

Eric Postpischil
  • I would slightly change case (C), or add a case (D), delivering final results to an end user. The output should be limited to digits that are expected to be correct and useful in the context of the application. Obviously, this cannot be done as a default, because it depends on the precision of the inputs and the numerical properties of the calculations, as well as the intended use of the data. – Patricia Shanahan Dec 30 '12 at 18:27
  • @PatriciaShanahan: I am receptive to changing or bifurcating case (C), but I am unclear on the distinction you want to make. The initial list describes the cases, with (C) being producing only some of the digits. I think you may be addressing the second list, which discusses further properties or purposes of the cases, and perhaps making a distinction between a language guessing at a good number of digits to use and a specific application developer planning a good number of digits to use. Could you clarify? – Eric Postpischil Dec 31 '12 at 14:00
  • I was thinking mainly about the purpose of limiting the printed digits. Selecting the number of digits for delivering results to the end user is not just a matter of giving the reader a sense of the number. It is picking the digits that will actually be used, the digits that are the ultimate purpose of the whole computation. I felt that the description of C seemed far too casual for something of that extreme importance. – Patricia Shanahan Dec 31 '12 at 15:01

PHP automatically rounds the number to a configured precision (see the `precision` ini setting).

Floating-point numbers in general aren't exact (as you noted), so if you only need to compare a few decimal places, use the language's round() function. Otherwise, take the absolute value of the difference and test that it is within a given tolerance.

PHP Example from php.net:

$a = 1.23456789;
$b = 1.23456780;
$epsilon = 0.00001;
if (abs($a - $b) < $epsilon) {
  echo "true";
}

As for the Ruby issue, the two sites appear to be using different versions: Codepad uses 1.8.6, while Ideone uses 1.9.3, but it's more likely related to a config somewhere.

Amelia

If we want this property:

  • every two different floats have different printed representations

Or an even stronger one, useful for a REPL:

  • the printed representation shall be re-interpreted unchanged

Then I see 3 solutions for printing a float/double with a base-2 internal representation in base 10:

  1. print the EXACT representation.
  2. print enough decimal digits (with proper rounding)
  3. print the shortest decimal representation that can be re-interpreted unchanged

Since, in base 2, the float number is an_integer * 2^an_exponent, its exact base-10 representation has a finite number of digits.
Unfortunately, this can result in very long strings... For example, 1.0e-10 is represented exactly as 1.0000000000000000364321973154977415791655470655996396089904010295867919921875e-10

Solution 2 is easy: you use printf with 17 digits for an IEEE-754 double...
Drawback: it's neither exact nor the shortest! If you enter 0.1, you get 0.100000000000000006

Solution 3 is the best one for REPL languages: if you enter 0.1, it prints 0.1.
Unfortunately, it is not found in all standard libraries (a shame).
At least Scheme, Python and recent Squeak/Pharo Smalltalk do it right; I think Java does too.
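For comparison, here is how the three solutions look in Java (a sketch; the exact expansion and the 17-digit form assume IEEE-754 doubles):

import java.math.BigDecimal;

public class ThreeSolutions {
    public static void main(String[] args) {
        // 1. Exact representation: finite in base 10, but very long
        //    (BigDecimal prints it in E-notation).
        System.out.println(new BigDecimal(1.0e-10));
        // 1.0000000000000000364321973154977415791655470655996396089904010295867919921875E-10

        // 2. Enough digits to round-trip: neither exact nor shortest.
        System.out.printf("%.17e%n", 0.1); // 1.00000000000000006e-01

        // 3. Shortest representation that re-reads to the same double.
        System.out.println(0.1); // 0.1
    }
}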

aka.nice

As for JavaScript, base 2 is used internally for calculations.

> 0.2 + 0.4
0.6000000000000001

Because of that, JavaScript can only deliver exact results if the base-2 representation of the number is finite (not periodic).

0.6 is 0.1001 1001 1001 ... in base 2 (periodic), whereas 0.5 is 0.1 exactly and is therefore printed correctly.
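This can be checked by printing the exact values the doubles actually store; a small Java sketch using BigDecimal (the long expansion assumes IEEE-754 doubles):

import java.math.BigDecimal;

public class StoredValues {
    public static void main(String[] args) {
        // 0.5 = 0.1 in base 2: stored exactly, so it prints exactly.
        System.out.println(new BigDecimal(0.5)); // 0.5

        // 0.6 repeats in base 2, so only the nearest double is stored.
        System.out.println(new BigDecimal(0.6));
        // 0.59999999999999997779553950749686919152736663818359375
    }
}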

David Müller