I'm experimenting with the precision of a double value in various programming languages.

My programs

main.c

#include <stdio.h>

int main() {
    for (double i = 0.0; i < 3; i = i + 0.1) {
        printf("%.17lf\n", i);
    }
    return 0;
}

main.cpp

#include <iostream>

using namespace std;

int main() {
    cout.precision(17);
    for (double i = 0.0; i < 3; i = i + 0.1) {
        cout << fixed << i << endl;
    }
    return 0;
}

main.py

i = 0.0
while i < 3:
    print(i)
    i = i + 0.1

Main.java

public class Main {
    public static void main(String[] args) {
        for (double i = 0.0; i < 3; i = i + 0.1) {
            System.out.println(i);
        }
    }
}

The output

main.c

0.00000000000000000
0.10000000000000001
0.20000000000000001
0.30000000000000004
0.40000000000000002
0.50000000000000000
0.59999999999999998
0.69999999999999996
0.79999999999999993
0.89999999999999991
0.99999999999999989
1.09999999999999990
1.20000000000000000
1.30000000000000000
1.40000000000000010
1.50000000000000020
1.60000000000000030
1.70000000000000040
1.80000000000000050
1.90000000000000060
2.00000000000000040
2.10000000000000050
2.20000000000000060
2.30000000000000070
2.40000000000000080
2.50000000000000090
2.60000000000000100
2.70000000000000110
2.80000000000000120
2.90000000000000120

main.cpp

0.00000000000000000
0.10000000000000001
0.20000000000000001
0.30000000000000004
0.40000000000000002
0.50000000000000000
0.59999999999999998
0.69999999999999996
0.79999999999999993
0.89999999999999991
0.99999999999999989
1.09999999999999987
1.19999999999999996
1.30000000000000004
1.40000000000000013
1.50000000000000022
1.60000000000000031
1.70000000000000040
1.80000000000000049
1.90000000000000058
2.00000000000000044
2.10000000000000053
2.20000000000000062
2.30000000000000071
2.40000000000000080
2.50000000000000089
2.60000000000000098
2.70000000000000107
2.80000000000000115
2.90000000000000124

main.py

0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999
1.0999999999999999
1.2
1.3
1.4000000000000001
1.5000000000000002
1.6000000000000003
1.7000000000000004
1.8000000000000005
1.9000000000000006
2.0000000000000004
2.1000000000000005
2.2000000000000006
2.3000000000000007
2.400000000000001
2.500000000000001
2.600000000000001
2.700000000000001
2.800000000000001
2.9000000000000012

Main.java

0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999
1.0999999999999999
1.2
1.3
1.4000000000000001
1.5000000000000002
1.6000000000000003
1.7000000000000004
1.8000000000000005
1.9000000000000006
2.0000000000000004
2.1000000000000005
2.2000000000000006
2.3000000000000007
2.400000000000001
2.500000000000001
2.600000000000001
2.700000000000001
2.800000000000001
2.9000000000000012

My question

I know that the double type carries inherent representation errors, which we can learn more about from articles like Why You Should Never Use Float and Double for Monetary Calculations and What Every Computer Scientist Should Know About Floating-Point Arithmetic.

But these errors are not random! The errors are identical on every run, so my question is: why do they differ between programming languages?

Secondly, why are the precision errors in Java and Python the same? [Java's JVM is written in C++, whereas the Python interpreter is written in C.]

Surprisingly, their errors are the same as each other, yet different from the errors in C and C++. Why is this happening?

Jaysmito Mukherjee
  • You should make sure to print the same number of digits for a fair comparison. `0.10000000000000001` and `0.100000` can both represent the same value if the second case was printed with fewer digits. – François Andrieux Jan 15 '21 at 18:58
  • There are not 'errors.' Binary representation of decimal numbers calls for approximation. It's all deterministic and understood. – nicomp Jan 15 '21 at 19:00
  • @FrançoisAndrieux For Java and Python that's the maximum. For C++ I tried to do something similar too, but I am fairly new with C and have no idea how to print it more precisely. – Jaysmito Mukherjee Jan 15 '21 at 19:00
  • @nicomp My question is not about the approximation, but about why it differs in some languages and is similar in others. – Jaysmito Mukherjee Jan 15 '21 at 19:01
  • @JaysmitoMukherjee It's [certainly not the most digits you can print in C++](https://godbolt.org/z/4TbTfn). – François Andrieux Jan 15 '21 at 19:02
  • *"Why is this happening?"* - TBH, a very good answer is "why not?" – klutt Jan 15 '21 at 19:03
  • @JaysmitoMukherjee The question *is* about approximation because `0.1` cannot be represented by a binary fraction so it must be approximated. This is similar to trying to represent 1/3 in decimal. It will be approximated. – François Andrieux Jan 15 '21 at 19:04
  • @FrançoisAndrieux It is not necessary to print more, as no other language prints more precision than that; we can see the difference with that many digits. My problem is with C, where fewer digits are printed. – Jaysmito Mukherjee Jan 15 '21 at 19:04
  • Very close to a duplicate https://stackoverflow.com/questions/588004/is-floating-point-math-broken – Richard Critten Jan 15 '21 at 19:04
  • @FrançoisAndrieux My question is not about approximation, but about why it differs in some languages and is similar in others. – Jaysmito Mukherjee Jan 15 '21 at 19:06
  • I don't see why people are voting to close this. It's an interesting question and it's pretty clear. – klutt Jan 15 '21 at 19:06
  • @RichardCritten But I am not asking about the cause of this problem; I am asking why it shows up differently in different languages. – Jaysmito Mukherjee Jan 15 '21 at 19:07
  • @JaysmitoMukherjee The question is this: "Why do you expect these approximations to behave differently or the same?" – klutt Jan 15 '21 at 19:07
  • @JaysmitoMukherjee Different languages can choose how to round when displaying the value. You have a number which is hard to display in decimal. There should also be no expectation that every language uses the same floating point implementation. Even in C++ the floating point implementation used is not specified. – François Andrieux Jan 15 '21 at 19:08
  • @klutt I do not expect them to be the same or different, but I wonder why they are the way they are in these languages while their parent languages differ. – Jaysmito Mukherjee Jan 15 '21 at 19:09
  • Surely these differences are due to the different ways these languages are printing the values, not because they are holding different values, or am I missing something? – john Jan 15 '21 at 19:09
  • @john I don't know about C, but most probably the others are not modifying the values while printing. – Jaysmito Mukherjee Jan 15 '21 at 19:10
  • You have fewer digits in C because that's what the standard library defaults to. Change your code to `printf("%.17lf\n", i);` and the output looks like the C++ example. Are you asking why the defaults aren't the same? – Blastfurnace Jan 15 '21 at 19:11
  • @Blastfurnace Thanks, I will do that now. – Jaysmito Mukherjee Jan 15 '21 at 19:11
  • @JaysmitoMukherjee No, of course they are not modifying the values, but they have a choice in how they print values (the number of decimal places being only the most obvious choice). – john Jan 15 '21 at 19:11
  • @JaysmitoMukherjee There are also choices in the algorithm used to convert the internal binary representation to decimal. Not all languages will be using the same algorithm. And not all languages insist on the most precise conversion possible (C and C++ do not). – john Jan 15 '21 at 19:12
  • @john About the number of decimal places, I am sure that now all of them are printing the same number of decimal places! – Jaysmito Mukherjee Jan 15 '21 at 19:13
  • @john So why do Java and Python have the same result when they are based on C++ and C respectively? – Jaysmito Mukherjee Jan 15 '21 at 19:15
  • @JaysmitoMukherjee I can think of a couple of possibilities: different rounding modes being used (I would be surprised at that, however), or different algorithms being used to convert binary to decimal (that seems more likely). What you need to do is look at the bit pattern of the double value, instead of relying on printing the values. – john Jan 15 '21 at 19:17
  • Just because a language is implemented in C or C++ doesn't mean it must have identical behavior to C or C++. That's a decision for the language designers and implementers. – Blastfurnace Jan 15 '21 at 19:18
  • @Blastfurnace I think so too, but if so, was the entire way of representing data remade in Java and Python, and coincidentally they are the same? – Jaysmito Mukherjee Jan 15 '21 at 19:21
  • @JaysmitoMukherjee You keep assuming that the differences in printed values imply a difference in underlying representation, but that is not the case. As I said earlier, to prove your thesis you need to look at the binary representation of the double, not at its printed decimal representation. – john Jan 15 '21 at 19:22
  • As an aside (and in case you don't already know this), to prevent the errors from accumulating, if you need floating-point values increasing/decreasing by a fixed amount `x` (0.1 in this case) each time, you should calculate each as `n*x` rather than adding `x` each time. Better still would be to note that you can divide (using floating-point division) `n` by (1/`x`) - in this case 10 (see the Python sketch after these comments). – Wai Ha Lee Jan 16 '21 at 10:56
  • If one changes main.py to use the same format as the C and C++ programs and print 17 digits after the decimal point, with `print(f'{i:.17f}')`, it (CPython 3.10.0a4 on Windows) *exactly* matches the main.cpp output, ending with `2.90000000000000124`. So there is no difference between the two to be explained. – Terry Jan Reedy Jan 17 '21 at 04:13
  • @AsteroidsWithWings The same for each programming language; please read the full sentence. – Jaysmito Mukherjee Jan 18 '21 at 04:46
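
Regarding Wai Ha Lee's accumulation point above, here is a minimal Python sketch (an illustration added for clarity, not from the original discussion) comparing repeated addition against computing each value directly:

# Repeated addition accumulates a new rounding error at every step,
# while n * step and n / 10 incur at most one rounding per value.
step = 0.1
acc = 0.0
for n in range(30):
    direct = n * step  # one multiplication, one rounding error
    by_div = n / 10    # 10 is exactly representable, so often best
    print(f"{acc:.17f}  {direct:.17f}  {by_div:.17f}")
    acc += step        # errors pile up across iterations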

4 Answers

The differences in output are due to differences in converting the floating-point number to a numeral. (By numeral, I mean a character string or other text that represents a number. “20”, “20.0”, “2e+1”, and “2•10¹” are different numerals for the same number.)

For reference, I show the exact values of i in notes below.

In C, the %.17lf conversion specification you used requests 17 digits after the decimal point, so 17 digits after the decimal point are produced. However, the C standard allows some slack in this. It only requires calculation of enough digits that the actual internal value can be distinguished.1 The rest can be filled in with zeros (or other “incorrect” digits). It appears the C standard library you are using only fully calculates 17 significant digits and fills the rest you request with zeros. This explains why you got “2.90000000000000120” instead of “2.90000000000000124”. (Note that “2.90000000000000120” has 18 digits: 1 before the decimal point, 16 significant digits after it, and 1 non-significant “0”. “0.10000000000000001” has an aesthetic “0” before the decimal point and 17 significant digits after it. The requirement for 17 significant digits is why “0.10000000000000001” must have the “1” at the end but “2.90000000000000120” may have a “0”.)

In contrast, it appears your C++ standard library does the full calculations, or at least more (which may be due to a rule in the C++ standard2), so you get “2.90000000000000124”.

Python 3.1 added an algorithm to convert with the same result as Java (see below). Prior to that, Python was lax about the conversion for display. (To my knowledge, it is still lax about the floating-point format used and about conformance to IEEE-754 in arithmetic operations; specific Python implementations may differ in behavior.)

Java requires that the default conversion from double to string produce just as many digits as are required to distinguish the number from neighboring double values (also here). So it produces “0.2” instead of “0.20000000000000001” because the double nearest 0.2 is the value that i had in that iteration. In contrast, in the next iteration, the rounding errors in arithmetic gave i a value slightly different from the double nearest 0.3, so Java produced “0.30000000000000004” for it. In the next iteration, the new rounding error happened to partially cancel the accumulated error, so it was back to “0.4”.
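
To see this shortest round-trip rule in action, here is a small Python sketch (an added illustration; the values match the outputs above):

x = 0.1 + 0.1               # arithmetic happens to land on the double nearest 0.2
print(x == 0.2)             # True: same double value
print(repr(x))              # '0.2' - the shortest numeral that round-trips
print(f"{x:.17f}")          # '0.20000000000000001' - fixed 17 digits
print(float(repr(x)) == x)  # True: the short numeral recovers the same double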

Notes

The exact values of i when IEEE-754 binary64 is used are:

0
0.1000000000000000055511151231257827021181583404541015625
0.200000000000000011102230246251565404236316680908203125
0.3000000000000000444089209850062616169452667236328125
0.40000000000000002220446049250313080847263336181640625
0.5
0.59999999999999997779553950749686919152736663818359375
0.6999999999999999555910790149937383830547332763671875
0.79999999999999993338661852249060757458209991455078125
0.899999999999999911182158029987476766109466552734375
0.99999999999999988897769753748434595763683319091796875
1.0999999999999998667732370449812151491641998291015625
1.1999999999999999555910790149937383830547332763671875
1.3000000000000000444089209850062616169452667236328125
1.4000000000000001332267629550187848508358001708984375
1.5000000000000002220446049250313080847263336181640625
1.6000000000000003108624468950438313186168670654296875
1.7000000000000003996802888650563545525074005126953125
1.8000000000000004884981308350688777863979339599609375
1.9000000000000005773159728050814010202884674072265625
2.000000000000000444089209850062616169452667236328125
2.10000000000000053290705182007513940334320068359375
2.200000000000000621724893790087662637233734130859375
2.300000000000000710542735760100185871124267578125
2.400000000000000799360577730112709105014801025390625
2.50000000000000088817841970012523233890533447265625
2.600000000000000976996261670137755572795867919921875
2.7000000000000010658141036401502788066864013671875
2.800000000000001154631945610162802040576934814453125
2.90000000000000124344978758017532527446746826171875

These are not all the same values you would get by converting 0, .1, .2, .3,… 2.9 from decimal to binary64 because they are produced by arithmetic, so there are multiple rounding errors from the initial conversions and the consecutive additions.
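
These exact values can be reproduced with Python's decimal module, which expands a double to its full decimal value (a sketch added for illustration; it assumes IEEE-754 binary64 floats):

from decimal import Decimal

i = 0.0
while i < 3:
    print(Decimal(i))  # the exact value of the binary64 double i
    i = i + 0.1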

Footnotes

1 C 2018 7.21.6.1 only requires that the resulting numeral be accurate to DECIMAL_DIG digits in a specified sense. DECIMAL_DIG is the number of digits such that, for any number in any floating-point format in the implementation, converting it to a decimal number with DECIMAL_DIG significant digits and then back to floating-point yields the original value. If IEEE-754 binary64 is the most precise format your implementation supports, then its DECIMAL_DIG is at least 17.
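
One way to see that 17 significant digits are needed for binary64 is to test round-tripping (a Python sketch added for illustration; the chosen value is the last one from the list above):

x = 2.90000000000000124344978758017532527446746826171875
print(float(f"{x:.17g}") == x)  # True: 17 significant digits always round-trip
print(float(f"{x:.16g}") == x)  # False here: 16 digits lose this value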

2 I do not see such a rule in the C++ standard, other than incorporation of the C standard, so it may be that your C++ library is simply using a different method from your C library as a matter of choice.

Eric Postpischil
  • “20”, “20.0” are not the same number. – nicomp Jan 16 '21 at 13:34
  • @nicomp: Of course they are not the same number, because “20” and “20.0” are character strings, not numbers. They are numerals that represent the same number, twenty. Or you may be thinking of `20` and `20.0` as text in source code that programming languages attribute types to. In that case, they represent conceptual elements in a programming language with different types, but those elements still represent the same number, twenty. – Eric Postpischil Jan 16 '21 at 13:59
  • They don't represent the same number. 20 is an integer and 20.0 is a floating point number with one digit place of precision after the radix point. – nicomp Jan 16 '21 at 14:05
  • @nicomp: In the context I used in my answer, “number” refers to a mathematical number, such as the real numbers. Twenty is twenty regardless of whether it is expressed as 20 or 20.0. The IEEE-754 standard tells us the number represented with digits 2 and 0, a radix of 10, and an exponent of 1 (relative to a decimal point set after the first digit) is twenty. Calling something a “floating-point number” is a shorthand for its representation using a sign, sequence of digits, and an exponent. “20” and “20.0” are just notations; they are not the entities I use the word “number” to refer to. – Eric Postpischil Jan 16 '21 at 14:10
  • @nicomp: By the way, in case you think `20.0` and `2e+1` in source code produce the same number, see C 2018 footnote 77: “`1.23`, `1.230`, `123e-2`, `123e-02`, and `1.23L` are all different source forms and thus need not convert to the same internal format and value.” – Eric Postpischil Jan 16 '21 at 14:32
  • I believe that the CPython rule for float representation is 'nicest' (fewest digits) that 'round-trips', meaning `float(str(x)) == x`, and that this is equivalent to the Java rule quoted by Eric. – Terry Jan Reedy Jan 17 '21 at 04:29
  • FWIW it was Python 3.1 that changed the floating point representation algorithm; this is documented in https://docs.python.org/3/whatsnew/3.1.html#other-language-changes – Marius Gedminas Jan 18 '21 at 16:12

The differences you're seeing are in how you print out the data, not in the data itself.

As I see it, we have two problems here. One is that you're not consistently specifying the same precision when you print out the data in each language.

The second is that you're printing the data out to 17 digits of precision, but at least as normally implemented (double being a 64-bit number with a 53-bit significand) a double really only has about 15 decimal digits of precision.
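
For reference, Python (whose float is a C double underneath) exposes these limits; a short sketch assuming IEEE-754 binary64:

import sys

print(sys.float_info.mant_dig)  # 53: bits in the significand
print(sys.float_info.dig)       # 15: decimal digits that are always preserved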

So, while (for example) C and C++ both require that your result be rounded "correctly", once you go beyond the limits of the precision the type is supposed to support, they can't guarantee much about producing truly identical results in every possible case.

But that's going to affect only how the result looks when you print it out, not how it's actually stored internally.

Jerry Coffin

I don't know about Python or Java, but neither C nor C++ insists that the printed decimal representation of a double value be as precise or concise as possible. So comparing printed decimal representations does not tell you everything about the actual value that is being printed. Two values could be the same in the binary representation but still legitimately print as different decimal strings in different languages (or different implementations of the same language).

Therefore your lists of printed values are not telling you that anything unusual is going on.

What you should do instead is print the exact binary representations of your double values.
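
For example, in Python one could inspect the raw bits of a double like this (a sketch added for illustration, assuming IEEE-754 binary64):

import struct

x = 0.1
bits = struct.unpack('<Q', struct.pack('<d', x))[0]  # reinterpret the 8 bytes as an integer
print(f"{bits:064b}")  # 1 sign bit, 11 exponent bits, 52 significand bits
print(x.hex())         # hexadecimal form: 0x1.999999999999ap-4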

Some useful reading: https://www.exploringbinary.com/

john

But these errors are not random!

Correct. That should be expected.

why are these different for different programming languages?

Because you've formatted the output differently.

Why are the precision errors in Java and Python the same?

They seem to have the same, or sufficiently similar, default formatting.

eerorika