What does "double + 1e-6" mean?

Question

This C++ code prints 72.740, but the answer should be 72.741:

mx = 72.74050000;
printf("%.3lf \n", mx);

I found a solution on a website that told me to add "+1e-7", and it works:

mx = 72.74050000;
printf("%.3lf \n", mx + 1e-7);

But I don't know the reasoning behind this method. Can anyone explain how it works?

I also tried printing the sum directly, but nothing special happened; it came out as 72.7405:

mx = 72.74050003;
cout << mx + 1e-10;

Yes, I know that. The main thing I wanted to ask is why adding "1e-7" makes the output correct. — Ank H, Aug 13 '22 at 10:25
Rounding errors of floating point. 1/3 in decimal is 0.333... and needs an infinite number of 3s. The same happens with floating-point numbers in binary, and there are only so many bits. So a floating-point number is in general an approximation (https://www.learncpp.com/cpp-tutorial/floating-point-numbers/). Note that using printf is a bit archaic by now: see format (https://en.cppreference.com/w/cpp/utility/format) — Pepijn Kramer, Aug 13 '22 at 10:29
@Pepijn Kramer Thanks a lot, but I don't think I can remember so many library headers. I'm used to printf (from C) for convenience, or fixed << setprecision(5) from iomanip, haha — Ank H, Aug 13 '22 at 10:52
Don't confuse "not yet known" with "advanced". There is a whole world for you to explore, and most "standard" problems already have an implementation for you: for strings use [`<string>`](https://www.learncpp.com/cpp-tutorial/introduction-to-stdstring/), for arrays that can change size at runtime there is [std::vector](https://www.learncpp.com/cpp-tutorial/an-introduction-to-stdvector/), and there are many more [containers](https://en.cppreference.com/w/cpp/container) and algorithms (sort, find, etc.) in [`<algorithm>`](https://en.cppreference.com/w/cpp/algorithm). Including stuff isn't bad at all :) — Pepijn Kramer, Aug 13 '22 at 11:11
Re: "the answer should be ..." -- floating-point numbers are not the kind of numbers we all grew up dealing with, so be very careful about making assumptions about how they work based on your experience in the real world. 72.74050000 does not have an exact representation as a floating-point value, so your intuition will lead you astray. See [is floating-point math broken](https://stackoverflow.com/questions/588004/is-floating-point-math-broken). — Pete Becker, Aug 13 '22 at 12:29

Netch · Accepted Answer · 2022-08-14T19:08:40.433

To start, your question contains an incorrect assumption. You put 72.7405 (let's assume it's exact) on input and expect 72.741 on output. So you assume that rounding in printf will select the higher of the two possible candidates. Why?

Well, perhaps that is what your task requires under some rules (e.g. fiscal norms for rounding in bills, taxation, etc.); this is common. But when you use the de facto standard floating point of C/C++ on x86, ARM, etc., you should take the following specifics into account:

It is binary, not decimal. As a result, all values shown in your example are stored with some error.
The standard library tends to use standard rounding unless forced to use another method.

The second point means that the default rounding in C floating point is round-to-nearest-ties-to-even (for short, half-to-even). With this rounding, an exact 72.7405 is rounded to 72.740, not 72.741 (but 72.7415 is rounded to 72.742). To get 72.7405 -> 72.741, you would need another rounding mode: round-to-nearest-ties-away-from-zero (for short, round-half-away). IEEE 754 requires this mode to be available for decimal arithmetic. So, if you used true decimal arithmetic, selecting that mode would suffice.

(If we don't allow negative numbers, the same mode might be treated as half-up. But I assume negative numbers are permitted in financial accounting and similar contexts.)

But the first point here is more important: the inexactness of representation of such values can be amplified by operations. Here I reproduce your situation and the proposed solution with more cases:

Code:

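#include <stdio.h>
int main()
{
  float mx;                     /* single precision, as in this test */
  mx = 72.74050000;
  printf("%.6f\n", mx);         /* the value actually stored */
  printf("%.3f\n", mx + 1e-7);  /* the proposed fix */
  mx *= 3;                      /* the exact result would be 218.2215 */
  printf("%.6f\n", mx);
  printf("%.3f\n", mx + 1e-7);  /* 1e-7 is no longer enough here */
  return 0;
}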

Result (Ubuntu 20.04/x86-64):

72.740501
72.741
218.221497
218.221

So you see that merely multiplying your example number by 3 leaves the compensating summand 1e-7 too small to force rounding half-up, and 218.2215 (the "exact" 72.7405*3) is rounded to 218.221 instead of the desired 218.222. Oops, "Directed by Robert B. Weide"...

How could the situation be fixed? Well, you could start with a stronger, cruder approach. If you need rounding to 3 decimal digits, but the inputs appear to have 4 digits, add 0.00005 (half a unit of the last digit of your inputs) instead of this powerless and sluggish 1e-7. This will definitely move halfway values up.

But all this works only if the value before rounding has an error strictly less than 0.00005. With cumbersome calculations (e.g. summing hundreds of values), the accumulated error can easily exceed this threshold. To avoid that, you would have to round intermediate results often (ideally, each value).

And this last conclusion leads us to the final question: if we need to round each intermediate result, why not just migrate to calculations in integers? Do you have to keep intermediate results up to 4 decimal digits? Scale by 10000 and do all calculations in integers. This will also help avoid silent(*) accuracy loss with higher exponents.

(*) Well, IEEE 754 requires raising the "inexact" flag, but with binary floating point nearly any operation on decimal fractions will raise it, so the useful signal will drown in a sea of noise.

The final conclusion is the proper answer, not to your question but to the underlying task: use fixed-point approaches. The approach with this +1e-7, as I showed above, fails too easily. No, don't use it, no, never. There are lots of proper libraries for fixed-point arithmetic; just pick one and use it.

(It's also interesting that %.6f printed 72.740501 while 218.221497/3 == 72.740499. It suggests that single-precision floating point (float in C) is too inaccurate here. And even without this wrong approach, using double only postpones the issue, masking it and disguising it as a correct way.)

Re “It is binary, not decimal”: The C and C++ standards do not require that two be used for the base of the floating-point format. It may be two, ten, or other values. — Eric Postpischil, Aug 14 '22 at 19:01
Re “It is binary, not decimal. As result, all values are kept with some error.”: The base is not the cause of error in the floating-point format. All fixed-precision formats of **any** base or **any** type (integer, fixed, floating, rational, or other) can represent only a finite number of numbers and therefore arithmetic with them must only approximate real-number arithmetic. — Eric Postpischil, Aug 14 '22 at 19:02
@EricPostpischil you're right, I'll reformulate: standard de facto on x86, and values used in examples. — Netch, Aug 14 '22 at 19:04
@Netch, so if I have 100 double numbers and I want to sum them, with the result rounded to the third decimal place, I need to multiply these 100 numbers by 10000, add them to sum, then code int tp = fmod(sum, 10); if(tp > 5) mx += 10-tp; and at last sum /= 10000, and the sum will be the final answer. Am I correct? — Ank H, Aug 15 '22 at 03:23
@AnkH If I got it right, well, this is the variant: you multiply all values by 10000 and round to the nearest integer, so you have an array of integers. Sum them all, and you have the correct result in units of 1/10000. Then do the required rounding, but you don't need fmod(); they are integers. Just check `sum%10`: if it is >=5, you should round up (trivially: `sum/10+1`, or `(sum+5)/10`). Else, round down (`sum/10`). Integer division gives a truncated quotient, not a floating-point number, so it suffices here without extra rounding. — Netch, Aug 15 '22 at 05:29
@AnkH And, finally, if you need a floating-point number, divide the result by 10000.0 (not by 10000, because that would be integer division). In your example you checked `if (tp>5)`; as far as I understand, it should be `>=5` if you are to use half-away. — Netch, Aug 15 '22 at 05:31
@Netch Thanks a lot, I got it. Is there the same in python or other language? — Ank H, Aug 15 '22 at 09:32
@AnkH In Python3, `/` is always float division and `//` is always integer division. In Python2, `/` depended on the argument types, the same as in C, and `//` (>=2.2) was always integer division. Its integer division is F-division, not T-division as in C, so the results differ for a negative dividend. It also has the `decimal` module for true decimal arithmetic; it's much slower but makes decimal calculations easy to handle. — Netch, Aug 15 '22 at 10:37

Vlad from Moscow · Answer 2 · 2022-08-13T10:28:49.143

-2

If you output the value like this:

printf( "mx = %.16f\n", mx );

you will see

mx = 72.7404999999999973

So, to get 72.741 from the rounding that printf performs on output, you need the next digit to be 5 instead of 4. Adding 0.00001 is enough.

Here is a demonstration program.

#include <iostream>
#include <iomanip>
#include <cstdio>

int main( void ) 
{
    double mx = 72.74050000;

    printf( "mx = %.3f\n", mx + 0.00001);

    std::cout << "mx = " << std::setprecision( 5 ) <<  mx + 0.00001 << '\n';
}

The program output is

mx = 72.741
mx = 72.741

0.00001 is the same as 1e-5.

edited Aug 13 '22 at 10:28

answered Aug 13 '22 at 10:22

Vlad from Moscow


This is not the right answer. Netch's answer explains it: the default rounding mode almost everywhere these days is "round to even", so 72.7405 is rounded to 72.74. (But I definitely don't agree with Netch's suggestion to use fixed-point arithmetic!) – TonyK Aug 13 '22 at 23:07
