
This appears to be a very basic question, yet I'm still not sure I understand correctly.

Say I have defined some very small and large numbers

    constexpr double a = 1.53636e-34;
    constexpr double b = 6.12362e-36;
    constexpr double c = 6.92956e+19;

and want to use them for some arithmetic. Is it safe to do so in double precision where only 16 digits are significant?

EDIT: Let's use an example. Say we want to obtain the speed of light in atomic units. It's defined as:

    double c = 2 * epsi * h * col / (e * e);

where

    double e = 1.602176634e-19;
    double h = 6.62607015e-34;
    double col = 299792458;
    double epsi = 8.8541878128e-12;

We obviously don't care about everything that happens after the ninth decimal place or so. What we do care about though is that the above consistently evaluates to 137.035999....

EDIT2: Formula was wrong.

EigenGrau
  • Depends on what you mean by safe and what you want to do. Floating point arithmetic is always susceptible to rounding errors, more so near its limits. – François Andrieux Sep 05 '19 at 17:02
  • Also depends on what arithmetic you propose to do. Multiplication and division have different (fewer?) failure modes than addition and subtraction. – dmckee --- ex-moderator kitten Sep 05 '19 at 17:04
  • This might be relevant if it turns out the answer is that it's not safe: https://en.wikipedia.org/wiki/List_of_arbitrary-precision_arithmetic_software – bob Sep 05 '19 at 17:05
  • Precision (how *long* the number is) has little to do with range (how *small* or *large* the number is). – n. m. could be an AI Sep 05 '19 at 17:05
  • More specifically, I'm working on a quantum chemistry program which thus needs to use natural constants. What I mean with safe is that it is obvious that we want to obtain reliable, consistent and, of course, numerically correct results for a particular input. – EigenGrau Sep 05 '19 at 17:05
  • The numbers you show have roughly 7 decimal digits of precision. – Maxim Egorushkin Sep 05 '19 at 17:08
  • If you want to add information to your question, please [edit] it. – n. m. could be an AI Sep 05 '19 at 17:10
  • @EigenGrau As far as reliability, consistency and correctness go it's important to understand that `double`s don't always provide exact mathematically correct results. Notably, most associative mathematical operations are not associative with `double`s. Some mathematical identities don't quite hold with `double`s. See [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken). `double`s are only a good approximation of real math. – François Andrieux Sep 05 '19 at 17:11
  • Worth a read: [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). Also: [Sometimes Floating Point Math is Perfect](https://randomascii.wordpress.com/2017/06/19/sometimes-floating-point-math-is-perfect/). – Jesper Juhl Sep 05 '19 at 17:13
  • As most comments point out, you seem to be confusing *precision* with *range*. The fact that the numbers are tiny is a *range* issue. The *precision* is about the quantity of digits. Of course, if you do `a+c` it will probably give you `== c`. You can always use `long double`, anyway. – Mirko Sep 05 '19 at 17:14
  • The vast majority of precision problems occur when adding or subtracting values of different magnitudes. The question talks about multiplication and division, which are safe as long as the range is ok. OP, what additions/subtractions are you going to do? Will the magnitudes roughly match? – Jeffrey Sep 05 '19 at 17:40
  • If you are in doubt, you might want to employ some **multiprecision library**, such as [Boost.Multiprecision](https://www.boost.org/doc/libs/release/libs/multiprecision/), and compare results. Boost.Multiprecision can also wrap other libraries (such as GMP) and (at least partially) works with functions from Boost.Math. – Daniel Langr Sep 05 '19 at 17:48
  • In your example, double precision seems to be sufficient: https://wandbox.org/permlink/HppugNq1ri6pExlR (note that there is also quad precision from Boost used). What's wrong with the result 137.036? – Daniel Langr Sep 05 '19 at 18:08

3 Answers


The relative precision of IEEE-754 floating point numbers using 64 bits (commonly `double` in C++) is constant* for values with magnitudes between about 10^-308 and 10^308.

Within this range, you can expect about 15-16 significant decimal digits, i.e., digits after the decimal point when you write the numbers in normalized scientific notation.

*Well, stays within a narrow margin:
https://en.wikipedia.org/wiki/IEEE_754-1985#/media/File:IEEE_754_relative_precision.svg

Max Langhof
  • I'm curious what the downvote is for. I can't spot any errors in this answer, so it would be nice if someone could point it out. – François Andrieux Sep 05 '19 at 17:13
  • The precision is measured in significant digits, not after the decimal point. I.e. the first non-0 digit from the left is the most significant. – Maxim Egorushkin Sep 05 '19 at 17:13
  • @MaximEgorushkin In [scientific notation](https://en.wikipedia.org/wiki/Scientific_notation), those two descriptions are equivalent (plus/minus one). You might also object to the wording "decimal digits of precision", which I'm about to change. – Max Langhof Sep 05 '19 at 17:15
  • @MaxLanghof `0.003e3` is also scientific notation. You may like to be more specific. – Maxim Egorushkin Sep 05 '19 at 17:21
  • @MaximEgorushkin Added "normalized" to be sufficiently specific. Thanks for helping improve this answer! – Max Langhof Sep 09 '19 at 07:45

The precision you can expect also depends on the functions you apply (e.g., sqrt() or tan()): as taught in numerical analysis, each function call introduces its own error, which you must estimate. In general, a chain of function calls can only guarantee slightly less than the minimal precision guaranteed by each operation in the chain.

NOTE: tag [numerical-analysis]

C. R. Ward

How safe it is depends on how you define "safe"; just keep in mind that floating point math in C++ is an approximation. You mentioned you want to do physics and related calculations, and since everything in physics is an approximation anyway, I don't see it going wrong. In fact, `double` is the best tool for your case: it is storage-efficient, fast, and reasonably precise. The key point is that how large or small a number is has little to do with its precision.

Aykhan Hagverdili