
I'm in the process of converting a program to C++ from Scilab (similar to Matlab) and I'm required to maintain the same level of precision that is kept by the previous code.

Note: Although maintaining the same level of precision would be ideal, it's acceptable if there is some error in the finished result. The problem I'm facing (as I'll show below) is due to looping, so the calculation error compounds rather quickly. But if the final result is only a thousandth or so off (e.g. 1/1000 vs 1/1001) it won't be a problem.

I've briefly looked into a number of different ways to do this including:

Int vs Float Example: Instead of using the float 12.45, store it as the integer 124,500 (i.e. scaled up by a fixed factor). Then simply convert everything back when appropriate to do so. Note: I'm not exactly sure how this will work with the code I'm working with (more detail below).

An example of how my program is producing incorrect results:

double float1 = 0.0;
double float2 = 0.0;

for (int i = 0; i <= 1000; i++)
{
    for (int j = 0; j <= 10000; j++)
    {
        // This calculation will be computed with less precision than in Scilab
        float1 = (1.0 / 100000.0);

        // The error in float1 becomes significant in float2 by the end of the loop
        float2 = (float1 + float2);
    }
}

My question is:

Is there a generally accepted way to go about retaining accuracy in floating point arithmetic OR will one of the above methods suffice?

Paul Warnick
  • You should edit your question to provide more self contained code samples rather than providing only links. – πάντα ῥεῖ Jun 06 '16 at 21:51
  • @πάνταῥεῖ Sorry, I'm unsure what you're asking for. What do you mean by self contained code samples? – Paul Warnick Jun 06 '16 at 21:56
    It's very annoying that I have to follow your links (and I didn't actually) first to get the essence of what you're asking about. Provide a [MCVE] in your question itself please. – πάντα ῥεῖ Jun 06 '16 at 21:58
    How much precision do you need? – wally Jun 06 '16 at 21:59
    Have a look at http://www.cgal.org/exact.html – Richard Critten Jun 06 '16 at 22:16
    See here: http://stackoverflow.com/questions/2568446/the-best-cross-platform-portable-arbitrary-precision-math-library – davidhigh Jun 06 '16 at 22:23
    @πάνταῥεῖ: This question doesn't benefit from code samples. It is asking for very generic concepts, concepts that most developers are well aware of. What sort of [mcve] would you expect? What should it contain? How would it improve this question? – IInspectable Jun 06 '16 at 22:39
    For BigFloat type of things, please look at MPFR instead of GMP. It still uses GMP internally but provides many more operations, with guaranteed precision. – Marc Glisse Jun 07 '16 at 05:50
  • We need to know what kind of computation you will do (as the techniques to preserve accuracy are different), so we need to know the number ranges you will compute on ... some limits ... desired accuracy value. And also which way you want to go: standard FPU variables, bignum, or arbitrary precision numbers. Other constraints matter a lot too, like desired speed etc. For example see [Is it possible to make realistic n-body solar system simulation in matter of size and mass?](http://stackoverflow.com/a/28020934/2521214), especially the last edit ... – Spektre Jun 07 '16 at 07:21
  • Some techniques are based on better rounding for example see [How to deal with overflow and underflow?](http://stackoverflow.com/a/33006665/2521214) some dynamically change the number bit-width to match expected result accuracy, And for the best or almost no precision loss (like in DERIVE) fraction number representation is used. – Spektre Jun 07 '16 at 07:24
  • @Spektre I've edited my question to describe my specific issues with precision. As for desired speed, currently with using standard doubles the code builds and executes in less than a second. Anything close to that would be acceptable up to about a 5 second build time (and I'm assuming even slowest of precision fixing methods won't take that long). – Paul Warnick Jun 07 '16 at 18:09
  • @PaulWarnick that example is simple integration have a look at the `n-body solar system...` link from my previous comment You will find simple technique how to deal with this without any libs or horrible amount of coding at the end of that answer. – Spektre Jun 07 '16 at 19:45

4 Answers


Maintaining precision when porting code like this is very difficult to do. Not because the languages have implicitly different perspectives on what a float is, but because the algorithms and their assumed accuracy limits differ. For example, when performing numerical integration in Scilab, it may use a Gaussian quadrature method, whereas you might try using a trapezoidal method. The two may both be working on identical IEEE 754 single-precision floating point numbers, but you will get different answers due to the convergence characteristics of the two algorithms. So how do you get around this?

Well, you can go through the Scilab source code and look at all of the algorithms it uses for each thing you need. You can then replicate these algorithms taking care of any pre- or post-conditioning of the data that Scilab implicitly does (if any at all). That's a lot of work. And, frankly, probably not the best way to spend your time. Rather, I would look into using the Interfacing with Other Languages section from the developer's documentation to see how you can call the Scilab functions directly from your C, C++, Java, or Fortran code.

Of course, with the second option, you have to consider how you are going to distribute your code (if you need to). Scilab has a GPL-compatible license, so you can just bundle it with your code. However, it is quite big (~180MB) and you may want to bundle only the pieces you need (e.g., you don't need the whole interpreter system). This is more work in a different way, but guarantees numerical compatibility with your current Scilab solutions.

Tim
  • Thank you for your answer. Fortunately, I'm only dealing with standard arithmetic (e.g. + - * /) so I don't have to worry about converting any algorithms (assuming the above operators work the same in both languages). – Paul Warnick Jun 07 '16 at 18:05
  • @PaulWarnick If you are only handling simple arithmetic, then why are you considering using an arbitrary precision library like GMP when Scilab doesn't use that? I'm confused about what your overall goal is. – Tim Jun 08 '16 at 00:24
  • The more I looked into my issue the more I began to understand that I don't actually need GMP. From my question: http://stackoverflow.com/questions/37686796/what-is-the-precision-of-floating-point-calculations-in-scilab (particularly the accepted answer and following comments) I've started to figure how to fix the problem. My reasoning for asking this question originally was because I was under the impression that Scilab maintained a higher level of precision than a standard C++ double, whereas in reality they follow the same IEEE standards. – Paul Warnick Jun 08 '16 at 18:11

Is there a generally accepted way to go about retaining accuracy in floating point arithmetic

"Generally accepted" is too broad, so no.

will one of the above methods suffice?

Yes. In particular, GMP seems to be a standard choice. I would also have a look at the Boost Multiprecision library.

A hand-coded integer approach can work as well, but it is surely not the method of choice: it requires much more coding and, more severely, a means to store and process arbitrarily precise integers.

davidhigh

If your compiler supports it, use BCD (binary-coded decimal).

Sam

Well, another alternative if you use GCC compilers is to go with quadmath/__float128 types.

Severin Pappadeux