
I am trying to model a dynamical system in C++ and have some issues with the precision of stored values.

I have read the question on floating point precision, but I am unable to figure out what is going wrong in my case.

So, the issue is about storing the particle velocity. For example, the particle starts with velocity 1.01 and, because of the nature of the dynamical system, it can only gain or lose 0.4 units at each interaction.
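
To illustrate, here is a minimal loop (a sketch of what my update step amounts to, assuming the gains and losses are plain additions and subtractions of 0.4):

```cpp
#include <cstdio>

int main() {
    double v = 1.01;
    // Alternately gain and lose 0.4. If rounding errors cancelled
    // perfectly, v would stay at 1.01 forever, but neither 1.01 nor
    // 0.4 is exactly representable in binary, so error can creep in
    // on every operation.
    for (int i = 0; i < 10000; ++i) {
        v += 0.4;
        v -= 0.4;
    }
    std::printf("%.17g\n", v);  // may no longer print exactly 1.01
    return 0;
}
```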

The value of the velocity is stored okay (correct to full precision) for the first 2890 interactions. At the 2891st interaction, the velocity becomes

4.209999999999999

instead of

4.21

Does this happen because at that point the velocity reaches an "ugly" value for the first time? After 2890 interactions though?

I have two possible fixes for this problem in mind and would like to ask for advice. Please keep in mind that I need to vary the initial velocity, not just use 1.01.

  1. Change the units of the system so that it gains or loses 1.0 velocity at each interaction. (Will this even work? For example, if the starting velocity is 1.21, wouldn't it hit the same error at the 4.21 value?)

  2. Round the velocity to some fixed number of decimal places after each interaction, as in the sketch below. (I don't like this very much, because I would have to know the precision of the velocity in advance, which would make the program clumsier.)
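
For concreteness, fix 2 would look something like this sketch (the function name and the 0.01 step are illustrative; the step encodes exactly the advance knowledge of the precision that I would rather not need):

```cpp
#include <cmath>

// Snap v back onto a grid of multiples of `step` (here hundredths).
// Requires knowing the precision of the velocity in advance.
double snap(double v, double step = 0.01) {
    return std::round(v / step) * step;
}

// After each interaction: v = snap(v + 0.4);
```

The result is still only the closest double to, say, 4.21, but snapping after every interaction would at least stop the error from accumulating.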

Thank you for any thoughts on this.

– miha priimek
  • These effects are well-known phenomena of "floating point arithmetic". Note that a double precision variable is correct to about the 15th significant figure: good enough for most physical applications. (E.g. the gravitational constant is known to far less precision than this.) Don't do any rounding or anything: just learn to live with it. – Bathsheba Sep 10 '14 at 06:49
  • This is probably why I have never had to deal with this before, but as the system propagates, the 15th significant digit causes trouble. I know it's the digit, because if I manually change it back to 4.21, the reversibility test works fine (I am propagating the system forwards and then backwards in time, to see if the code works properly). – miha priimek Sep 10 '14 at 06:52
  • @mihapriimek If the only "interactions" with the initial velocity variable were increments and decrements by a multiple of 0.4 (a value not precisely representable in base 2), then changing this to inc/dec by multiples of 1.0 (a value exactly representable in base 2) is liable to solve the problem. – Iwillnotexist Idonotexist Sep 10 '14 at 06:55
  • @IwillnotexistIdonotexist I suspect the problem lies with the value of 4.21, as the previous increments of 0.4 (the first 2890) worked okay. Any thoughts on this? – miha priimek Sep 10 '14 at 06:57
  • First question to ask is why this is a problem. In any real system your 0.4 and 1.01 values are approximations anyway, so what application do you have that cannot accept a very accurate approximation to your result? If you need exact calculations on decimal values, you are probably best looking for one of the many libraries that do decimal math and using that. – jcoder Sep 10 '14 at 07:00
  • @jcoder So, the problem is that after a large number of interactions, I need to be sure that the effects I am seeing are properties of the system and not numerical noise. To check this, I propagate the system forward in time for n interactions, then reverse the velocities, and propagate the system an additional n interactions - if the code works, it should return to the starting point (it is not chaotic). Now, if I propagate it for 2890 interactions and back, it works fine. If I add one more interaction, the rounding error occurs and the system returns to a state very different from the initial one. – miha priimek Sep 10 '14 at 07:05
  • @jcoder How slow will the operations become if I use a higher precision library? – miha priimek Sep 10 '14 at 07:10
  • @mihapriimek It's entirely possible that at the 2890th iteration you've accumulated so much error you're now committing to a wrong value. Consider that `1.01 + 0.4-0.4 + 0.4-0.4 + 0.4-0.4 + 0.4-0.4 ...` leads to a steadily decreasing value when I examine the result in GDB... – Iwillnotexist Idonotexist Sep 10 '14 at 07:11
  • Is it practical for you to use a signed integer number, and instead of incrementing the velocity by 0.4 you increment the integer, and compute the velocity _from_ it as `1.01 + i*0.4` (see the sketch after this comment thread)? This limits accumulated rounding error and ensures you'll always get the same result, whichever sequence of adds and subtracts of 0.4 you use to get to any particular position. – Iwillnotexist Idonotexist Sep 10 '14 at 07:15
  • @IwillnotexistIdonotexist Wow, this is a brilliant idea! I was hoping for something like this - I'll try to test it and see if it works. – miha priimek Sep 10 '14 at 07:20
  • @mihapriimek When you try this, _don't_ compile with `-Ofast` or equivalent, since this numerical rewrite relies on the compiler not optimizing `i++;v = 1.01 + i*0.4` down to `v += 0.4`. This is called strength reduction, and for float values it's _evil_ to do it, as you just discovered. – Iwillnotexist Idonotexist Sep 10 '14 at 07:24
  • @IwillnotexistIdonotexist Ok, so if I just compile normally without any options this should work, right? – miha priimek Sep 10 '14 at 07:27
  • I don't know your compiler; I'm just saying that you should explicitly disable the "unsafe" floating-point optimizations. On GCC they are enabled with `-Ofast`; on ICC I think they're even enabled by default; and I don't know about MSVC. If "without any options" == unoptimized on your compiler, then yes, it should work. – Iwillnotexist Idonotexist Sep 10 '14 at 07:28
  • The switch is `/fp:precise` on MSVC (it is the default). You might prefer `/fp:strict` in this case. [Documentation is here](http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx). – Cody Gray - on strike Sep 10 '14 at 08:18
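
For reference, here is a minimal sketch of the integer-counter idea from the comments above (variable names illustrative):

```cpp
#include <cstdio>

int main() {
    const double v0   = 1.01;  // initial velocity (varied per run)
    const double step = 0.4;   // gain/loss per interaction
    long long n = 0;           // net number of steps, tracked exactly

    // Each interaction only touches the integer counter...
    n += 1;  // a gain of 0.4
    n += 1;  // another gain
    n -= 1;  // a loss

    // ...and the velocity is recomputed from scratch, so there is a
    // single rounding error instead of one accumulated per interaction.
    double v = v0 + n * step;
    std::printf("%.17g\n", v);
    return 0;
}
```

As cautioned above, compile without unsafe floating-point optimizations (no `-Ofast` on GCC; `/fp:precise` or `/fp:strict` on MSVC), so that the recomputation is not strength-reduced back into a running sum.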

2 Answers


Iwillnotexist Idonotexist has already touched on the root cause in the comments. IEEE FP numbers are binary fractions, and 4/10 cannot be perfectly expressed as a binary fraction, in just the same way that, for example, 1/3 cannot be perfectly represented as a decimal fraction.

If you must stick with the FP representation, are open to changing the values you use, and need perfect accuracy, then stick to binary fractions: numbers that can be represented as a sum of 1/2, 1/4, 1/8, 1/16, etc.
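
For example, 0.4 picks up representation error immediately, while 0.375 = 1/4 + 1/8 is stored exactly:

```cpp
#include <cstdio>

int main() {
    // 0.4 = 2/5 has no finite binary expansion; 0.375 = 3/8 does.
    std::printf("%.17g\n", 0.4);    // prints 0.40000000000000002
    std::printf("%.17g\n", 0.375);  // prints 0.375
    return 0;
}
```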

– Andy Brown

Not sure if this would work for you, but here is a suggestion: since you are looking for precision, you could switch to another type and use it instead of double. What I would do in your case is create a data structure that represents the value in a different way, so that it preserves the precision. A rough sketch appears after the comments below.

  • What would this other type look like? How would it store the value so that it preserves the precision? – Cody Gray - on strike Sep 10 '14 at 06:59
  • Kiss goodbye to performance at the same time. Floating point operations are *quick*. – Bathsheba Sep 10 '14 at 07:00
  • @CodyGray It could, for example, operate on a string, or on a fraction made of two ints. – GingerPlusPlus Sep 10 '14 at 07:15
  • @CodyGray This is a very interesting matter. I was thinking about creating an array for the number, with arr[i] representing the i-th digit, and keeping an integer for the place where the decimal point should be. Of course this would hurt performance, but I just thought that for really big numbers this could be a solution. Tell me what is wrong. :) – user3928968 Sep 10 '14 at 07:41
  • Yes, I think it is interesting, too. The interesting details should be in your answer. – Cody Gray - on strike Sep 10 '14 at 08:19
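
Picking up the fraction-of-two-ints idea from these comments, a minimal sketch of such a type might look like this (illustrative only: no overflow handling, addition only, C++17 for `std::gcd`):

```cpp
#include <cstdio>
#include <numeric>  // std::gcd

// Minimal exact-rational type: stores the value as num/den in lowest terms.
struct Rational {
    long long num, den;

    Rational(long long n, long long d) : num(n), den(d) { reduce(); }

    void reduce() {
        long long g = std::gcd(num, den);
        if (g != 0) { num /= g; den /= g; }
    }

    Rational operator+(const Rational& o) const {
        return Rational(num * o.den + o.num * den, den * o.den);
    }

    double to_double() const { return double(num) / double(den); }
};

int main() {
    Rational v(101, 100);       // 1.01, held exactly
    const Rational step(2, 5);  // 0.4, held exactly

    for (int i = 0; i < 8; ++i) v = v + step;  // eight gains of 0.4

    // 421/100: the value 4.21 is exact as a fraction; only the final
    // conversion to double rounds.
    std::printf("%lld/%lld = %.17g\n", v.num, v.den, v.to_double());
    return 0;
}
```

As Bathsheba notes, this trades speed for exactness; every addition now costs a gcd reduction.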