accurate way of getting the modulo of a long and a double in c++

Question

I'm working on a C++ program that involves timing, and in this program I have to determine the modulo of a time in milliseconds (a long) with respect to a double. Normally one would cast the long to a double and use fmod, or cast the double to an int/long and then use %. However, neither would give me the accuracy I need. Is there an easy way to handle this?

So what I'm looking for is this:

long a = 9999999;
double b = 1.42;
double answer = a % b;  // <<< how do I do this?

A little bit of code would make the question much clearer. Supposing you declared your variables `long n` and `double x`, are you trying to compute `x%n`? (I don't suppose you want `n%x`, but some clarity on that point would be reassuring.) How does `fmod` fail to give you the accuracy you need? — David K, Dec 02 '14 at 13:20
If the problem is in comparing doubles, you can find your answer here: http://stackoverflow.com/questions/17333/most-effective-way-for-float-and-double-comparison — Neska, Dec 02 '14 at 13:25
Given the code you have shown, what value would you expect, and what do you actually get? And you are aware that the modulo operator is an integer operation, so the value `1.42` would be truncated to `1`? — Some programmer dude, Dec 02 '14 at 13:28
I would expect to receive the value 1.16 and I get a compilation error on this — Thijser, Dec 02 '14 at 13:30
Why do you think that casting long to double and using `fmod` wouldn't give you sufficient accuracy? — eerorika, Dec 02 '14 at 13:32
Because we take a 64-bit integer and cast it to a double, which has only 53 bits of precision, so there will be a loss of accuracy in the cast. — Thijser, Dec 02 '14 at 13:36
Arbitrary-precision integers would be OK; how do I use them in C++? — Thijser, Dec 02 '14 at 13:50
If the long represents an elapsed time in a program run in milliseconds, it will be exactly convertible to double. On the other hand, measuring the time in milliseconds introduces measurement error that will dwarf the floating point rounding error if you avoid accumulating unnecessary rounding error. — Patricia Shanahan, Dec 02 '14 at 15:51
May I ask why you need the remainder of the floating point division? This concept [does not make sense mathematically](http://stackoverflow.com/questions/6102948/why-does-modulus-division-only-work-with-integers), but it is an artifact of the limitations of floating point division. — Giovanni Botta, Dec 03 '14 at 01:51

Pascal Cuoq · Accepted Answer · 2014-12-02T13:44:11.637

In a comment below the question:

I would expect to receive the value 1.16 and I get a compilation error on this

You would expect 1.16 because you think you have the value 1.42. You don't. You have the double nearest to 1.42, and while it is close enough to 142/100 (it is exactly 1.4199999999999999289457264239899814128875732421875), subtracting it many times from a large number is going to make a noticeable difference in the end.

In short, there is no way to do what you want (a % b). There is an operation between doubles, fmod(a, b), which does what you say, but you can only use it if you understand that it applies to double values a and b, which are not represented in decimal internally.

Additional notes:

fmod is exact: the result of the mathematical operation it stands for is always representable as a double, and fmod computes exactly this mathematical result. On the other hand, other floating-point operations are not exact, including conversion from decimal in the case of the decimal representation “1.42”.

fmod(9999999.0, 1.42) is computed exactly as 1.16000000050038210019920370541512966156005859375. In this expression, 9999999.0 represents exactly the value 9999999. The error only comes from the difference between 1.42 and 142/100.
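
To see this on your own machine, here is a minimal sketch. The helper names `mod_ms` and `show_mod` are made up for illustration, and `long long` is used so the width does not depend on the platform. The conversion to double is lossless for counts up to 2^53, so any discrepancy in the result comes from the literal 1.42 itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical helper: remainder of a millisecond count with respect to a
// double period. The cast is lossless for counts up to 2^53, and fmod itself
// introduces no rounding error.
double mod_ms(long long ms, double period) {
    assert(ms >= 0 && period > 0.0);
    return std::fmod(static_cast<double>(ms), period);
}

// Prints the period and the remainder with enough digits to expose the
// binary values that are actually stored.
void show_mod(long long ms, double period) {
    std::printf("period = %.55f\n", period);
    std::printf("result = %.55f\n", mod_ms(ms, period));
}
```

Calling `show_mod(9999999, 1.42)` should reproduce the two long decimal expansions quoted above (padded with zeros), assuming a C library whose printf prints doubles exactly, as glibc does.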

eerorika · Answer 2 · 2014-12-02T14:06:56.930

If you have the choice of using a fixed-point number for b, then you can get the exact value. If you want to convert the exact result (of the fixed-point calculation) to a double, then the accuracy will be limited by the ability of a double to represent the result.

long a_fixed = a * 100;  // a, in hundredths
long b_fixed = 142;      // b * 100
double answer = (a_fixed % b_fixed) / 100.0;

In the above example, a_fixed % b_fixed is the exact value for (a % b * 100). a must be less than LONG_MAX / 10^2 and the precision of b can be up to 2 decimals. You can reduce the latter limitation by multiplying a and b with a higher power of 10. The former limitation can be avoided by using arbitrary precision integers. An implementation of arbitrary precision integers may even provide an interface for fixed precision arithmetic, allowing you to avoid writing the multiplications in my example and just set the appropriate epsilon.
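
A slightly more general sketch of the same idea, with the scale passed in and the overflow limit checked. The function name `fixed_point_mod` and its parameters are my own invention, and it assumes a non-negative `a`.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: computes a mod (b_scaled / scale), where the divisor is
// given as an integer count of 1/scale units (e.g. 142 with scale 100 for 1.42).
double fixed_point_mod(long long a, long long b_scaled, long long scale) {
    assert(a >= 0 && b_scaled > 0 && scale > 0);
    assert(a <= LLONG_MAX / scale);        // a * scale must not overflow
    long long r = (a * scale) % b_scaled;  // exact remainder, in 1/scale units
    // The only rounding happens here, when the exact result becomes a double.
    return static_cast<double>(r) / static_cast<double>(scale);
}
```

For the values in the question, `fixed_point_mod(9999999, 142, 100)` computes 999999900 % 142 = 116 and returns the double nearest to 1.16.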

score 2 · Answer 3 · answered Dec 02 '14 at 17:24

There is an fmod function which will yield a precise remainder when dividing two double values. One could also get a precise remainder between a long and a double by splitting the long value into two double values which sum to the original one, using fmod on each, adding the results, and adding or subtracting the divisor as needed. However, such techniques would only be useful if the divisor is itself precisely representable as a double.

If the divisor is representable as a quotient of two integers (e.g. X/Y) whose product will fit in a long, a more accurate approach would be to compute (((N % X)*Y) % X) / Y. That approach will yield the double value which is closest to the numerically perfect result even if the quotient (X/Y) would not be precisely representable. Note that the first N % X could be simplified to N if N * Y will fit in a long, but the formula as given will work correctly whether or not it can.
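
As a sketch of that formula in C++ (the name `rational_mod` is made up, and I assume non-negative N and positive X and Y), the remainder is kept in integers and only the final division by Y is done in floating point:

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: computes n mod (x / y) as (((n % x) * y) % x) / y.
double rational_mod(long long n, long long x, long long y) {
    assert(n >= 0 && x > 0 && y > 0);
    assert(x <= LLONG_MAX / y);          // x * y must fit, so (n % x) * y does too
    long long r = ((n % x) * y) % x;     // exact remainder, in units of 1/y
    // One correctly rounded division, provided r and y are below 2^53.
    return static_cast<double>(r) / static_cast<double>(y);
}
```

With the divisor 1.42 written as 142/100, `rational_mod(9999999, 142, 100)` works out as 9999999 % 142 = 75, then 7500 % 142 = 116, giving the double nearest to 1.16.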

score 1 · Answer 4 · answered Dec 03 '14 at 01:40

This is more an extended comment than an answer.

Even if you switched to a data type that could represent all your numbers exactly, you would still have a precision problem due to measurement error, especially using modulo.

The problem is that you are measuring time in milliseconds. It is very unlikely that you have a one kilohertz computer. It is more likely that things are happening at much finer granularity, possibly at the nanosecond level. If you measure by taking the difference between two values of a millisecond clock, you can have up to a millisecond of measurement error. An activity that appears to take 10 ms may have actually taken anywhere from slightly over 9 ms to just under 11 ms, depending on when the start and end time events fell relative to clock ticks.

You can usually control measurement error by making sure your measurements are long compared to the tick length. A duration like 9999999 ms would have about one part in ten million of measurement error, generally not a problem, although it does dwarf double precision conversion rounding error. However, if you subsequently reduce modulo something between one and two, the result is practically meaningless.

Incidentally, for reasonable elapsed times in milliseconds, conversion from long to double is exact. A double represents every integer up to 2^53 exactly, so you would need to be measuring hundreds of thousands of years to get rounding error.
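
If you want to check the exactness claim yourself, here is a rough sketch. The names are mine, and it assumes an IEEE 754 double with a 53-bit significand. The first integer that fails to round-trip is 2^53 + 1, and 2^53 ms is roughly 285,000 years.

```cpp
#include <cassert>
#include <cstdint>

// Largest count below which every integer converts to double exactly,
// assuming a 53-bit significand.
constexpr std::int64_t kMaxExactMs = std::int64_t{1} << 53;

// True if the millisecond count survives a round trip through double.
bool roundtrips(std::int64_t ms) {
    // Stay well inside the int64 range so the cast back is well-defined.
    assert(ms >= 0 && ms <= (std::int64_t{1} << 62));
    return static_cast<std::int64_t>(static_cast<double>(ms)) == ms;
}

// How many years of milliseconds fit before conversion can start to round.
double years_until_inexact() {
    const double ms_per_year = 1000.0 * 60 * 60 * 24 * 365.25;
    return static_cast<double>(kMaxExactMs) / ms_per_year;
}
```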

Why the modulo calculation? What is its purpose? Do you reduce modulo such small numbers in the real calculation?
