One possibility would be not to divide each term by the number of items, but to divide the sum once at the end, in order to accumulate less error...
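A minimal sketch of that idea (in Java purely for illustration; the method name is made up, and it deliberately ignores empty input and overflow for now):

```java
// Naive sketch: accumulate first, divide once at the end (one rounding from the division).
static double average(double[] values) {
    double sum = 0.0;
    for (double v : values) {
        sum += v;                    // may overflow to +/-infinity for large inputs
    }
    return sum / values.length;      // single division instead of n divisions
}
```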
Of course MaxValue+MaxValue would overflow in this case. Yes, but your library would cleverly mitigate this problem by detecting the overflow (arranging to trap the exception) and scaling the operands by 1/2 in that case.
At the end, after the division, you would apply the inverse scale by the appropriate power of 2.
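A hedged sketch of that rescaling trick, checking for an infinite intermediate result instead of trapping an actual floating-point exception (which is hard to do portably); the names and the exact strategy are illustrative, and finite inputs are assumed:

```java
// Illustrative sketch: halve the running sum and the input scale whenever the
// accumulation would overflow, then undo the power-of-two scale at the very end.
static double scaledAverage(double[] values) {
    double scale = 1.0;              // power-of-two factor currently applied to the inputs
    double sum = 0.0;                // holds scale * (sum of the values seen so far)
    for (double v : values) {        // assumes every v is finite
        double candidate = sum + scale * v;
        while (Double.isInfinite(candidate)) {
            sum *= 0.5;              // multiplying by 0.5 is exact (power of 2)
            scale *= 0.5;
            candidate = sum + scale * v;
        }
        sum = candidate;
    }
    // Divide first, then apply the inverse scale (a power of 2, so this step is exact).
    return (sum / values.length) / scale;
}
```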
Yes, but the sum of MaxValue, 3, -MaxValue might exhibit very bad accuracy (like answering zero instead of 1).
Ah, no problem, you can have a perfectly exact sum with code like this: Precise sum of floating point numbers. That's easy, modulo the scale protection you'll have to mix in...
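The linked question shows error-free summation algorithms; purely as an illustration of "exact until the final rounding" in Java, one could even accumulate in BigDecimal (every finite double converts to BigDecimal exactly), at a hefty speed cost compared to the tricks in the link:

```java
import java.math.BigDecimal;
import java.math.MathContext;

class ExactAverage {                                     // illustrative name, not a real library
    static double exactAverage(double[] values) {        // assumes finite inputs, non-empty array
        BigDecimal sum = BigDecimal.ZERO;
        for (double v : values) {
            sum = sum.add(new BigDecimal(v));             // exact: no rounding while accumulating
        }
        // Rounding only happens here, in the final division and conversion back to double.
        return sum.divide(new BigDecimal(values.length), MathContext.DECIMAL128).doubleValue();
    }
}
```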
A small collateral effect is that your average is no longer O(n), just a bit more expensive (ahem...). Oh, bad luck, some real-time application might expect O(n), and since you're writing a general-purpose library...
So, in order to balance both expectations, you might choose to sum with some kind of Kahan sum, and sacrifice a bit of accuracy for speed...
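A sketch of that compensated (Kahan) variant, again with made-up names: it stays O(n) with a small constant factor, and is far more accurate than the naive loop without being exact:

```java
// Kahan (compensated) summation: carries the low-order bits lost at each addition.
static double kahanAverage(double[] values) {
    double sum = 0.0;
    double c = 0.0;                  // running compensation for lost low-order bits
    for (double v : values) {
        double y = v - c;            // subtract the error carried from the previous step
        double t = sum + y;          // big + small: low bits of y may be lost here...
        c = (t - sum) - y;           // ...recover them: (t - sum) is what was actually added
        sum = t;
    }
    return sum / values.length;
}
```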
Oh, but why? What are the expectations, exactly? This is the main question you'd have to answer... Do you prefer a library that guarantees the best accuracy possible at the price of speed (think of crlibm vs. libm), the best speed at the price of a few corner cases, exception-free behaviour no matter how ill-behaved the inputs are (your original question), or a mix of the above?
Unfortunately, you can't have them all together...
In any case, as Patricia said, document it.