Using 32-bit floating point values, what's the best (numerically most accurate) way of computing the average if, when starting the computation, I don't know yet how many values I'm going to have? (In the following examples I just iterate over a vector, so I would know the count, but let's suppose I'd know the element count only at the end.)

I could do for example

float result = 0.0f;
for (float num : numbers) {
    result += num;
}
result /= numbers.size();

but as result grows larger, so does the rounding error: the spacing between representable floats gets coarser. With small values, at some point result += num; will not actually change result anymore.
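For example, once result reaches 2^24 the spacing between adjacent floats is already 2, so adding 1.0f does nothing:

float big = 16777216.0f;   // 2^24: adjacent floats are now 2 apart
float sum = big + 1.0f;    // 16777217 is not representable; rounds back down
// sum == big holds: the added 1.0f is completely absorbed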

I could do

float result = numbers[0];
for (int i = 1; i < numbers.size(); i++) {
    float frac = i / float(i + 1);
    result = result * frac + numbers[i] * (1.0f - frac);
}

but it seems I'd apply a cumulative error to result that way.

Is there a better way without going to 64bit double?

matthias_buehlmann

1 Answer


The best known method for this sort of problem is Kahan summation. See here: https://en.wikipedia.org/wiki/Kahan_summation_algorithm. Assuming the sum remains representable as a single-precision float, do a straight divide at the end to find the average.
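A minimal sketch of what that looks like here, assuming a std::vector<float> named numbers as in the question:

float sum = 0.0f;
float c = 0.0f;                  // running compensation for lost low-order bits
for (float num : numbers) {
    float y = num - c;           // apply the correction from the previous step
    float t = sum + y;           // low-order bits of y may be lost here...
    c = (t - sum) - y;           // ...but are recovered into c for the next step
    sum = t;
}
float average = sum / numbers.size();

One caveat: since c is algebraically zero, aggressive floating-point optimizations (e.g. -ffast-math) can simplify the compensation away entirely, so this must be compiled with strict floating-point semantics.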

Also see this answer, to a question asking more or less the same thing, for some extra discussion: How to compute the average of doubles, so that the total error is minimal?

alias
  • That's a known method, not the best known method. The numerically best method for summing IEEE-754 binary64 numbers is to maintain an array of 2098 bits and add each value into them in the appropriate position. That has no error, since the binary64 format spans just 2098 bits (2**1023 to 2**-1074). The best method for performance is to do no operations, maintain no data, and return zero. Everything is a trade-off. – Eric Postpischil Jan 28 '20 at 18:21
  • Kahan is for the sum. However, if I'm not interested in the sum but only in the average, I think that's a different use case (since the average is usually much smaller than the sum) – matthias_buehlmann Jan 28 '20 at 18:21
  • @EricPostpischil Good point! But OP's wording suggests she/he doesn't even want to go to 64-bit doubles; so I'm assuming 2098 bits is beyond what they can afford. – alias Jan 28 '20 at 18:29
  • @user1282931 I don't think there's any obvious trick to make average computation any more precise within your bounds. Going through the sum is your best bet. But I'd love to hear if there was some other method for averages. (Or other statistics for that matter.) – alias Jan 28 '20 at 18:31
  • @alias The following works "well enough" without explicitly forming the sum (which I agree is best for a few thousand inputs as specified by OP): `mean = 0.0f; for (int i = 0; i < N; i++) { mean = fmaf (num[i] - mean, 1.0f/(i+1), mean); }` (expanded into a self-contained sketch after these comments). There is literature from the 1960s onward on how to compute mean, standard deviation, and variance numerically soundly in a single pass. One recent paper: J. Bennett, R. Grout, P. Pébay, D. Roe, D. Thompson: "Numerically stable, single-pass, parallel statistics algorithms". In *2009 IEEE International Conference on Cluster Computing and Workshops*, pp. 1-8 – njuffa Jan 28 '20 at 19:18
  • @njuffa Cool! You should post this comment as an answer! – alias Jan 28 '20 at 19:30
  • @EricPostpischil Technically, 2098 bits may not suffice (if you have multiple values near `MAX_DOUBLE`). I totally like the "best performance" method though ;) – chtz Feb 03 '20 at 08:31
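
The running-mean update from njuffa's comment, expanded into a self-contained sketch (running_mean, num, and n are illustrative names; fmaf from <cmath> computes a*b + c with a single rounding):

#include <cmath>    // fmaf
#include <cstddef>  // size_t

// Incremental mean: after the i-th iteration, mean holds the average of
// the first i+1 values. The update mean += (x - mean) / (i+1) keeps the
// intermediate quantity near the size of the average, not of the sum.
float running_mean(const float *num, size_t n) {
    float mean = 0.0f;
    for (size_t i = 0; i < n; i++) {
        mean = fmaf(num[i] - mean, 1.0f / float(i + 1), mean);
    }
    return mean;
}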