
What is the more accurate way to calculate the average of a set of numbers: ARR[0]/N + ARR[1]/N + ... + ARR[N-1]/N, or (ARR[0] + ARR[1] + ... + ARR[N-1])/N? (ARR is the set of numbers and N is the count of numbers in that set.)

Consider that I have a set of numbers, each ranging from 0.0 to 1.0 (they are double/floating-point values), and there are thousands or even millions of them.

I am open to other methods, such as a recursive average (average adjacent pairs into a new array, then repeat until the result is a one-cell array).
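For reference, here is a rough sketch of that recursive idea, written as pairwise summation with a single division at the end (the names are just illustrative):

#include <cstddef>
#include <vector>

// Pairwise (recursive) summation: sum each half of the range recursively,
// then add the two partial sums, so operands stay at similar magnitudes.
static double pairwise_sum(const double* a, std::size_t n)
{
    if(n == 0) return 0.;
    if(n == 1) return a[0];
    if(n == 2) return a[0] + a[1];
    std::size_t half = n / 2;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}

// Mean with a single division at the end.
double pairwise_mean(const std::vector<double>& v)
{
    if(v.empty()) return 0.;
    return pairwise_sum(v.data(), v.size()) / (double)v.size();
}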

KugBuBu
  • (Assuming all numbers are positive) Sort the numbers, lowest to highest, and then add, lowest to highest. (If negative numbers are present, sort by absolute value.) And there is no need to divide each element by N; just divide the sum by N. – Hot Licks Sep 28 '14 at 18:52
  • See also http://stackoverflow.com/q/13417670/ (in particular, Kahan summation) – Nemo Sep 28 '14 at 19:01

3 Answers


If many of the values are very close to zero, you'll have a rounding issue (the error can go up or down) in a summation; the same applies to any range of values once you're summing a large set of numbers. One way around this is to use a summation function that only adds numbers with the same exponent (until you call getsum() to get the total, at which point it adds the per-exponent partial sums from smallest exponent to largest, keeping the exponents as close as possible). Example C++ class to do this (note: the code was compiled with Visual Studio and written before uint64_t was available):

//  SUM contains an array of 2048 IEEE 754 doubles, indexed by exponent,
//  used to minimize rounding / truncation issues when doing
//  a large number of summations

class SUM{
    double asum[2048];
public:
    SUM(){for(int i = 0; i < 2048; i++)asum[i] = 0.;}
    void clear(){for(int i = 0; i < 2048; i++)asum[i] = 0.;}
//  getsum returns the current sum of the array
    double getsum(){double d = 0.; for(int i = 0; i < 2048; i++)d += asum[i];
                    return(d);}
    void addnum(double);
};

void SUM::addnum(double d)      // add a number into the array
{
size_t i;

    while(1){
//      i = exponent of d
        i = ((size_t)((*(unsigned long long *)&d)>>52))&0x7ff;
        if(i == 0x7ff){         // max exponent, could be overflow
            asum[i] += d;
            return;
        }
        if(asum[i] == 0.){      // if empty slot store d
            asum[i] = d;
            return;
        }
        d += asum[i];           // else add slot to d, clear slot
        asum[i] = 0.;           // and continue until empty slot
    }
}

Example program that uses the sum class:

#include <iostream>
#include <iomanip>
using namespace std;

static SUM sum;

int main()
{
double dsum = 0.;
double d = 1./5.;
unsigned long i;

    for(i = 0; i < 0xffffffffUL; i++){
        sum.addnum(d);
        dsum += d;
    }
    cout << "dsum             = " << setprecision(16) << dsum << endl;
    cout << "sum.getsum()     = " << setprecision(16) << sum.getsum() << endl;
    cout << "0xffffffff * 1/5 = " << setprecision(16) << d * (double)0xffffffffUL << endl;

    return(0);
}
rcgldr
  • it's not clear that this is provably the best way to add a bunch of numbers. do you know if that is true? if so, can you offer a simple sketch of a proof? – thang Sep 28 '14 at 18:37
  • @thang: this is not the best way. But it's the best reasonable way I can think of. The lowest-error way of summing numbers to eliminate truncation of the small quantities would be to sort the numbers and sum them from smallest to largest. – Rafael Baptista Sep 28 '14 at 18:42
  • @RafaelBaptista - summing a set of sorted numbers doesn't solve the problem. You'd still have the potential issue of summing a large set of numbers with the same exponent. Near the end of this sequence you're adding a relatively small number to a relatively large working sum. – rcgldr Sep 28 '14 at 18:53
  • well, we know what the theoretical best case scenario is that the error is equal to zero accumulation error. i.e. if you add all the items and divide assuming infinite precision, then the error due to fitting the result into the representation gives you the best case scenario. the question is: does this give you that? intuition suggests that it may, which is quite clever, but i'd probably want to see a brief sketch of a proof to be "reliable". – thang Sep 28 '14 at 18:58
  • You're right, the sorting solution can break. In the case of an 8-bit mantissa, adding 1000 1's will not sum to 1000. It will sum to something like 255. – Rafael Baptista Sep 28 '14 at 19:00
  • @thang I understood this code (and had a nerdgasm). Every time one cell is unable to handle its number, it will just lose a bit of its information. It would be good if you could save that bit in the previous cell. (It would be perfect then.) – KugBuBu Sep 28 '14 at 19:00
  • @thang - I added an example main program that uses the sum class to show the difference between a conventional sum versus the class sum to perform the equivalent of (1./5.) * 4294967295. – rcgldr Sep 28 '14 at 19:01
  • @KugBuBu, if i read it correctly (and understand your statement correctly), it looks like it does propagate up as the exponents change (d += asum[i]; asum[i] = 0.;) until it can't propagate anymore. – thang Sep 28 '14 at 19:04
  • @thang In `10000+0.1` you lose ~17 (`log2(10000/0.1)`) bits of information each time you add a new number. I think I will later try to fix the loss of that one bit of information (or rcgldr can, if s\he wants). (That loss is not on every addition.) – KugBuBu Sep 28 '14 at 19:09
  • @KugBuBu - every time you add two doubles with the same exponent, the exponent increases by 1, and there is rounding into the least significant bit in the new mantissa (significand), but it remains a 53 bit mantissa. – rcgldr Sep 28 '14 at 19:20
  • @rcgldr I forgot that the mantissa is larger than 0.5 (unless the actual number is zero). Maybe later I will try to keep that bit somewhere and edit the post. (If you can, I will really appreciate it.) – KugBuBu Sep 28 '14 at 19:27
  • To understand the propagation, imagine that the number 1 is being summed n times, then array[1023+i] = 1.0 x pow(2,i). So 1+1 is stored at [1024] (and [1023] zeroed), (1+1)+(1+1) is stored at [1025] (and [1024] zeroed), (1+1+1+1)+(1+1+1+1) is stored at [1026] (so [1026] is only updated once every 8 calls). – rcgldr Sep 28 '14 at 19:27

(ARR[0]+ARR[1]+...+ARR[N-1])/N is faster and more accurate, because it avoids the N separate divisions by N, which both slow the computation down and each introduce their own rounding error.
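To make the difference concrete, here is a rough sketch of the two formulas from the question (illustrative only, not optimized):

#include <cstddef>

// ARR[0]/N + ARR[1]/N + ... : N divisions, each rounded separately.
double mean_divide_each(const double* arr, std::size_t n)
{
    double s = 0.;
    for(std::size_t i = 0; i < n; i++)
        s += arr[i] / (double)n;
    return s;
}

// (ARR[0] + ARR[1] + ...) / N : a single division, a single rounding from it.
double mean_sum_then_divide(const double* arr, std::size_t n)
{
    double s = 0.;
    for(std::size_t i = 0; i < n; i++)
        s += arr[i];
    return n ? s / (double)n : 0.;
}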

101010
  • Make sure you are using some good summation algorithm, like [this](http://en.wikipedia.org/wiki/Kahan_summation_algorithm) – Anton Savin Sep 28 '14 at 18:30
  • Why does dividing each value introduce error into the calculation? I thought that when you add two values, for example `10000` and `0.1`, the `0.1` gets rounded with a really big error because it's very small. I need a reliable source for this. – KugBuBu Sep 28 '14 at 18:33
  • As a tip, sort by magnitude before adding them, and start on the low end. (Does not matter if the numbers can be added exactly without error anyway) – Deduplicator Sep 28 '14 at 18:36
  • i think that it doesn't make much of a difference in precision between the two ways of doing things. the main concern is the large number of adds. whether you divide before or after doesn't matter much, unless N is so large that it causes the exponent to saturate. – thang Sep 28 '14 at 18:39

If you have a bunch of floating-point numbers, the most accurate way to get the mean is like this:

#include <algorithm>   // std::sort
#include <cmath>       // std::abs
#include <cstddef>     // size_t

// Note: sorts the input array in place.
template<class T> T mean(T* arr, size_t N) {
    // Sort by magnitude so the smallest values are added first.
    std::sort(arr, arr + N, [](T a, T b){ return std::abs(a) < std::abs(b); });
    T r = 0;
    for(size_t n = 0; n < N; n++)
        r += arr[n];
    return r / N;
}

Important points:

  • The numbers of least magnitude are added first to preserve significant digits.
  • Only one division, to reduce rounding error there.

Still, the running sum can eventually grow large relative to the remaining terms, so some precision may still be lost.
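If that becomes a problem, a compensated (Kahan) sum, as suggested in the comments above, can carry the low-order bits that a plain addition would drop. A rough sketch (illustrative, not tied to any library):

#include <cstddef>

// Kahan (compensated) summation: 'c' carries the low-order bits that are
// lost when a small term is added to a large running sum.
double kahan_mean(const double* arr, std::size_t n)
{
    double sum = 0.;
    double c = 0.;                   // running compensation
    for(std::size_t i = 0; i < n; i++){
        double y = arr[i] - c;       // apply the correction to the next term
        double t = sum + y;          // low-order bits of y may be lost here
        c = (t - sum) - y;           // recover what was lost
        sum = t;
    }
    return n ? sum / (double)n : 0.;
}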

Deduplicator