
I have used the 'mean' function in GSL, which proved to be more accurate than my naive implementation. I haven't found a 'sum' function though, so I'm using 'mean' * N instead, but I think it would be cleaner to use a proper summing function.

I'm summing a huge quantity of numbers, and I was asking in the hope of avoiding, if possible, having to implement a stable summing algorithm myself.

Thanks in advance.

Bunder

2 Answers


One sometimes-used trick is to use the cblas_ddot function to compute the dot product of your data with a vector of ones. That effectively computes the sum of your data.

damienfrancois

Short answer: a better summing method is the Kahan summation algorithm. This answer correctly stated that it "has the same algorithmic complexity as a naive summation" and that "it will greatly increase the accuracy of a summation", and it also gave an implementation in C++.

Kahan summation is only necessary if the elements of your array differ vastly in magnitude, or if you really need the 16 digits of precision that a double can in principle offer (a rare situation).

So, before coding Kahan summation in C, you should do some checks. The GSL implementation of gsl_stats_mean is

(GSL 1.16 source code)

  /* Compute the arithmetic mean of a dataset using the recurrence relation 
     mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

  long double mean = 0;
  size_t i;

  for (i = 0; i < size; i++)
  {
    mean += (data[i * stride] - mean) / (i + 1);
  }

I cannot immediately see that this would avoid loss of precision if your numbers are indeed vastly different in magnitude: there is a direct sum between your highly variable numbers and the mean, which evolves slowly in magnitude. A good check is to sort your array before calculating the sum/mean using your naive implementation/GSL, and see whether the result changes.

Edit 1: Warning, the line c = (t - sum) - y may be optimized to c = 0 if aggressive floating-point optimization (e.g. -ffast-math) is turned on.

Vivian Miranda