I often process data with my programs. To keep it simple, let us say the data is a single series of numbers of the same magnitude. When the numbers are unreasonably large, it can be useful to normalize the data. One common transformation is subtracting the average from all values; after this transformation the data has an average of zero.
Another common transformation, which can be applied once the average is zero, is dividing the data by its standard deviation. After applying this transformation the data has unit variance.
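To make it concrete, this is roughly what I mean by normalizing; the method name normalize and the use of LINQ are only for this sketch, not part of my real code:
using System;
using System.Linq;
static double[] normalize(double[] data)
{
    double average = data.Average();
    // First transformation: subtract the average so the new average is zero.
    double[] centered = data.Select(x => x - average).ToArray();
    // Second transformation: divide by the standard deviation so the new variance is one.
    double std = Math.Sqrt(centered.Select(x => x * x).Average());
    return centered.Select(x => x / std).ToArray();
}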
When working with data normalized this way, I would expect numerical errors to be smaller. However, I cannot even get that far, because numerical errors appear as soon as I try to compute the standard deviation.
Below is sample C# code where I try to compute the standard deviation. Even without knowing the statistical formula, it is easy to see that the output of the program should be zero: if the data is an array of identical constants, then the average of the squares equals the square of the average.
using System;
using System.Linq;

static double standardDeviation(double[] data)
{
    double sum = 0;
    double sumOfSquares = 0;
    foreach (double number in data)
    {
        sum += number;
        sumOfSquares += number * number;
    }
    double average = sum / data.Length;
    double averageOfSquares = sumOfSquares / data.Length;
    // Textbook formula: variance = E[X^2] - (E[X])^2
    return Math.Sqrt(averageOfSquares - average * average);
}

static void Main(string[] args)
{
    double bigNumber = 1478340000000;
    double[] data = Enumerable.Repeat(bigNumber, 83283).ToArray();
    Console.WriteLine(standardDeviation(data));
}
Instead of zero, the program outputs a large number caused by numerical error: 2133383.0308878
Note that if I omitted Math.Sqrt (i.e. computed the variance instead of the standard deviation), the error would be much larger (roughly the square of the printed value).
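For reference, this is what I mean by the variance-only version; the method name variance is just for this sketch:
static double variance(double[] data)
{
    // Identical to standardDeviation above, except that the final Math.Sqrt is omitted.
    double sum = 0;
    double sumOfSquares = 0;
    foreach (double number in data)
    {
        sum += number;
        sumOfSquares += number * number;
    }
    double average = sum / data.Length;
    return sumOfSquares / data.Length - average * average;
}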
What is the cause of this, and how can I write this computation with smaller numerical errors?