double[] someDoubles = { 34.6, 45.1, 55.5, 78.5, 84.66, 1400.32, 99.04, 103.99 }; // 1400.32 breaks the cumulative pattern

The code above is a shortened sample showing an unexpected value in a cumulative series (1400.32, marked in the comment). In reality, each value is held in a class that also stores a date.

How can I calculate a deviation in C#? Is there an algorithm that sorts out the rows that break the cumulative chain?

Any advice is welcome.

Update:

To clarify, this is about three things, and performance is really important here:

First: fast-scan whether the values follow a cumulative pattern.
Second: check whether all values fall within a reasonable deviation.
Third: point out the offending values and handle the errors.

This question is about the first and second.

Independent
  • I'm not quite clear - are you asking how to calculate the standard deviation or are you asking what is an appropriate measure of dispersion in the data? – Michael J. Barber Mar 17 '11 at 08:56
  • Michael, that's a good question. First: fast-scan whether the values follow a cumulative pattern. Second: check whether all values fall within a reasonable deviation. Third: point out the offending values and handle the errors. – Independent Mar 17 '11 at 09:23

6 Answers


Using LINQ:

using System;
using System.Linq;
double average = someDoubles.Average();
double sumOfSquaresOfDifferences = someDoubles.Select(val => (val - average) * (val - average)).Sum();
double sd = Math.Sqrt(sumOfSquaresOfDifferences / someDoubles.Length); // population SD: divides by n

The sd variable holds the population standard deviation (it divides by the number of values).

If you have a List<double>, use someDoubles.Count instead of someDoubles.Length in the last line of code.

Camilo Terevinto
Sanjeevakumar Hiremath
  • Because this is a *sample* of a population rather than the whole population, you should perform: `Math.Sqrt(sumOfSquaresOfDifferences / (someDoubles.Length - 1));` (thanks Ella). – jww Apr 08 '14 at 07:11
  • As `Sum` also accepts a selector, you can go with `double sumOfSquaresOfDifferences = someDoubles.Sum(val => (val - average) * (val - average));` – Piotr Zierhoffer Jan 20 '16 at 23:25
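A minimal sketch combining the two comments above: `Sum` with a selector, and dividing by n - 1 for a sample versus n for a whole population. The `Deviation` class and its method names are hypothetical. Both methods take an `IReadOnlyCollection<double>`, so a `double[]` and a `List<double>` work the same way and the `Length`/`Count` distinction does not come up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Deviation
{
    // Population standard deviation: divides by n (what the LINQ answer computes).
    public static double PopulationStdDev(IReadOnlyCollection<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(values));
        double average = values.Average();
        // Sum accepts a selector, so no separate Select call is needed.
        double sumOfSquares = values.Sum(v => (v - average) * (v - average));
        return Math.Sqrt(sumOfSquares / values.Count);
    }

    // Sample standard deviation: divides by n - 1 (Bessel's correction).
    public static double SampleStdDev(IReadOnlyCollection<double> values)
    {
        if (values.Count < 2)
            throw new ArgumentException("At least two values are required.", nameof(values));
        double average = values.Average();
        double sumOfSquares = values.Sum(v => (v - average) * (v - average));
        return Math.Sqrt(sumOfSquares / (values.Count - 1));
    }
}
```

Usage: `double sd = Deviation.SampleStdDev(someDoubles);` Which one to use depends on whether the values are the whole population or only a sample of it.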

To calculate the standard deviation, you can use this code, adapted from "Calculate Standard Deviation of Double Variables in C#" by Victor Chen.

private double getStandardDeviation(List<double> doubleList)
{
   // Requires: using System; using System.Collections.Generic; using System.Linq;
   double average = doubleList.Average();
   double sumOfDerivation = 0;
   foreach (double value in doubleList)
   {
      sumOfDerivation += (value) * (value);   // accumulates the sum of squares
   }
   // Sample variance: (sum of squares - n * average^2) / (n - 1)
   double sumOfDerivationAverage = (sumOfDerivation - doubleList.Count * average * average) / (doubleList.Count - 1);
   return Math.Sqrt(sumOfDerivationAverage);
}

The original link to Victor's site no longer works, but the source is still credited for attribution.

Ella Cohen
jb.
  • Thanks. I absolutely like the lambda version, though. This looks a little too clunky. – Independent Mar 17 '11 at 09:43
  • Better to replace the line "sumOfDerivation += (value) * (value);" with "sumOfDerivation += (value - average) * (value - average);" and drop "(average*average)" from the last line. – user1575120 Feb 17 '17 at 08:44
  • *user1575120* is right. Although mathematically correct, this equation is very much wrong for real-world numerical calculations because it easily loses precision (subtracting two large, nearly equal numbers) or overflows. The error is present in *Microsoft Excel* and the *Free Pascal Compiler*. See my [bug report](https://bugs.freepascal.org/view.php?id=32804) about the latter. – Anton Shepelev Dec 23 '17 at 19:18
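To illustrate the comments above, here is a minimal sketch comparing the one-pass sum-of-squares formula with the two-pass formula user1575120 suggests. The class and method names are hypothetical. The test data (values near 1e9 whose sample variance is exactly 30) is a standard illustration of the precision problem: the two-pass version returns 30, while the one-pass version subtracts two numbers near 4e18 and loses most of its significant digits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarianceComparison
{
    // One-pass formula: (sum of x^2 - n * mean^2) / (n - 1).
    // Mathematically correct, but subtracting two large, nearly equal numbers
    // loses precision when the mean is large relative to the spread.
    public static double NaiveSampleVariance(IReadOnlyList<double> values)
    {
        double sum = 0, sumOfSquares = 0;
        foreach (double v in values)
        {
            sum += v;
            sumOfSquares += v * v;
        }
        double mean = sum / values.Count;
        return (sumOfSquares - values.Count * mean * mean) / (values.Count - 1);
    }

    // Two-pass formula: compute the mean first, then sum squared deviations from it.
    public static double TwoPassSampleVariance(IReadOnlyList<double> values)
    {
        double mean = values.Average();
        double sumOfSquares = values.Sum(v => (v - mean) * (v - mean));
        return sumOfSquares / (values.Count - 1);
    }
}
```

With small values such as `{ 4, 7, 13, 16 }` both methods agree on 30; shifting the same values by 1e9 leaves the two-pass result unchanged but not the one-pass one.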

Given the outliers, you might find the interquartile range to be more useful than the standard deviation. This is simple to calculate: just sort the numbers and find the difference of the values at the 75th percentile and the 25th percentile.
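A minimal sketch of the interquartile range in C#. "Percentile" has several definitions; this assumes linear interpolation between the closest ranks (the "inclusive" definition), and other definitions give slightly different quartiles. The `Spread` class and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class Spread
{
    // Percentile by linear interpolation between closest ranks.
    // Expects values sorted ascending; p is a fraction in [0, 1].
    public static double Percentile(double[] sorted, double p)
    {
        if (sorted.Length == 0)
            throw new ArgumentException("At least one value is required.", nameof(sorted));
        double rank = p * (sorted.Length - 1);
        int lower = (int)Math.Floor(rank);
        int upper = (int)Math.Ceiling(rank);
        double fraction = rank - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    // Interquartile range: 75th percentile minus 25th percentile.
    public static double InterquartileRange(double[] values)
    {
        double[] sorted = values.OrderBy(v => v).ToArray();
        return Percentile(sorted, 0.75) - Percentile(sorted, 0.25);
    }
}
```

With the question's sample data, the quartiles come out as 52.9 and 100.2775, so the IQR is about 47.38. A common convention (Tukey's fences) flags values more than 1.5 × IQR beyond the quartiles. Here the upper fence is roughly 171, so 1400.32 stands out while the rest of the data does not.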

Michael J. Barber
  • I completely agree, and values beyond the upper/lower range could be passed to their own List. However, this only helps when there ARE values falling outside the allowed deviation. – Independent Mar 17 '11 at 09:17

You already have some good answers on calculating standard deviation, but I'd like to add Knuth's algorithm for calculating variance to the list. Knuth's algorithm performs the calculation in a single pass over the data. Standard deviation is then just the square root of the variance, as pointed out above. Knuth's algorithm also lets you read intermediate values of the variance as you go, if that proves useful.
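A minimal sketch of the single-pass update usually credited to Welford and presented in Knuth's The Art of Computer Programming, Vol. 2. The `RunningVariance` class name and its members are hypothetical. Each `Add` is O(1), and the mean and variance can be read at any point, which fits the "intermediate values" remark.

```csharp
using System;

// Running mean and variance using Welford's single-pass update.
public sealed class RunningVariance
{
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    public long Count => count;
    public double Mean => mean;

    // Incorporates one new value in O(1) time and memory.
    public void Add(double value)
    {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean); // second factor uses the updated mean
    }

    // Sample variance (divides by n - 1); NaN until at least two values are added.
    public double SampleVariance => count > 1 ? m2 / (count - 1) : double.NaN;

    // Population variance (divides by n); NaN until at least one value is added.
    public double PopulationVariance => count > 0 ? m2 / count : double.NaN;

    public double SampleStdDev => Math.Sqrt(SampleVariance);
}
```

Usage: create one `RunningVariance`, call `Add` for each row as it is read, and check `SampleStdDev` whenever needed. Because it never subtracts two large sums, it avoids the precision problem discussed under the sum-of-squares answer.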

Re: "Fast-Scan if the values follows a cumulative pattern," if your data is expected to grow linearly, I'd suggest computing a mean and variance for the difference between successive elements (10.5, 10.4 and 23.0 would be the first three difference values from your data). Then find outliers of these difference values instead of the data points. This will make anomalous data values like 1400.32 in your example much more evident, especially when the data eventually grows large enough that 1400 is near the mean.
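A minimal sketch of the difference-based check described above. It also covers the "fast scan": a negative step means the cumulative chain went backwards. The `CumulativeCheck` class, the `FindSuspectSteps` name and the `threshold` parameter (how many standard deviations count as unusual) are hypothetical choices. Two caveats: a single bad value produces two anomalous steps (the jump up and the jump back down), and a very large outlier inflates the standard deviation of the steps, which can partly mask it. With the sample data, a threshold of 1.5 flags both steps around 1400.32, while 2 flags only the negative one. A robust spread measure such as the IQR of the differences may be a better fit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CumulativeCheck
{
    // Returns indices i where the step values[i] - values[i - 1] is negative
    // (the chain goes backwards) or lies more than `threshold` sample standard
    // deviations away from the mean step.
    public static List<int> FindSuspectSteps(IReadOnlyList<double> values, double threshold)
    {
        var suspects = new List<int>();
        if (values.Count < 3) return suspects; // need at least two differences

        var diffs = new double[values.Count - 1];
        for (int i = 1; i < values.Count; i++)
            diffs[i - 1] = values[i] - values[i - 1];

        double mean = diffs.Average();
        double sd = Math.Sqrt(diffs.Sum(d => (d - mean) * (d - mean)) / (diffs.Length - 1));

        for (int i = 0; i < diffs.Length; i++)
        {
            bool goesBackwards = diffs[i] < 0;
            bool unusualStep = sd > 0 && Math.Abs(diffs[i] - mean) > threshold * sd;
            if (goesBackwards || unusualStep)
                suspects.Add(i + 1); // index of the later element of the pair
        }
        return suspects;
    }
}
```

For the question's `someDoubles` with a threshold of 1.5, this returns indices 5 and 6: the step up to 1400.32 and the step back down to 99.04.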

PaulF

If you are on .NET 4.0, the following links may be helpful:
Standard Deviation in LINQ
http://msdn.microsoft.com/en-us/library/dd456873.aspx

Community
Anton Semenov

Here is VB.NET code for standard deviation, z-score, and NormSDist. I cut and pasted it from working code and modified it to be more generic, so I may have introduced issues. Also, I am not a math guy, so beware.

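' NumberOfTickets and TotalMatches are defined elsewhere in the original code; the types here are assumed.
Public Property NumberOfTickets As Integer
Public Property TotalMatches As Double
Public Property SumOfSquaresOfDifferences As Double ' calculated elsewhere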

Public ReadOnly Property StdOfTotalMatches As Double
    Get
        If NumberOfTickets = 0 Then Return 0
        Return Math.Sqrt(SumOfSquaresOfDifferences / NumberOfTickets)
    End Get
End Property

Public ReadOnly Property zScoreOfTotalMatches As Double
    Get
        If StdOfTotalMatches = 0 Then Return 0
        Return (TotalMatches / NumberOfTickets - AverageMatches) / StdOfTotalMatches
    End Get
End Property

Public ReadOnly Property NormSDistOfTotalMatches As Double
    Get
        Return NormSDist(zScoreOfTotalMatches)
    End Get
End Property

Public ReadOnly Property AverageMatches As Double
    Get
        Return If(NumberOfTickets <> 0, TotalMatches / NumberOfTickets, 0.0)
    End Get
End Property

Shared Function NormSDist(ByVal zScore As Double) As Double
    Dim ErfResult As Double = Erf(zScore / Math.Sqrt(2.0))
    Dim res As Double = ErfResult + (1 - ErfResult) / 2
    Return If(zScore < 0, 1 - res, res)
End Function

Shared Function Erf(ByVal n As Double) As Double
    ' Returns erf(|n|), which is always >= 0; NormSDist handles negative z-scores itself.
    Dim t As Double = 1.0 / (1.0 + 0.5 * Math.Abs(n))

    ' use Horner's method - thanks to http://bytes.com/topic/c-sharp/answers/240995-normal-distribution
    Dim d As Double = 1 - t * Math.Exp(-n * n - 1.26551223 + _
    t * (1.00002368 + _
    t * (0.37409196 + _
    t * (0.09678418 + _
    t * (-0.18628806 + _
    t * (0.27886807 + _
    t * (-1.13520398 + _
    t * (1.48851587 + _
    t * (-0.82215223 + _
    t * (0.17087277))))))))))

    'Return If(d >= 0, d, 1 - d)
    Return d

End Function
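For checking the VB.NET above, these are the standard identities it relies on. NormSDist computes ErfResult + (1 - ErfResult) / 2 = (1 + ErfResult) / 2. Because Erf returns erf(|n|), this equals Φ(|z|), and the final `If(zScore < 0, 1 - res, res)` applies the symmetry Φ(-z) = 1 - Φ(z). The coefficients in Erf appear to match the erfc approximation from Numerical Recipes (erfcc). That source is an assumption, not something stated in the answer.

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right], \qquad
\Phi(-z) = 1 - \Phi(z)

\operatorname{erf}(x) = 1 - \operatorname{erfc}(x), \qquad
\operatorname{erfc}(x) \approx t \, \exp\!\bigl(-x^2 - 1.26551223 + t\,(1.00002368 + t\,(0.37409196 + \cdots + t \cdot 0.17087277))\bigr),
\quad t = \frac{1}{1 + x/2}, \; x \ge 0
```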
BSalita