I'm trying to figure out the best way to perform a computation fast and wanted to find out what sort of approach people would usually take in a situation like this.

I have a List of objects which have properties that I want to compute the mean and standard deviation of. I thought using this Math.NET library would probably be easier/optimised for performance.

Unfortunately, the input arguments for these functions are arrays. Is my only solution to write my own function to compute means and standard deviations? Could I write some sort of extension method for lists that uses lambda functions like here? Or am I better off writing functions that return arrays of my object properties and using these with Math.NET?

Presumably the answer depends on some things like the size of the list? Let's say for argument's sake that the list has 50 elements. My concern is purely performance.

TylerH
rex
  • If your concern is performance, you'd rather implement your own function; you can compute both mean and standard deviation in one loop. Linq List.ToArray() is not a good approach in this particular case – Dmitry Bychenko Feb 25 '14 at 11:36

2 Answers

ArrayStatistics indeed expects arrays, as it is optimized for this special case (that's why it is called ArrayStatistics). Similarly, StreamingStatistics is optimized for streaming over IEnumerable sequences without keeping the data in memory. The general class that works with all kinds of input is the Statistics class.

Have you verified that simply using LINQ and StreamingStatistics is not fast enough in your use case? Computing these statistics for a list of merely 50 entries is barely measurable at all, unless, say, you do that a million times in a loop.

Example with Math.NET Numerics v3.0.0-alpha7, using Tuples in a list to emulate your custom types:

using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Statistics;

var data = new List<Tuple<string, double>>
{
    Tuple.Create("A", 1.0),
    Tuple.Create("B", 2.0),
    Tuple.Create("C", 1.5)
};

// using the normal extension methods within `Statistics`
var stdDev1 = data.Select(x => x.Item2).StandardDeviation();
var mean1 = data.Select(x => x.Item2).Mean();

// single pass variant (unfortunately there's no single pass MeanStdDev yet):
var meanVar2 = data.Select(x => x.Item2).MeanVariance();
var mean2 = meanVar2.Item1;
var stdDev2 = Math.Sqrt(meanVar2.Item2);

// directly using the `StreamingStatistics` class:
StreamingStatistics.MeanVariance(data.Select(x => x.Item2));
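The question also asks whether an extension method taking a lambda could do this directly on the list. A minimal sketch of that idea follows, assuming a hypothetical helper named `MeanStdDev` in a hypothetical class `SelectorStatistics`; neither is part of Math.NET Numerics. It uses Welford's running update (a technique swapped in here, not taken from the library) and returns the sample standard deviation with an N-1 normalizer.

```csharp
using System;
using System.Collections.Generic;

public static class SelectorStatistics
{
    // Hypothetical helper for illustration; not a Math.NET Numerics API.
    // Returns (mean, sample standard deviation) in a single pass.
    public static Tuple<double, double> MeanStdDev<T>(
        this IEnumerable<T> source, Func<T, double> selector)
    {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean

        foreach (var item in source)
        {
            double x = selector(item);
            n++;
            double delta = x - mean;
            mean += delta / n;          // update the running mean
            m2 += delta * (x - mean);   // Welford's update, uses old and new mean
        }

        if (n == 0)
        {
            return Tuple.Create(double.NaN, double.NaN);
        }

        // The sample standard deviation is undefined for a single value.
        double stdDev = n > 1 ? Math.Sqrt(m2 / (n - 1)) : double.NaN;
        return Tuple.Create(mean, stdDev);
    }
}
```

With the `data` list above, it could be called as `var result = data.MeanStdDev(x => x.Item2);`, with the mean in `result.Item1` and the standard deviation in `result.Item2`. No intermediate array or projected sequence is allocated, though for 50 items the difference is likely to be negligible.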
Christoph Rüegg
  • Hi, thanks for your answer. Let's say I would like to find the mean of 50 items every 20ms, and will be doing this for a long time. I'm guessing StreamingStatistics is the best option? – rex Feb 26 '14 at 17:05
  • StreamingStatistics.Mean(myList.GetRange(myList.Count() - 51 , 50).AsEnumerable().Select(x => x.Property1)); This is how I have implemented it, since I want the average of the last 50 elements; would you say I'm getting a significant performance hit from using .GetRange()? – rex Feb 26 '14 at 17:15
The easiest solution is to use LINQ to project the property and convert the result to an array:

  List<SomeClass> list = ...

  // extra allocation and copy on every call, so not ideal for performance
  GetMeanAndStdError(list.Select(item => item.SomeProperty).ToArray());

However, if performance is your concern, you'd rather compute the mean and variance explicitly (write your own function):

  List<SomeClass> list = ...

  Double sumX = 0.0;
  Double sumXX = 0.0;

  foreach (var item in list) {
    Double x = item.SomeProperty;

    sumX += x;
    sumXX += x * x;
  }

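  Double mean = sumX / list.Count;
  // population variance: mean of squares minus square of the mean
  Double variance = sumXX / list.Count - mean * mean;
  Double stdDev = Math.Sqrt(variance);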
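The single-pass loop works because the population variance can be written in terms of the two running sums. As a short derivation, with N = list.Count and \bar{x} the mean:

```latex
\sigma^2
  = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2
  = \frac{1}{N}\sum_{i=1}^{N} x_i^2 \;-\; 2\bar{x}\cdot\frac{1}{N}\sum_{i=1}^{N} x_i \;+\; \bar{x}^2
  = \frac{1}{N}\sum_{i=1}^{N} x_i^2 \;-\; \bar{x}^2
```

In terms of the variables above, the variance is sumXX / N minus the square of the mean, and the standard deviation is its square root. This is the population variance; multiplying by N / (N - 1) gives the sample variance. One caveat: when the values are large relative to their spread, subtracting two nearly equal numbers can lose precision, so a running-update formulation may be safer in that case.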
Dmitry Bychenko