
I have a collection of person objects (IEnumerable) and each person has an age property.

I want to generate stats on the collection such as Max, Min, Average, Median, etc., on this age property.

What is the most elegant way of doing this using LINQ?

lost_in_the_source
leora
  • Be careful if the data is coming from a database, as LINQ may read the data more than once; however, in your case LINQ should be a good tool, as you seem to have your collection in RAM. – Ian Ringrose Nov 09 '10 at 14:14
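Following that comment: if the people might come from a lazily evaluated query (for example one backed by a database), one option is to materialize the ages once with ToList() and compute every statistic from that in-memory list. A minimal sketch, assuming a hypothetical Person record with an int Age property; AgeStats and AgeStatistics.Compute are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical person type, assumed for illustration.
public record Person(string Name, int Age);

// Hypothetical result type holding the statistics.
public record AgeStats(int Min, int Max, double Average, double Median);

public static class AgeStatistics
{
    public static AgeStats Compute(IEnumerable<Person> people)
    {
        // Materialize once so a lazy or database-backed source is read a single time.
        List<int> ages = people.Select(p => p.Age).ToList();
        if (ages.Count == 0)
            throw new InvalidOperationException("Sequence contains no elements.");

        // Sort the materialized list in place; afterwards min and max are the first and last elements.
        ages.Sort();
        int mid = ages.Count / 2;
        double median = ages.Count % 2 == 1
            ? ages[mid]
            : (ages[mid - 1] + ages[mid]) / 2.0; // even count: average the two middle values

        return new AgeStats(ages[0], ages[^1], ages.Average(), median);
    }
}
```

Average is still a separate pass, but over the in-memory list rather than the original query.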

5 Answers


Here is a complete, generic implementation of Median that properly handles empty collections and nullable types. It is LINQ-friendly in the style of Enumerable.Average, for example:

    double? medianAge = people.Median(p => p.Age);

This implementation returns null when there are no non-null values in the collection, but if you don't like the nullable return type, you could easily change it to throw an exception instead.

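using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared in a non-nested static class; the class name is arbitrary.
public static class MedianExtensions
{
    public static double? Median<TColl, TValue>(
        this IEnumerable<TColl> source,
        Func<TColl, TValue>     selector)
    {
        return source.Select<TColl, TValue>(selector).Median();
    }

    public static double? Median<T>(
        this IEnumerable<T> source)
    {
        // For Nullable<T> element types, ignore null values.
        if(Nullable.GetUnderlyingType(typeof(T)) != null)
            source = source.Where(x => x != null);

        int count = source.Count();
        if(count == 0)
            return null;

        source = source.OrderBy(n => n);

        // Each ElementAt() call below enumerates (and re-sorts) the source separately.
        int midpoint = count / 2;
        if(count % 2 == 0)
            return (Convert.ToDouble(source.ElementAt(midpoint - 1)) + Convert.ToDouble(source.ElementAt(midpoint))) / 2.0;
        else
            return Convert.ToDouble(source.ElementAt(midpoint));
    }
}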
Rand Scullard
  • This, AFAIK, enumerates the source two or three times: first when `Count()` is called, then a second (and possibly a third) time when `ElementAt` is called. – DarkWanderer May 24 '14 at 09:51
  • You're absolutely right. And -- as always with LINQ -- the impact of this will range from trivial to prohibitive, depending on the nature of your collection. Remember the Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet. (See http://blog.codinghorror.com/why-arent-my-optimizations-optimizing/) – Rand Scullard May 24 '14 at 14:22
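When the multiple enumeration does matter (an expensive iterator or a query, say), a variant of the generic Median above can copy the values into an array once and sort that copy. A minimal sketch under that assumption: MedianSinglePass and its class are hypothetical names, and, like the original, it converts through Convert.ToDouble, so it assumes the element type is a numeric type implementing IConvertible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MedianSinglePassExtensions
{
    // Hypothetical name; same contract as the Median above: null when there are no values.
    public static double? MedianSinglePass<T>(this IEnumerable<T> source)
    {
        // Enumerate the source exactly once, dropping nulls (relevant for Nullable<T>).
        T[] values = source.Where(x => x != null).ToArray();
        if (values.Length == 0)
            return null;

        // Sort the copy in place with the default comparer.
        Array.Sort(values);

        int mid = values.Length / 2;
        if (values.Length % 2 == 0)
            return (Convert.ToDouble(values[mid - 1]) + Convert.ToDouble(values[mid])) / 2.0;
        return Convert.ToDouble(values[mid]);
    }
}
```

For example, `new[] { 4, 1, 3, 2 }.MedianSinglePass()` gives 2.5. The trade-off is an extra array the size of the input.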
var max = persons.Max(p => p.age);
var min = persons.Min(p => p.age);
var average = persons.Average(p => p.age);

Median (fixed to handle an even number of elements):

int count = persons.Count();
var orderedPersons = persons.OrderBy(p => p.age);
float median = orderedPersons.ElementAt(count/2).age + orderedPersons.ElementAt((count-1)/2).age;
median /= 2;
Itay Karo
  • A small note: this algorithm is not time-optimal. The median can be calculated in O(n) time. – Max May 13 '17 at 11:09
  • @Max Where is your time-optimal solution? – crush Aug 28 '19 at 19:23
  • Note that this method may give wrong results if the list contains null values. One needs to filter out those null values before using it! – draz Nov 03 '22 at 21:42
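Regarding the O(n) remark and the question about it: one common technique is quickselect (Hoare's selection algorithm), which partitions the data around a pivot and continues into only one side instead of sorting everything. A minimal sketch, not a library API: QuickSelect.QuickSelectMedian is a hypothetical helper that works on int values only. With a random pivot the expected time is O(n), but the worst case is still O(n²); a guaranteed O(n) bound needs a pivot rule such as median-of-medians.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuickSelect
{
    // Hypothetical helper: median of a sequence of ints in expected O(n) time.
    public static double QuickSelectMedian(IEnumerable<int> source)
    {
        // Work on a copy so the caller's data is not reordered.
        int[] a = source.ToArray();
        if (a.Length == 0)
            throw new InvalidOperationException("Sequence contains no elements.");

        int mid = a.Length / 2;
        int upper = SelectKth(a, mid);
        if (a.Length % 2 == 1)
            return upper;

        // After SelectKth, every element left of 'mid' is <= a[mid],
        // so the lower middle value is the maximum of that left part.
        int lower = a[0];
        for (int i = 1; i < mid; i++)
            lower = Math.Max(lower, a[i]);
        return (lower + (double)upper) / 2.0;
    }

    // Rearranges 'a' so that a[k] holds the k-th smallest value (0-based),
    // with smaller-or-equal values before it; returns a[k].
    private static int SelectKth(int[] a, int k)
    {
        var rng = new Random();
        int left = 0, right = a.Length - 1;
        while (left < right)
        {
            // A random pivot keeps the expected running time linear.
            int pivotIndex = rng.Next(left, right + 1);
            int p = Partition(a, left, right, pivotIndex);
            if (k == p) return a[k];
            if (k < p) right = p - 1;
            else left = p + 1;
        }
        return a[k];
    }

    // Lomuto partition: moves the pivot to its final sorted position and returns that index.
    private static int Partition(int[] a, int left, int right, int pivotIndex)
    {
        int pivot = a[pivotIndex];
        (a[pivotIndex], a[right]) = (a[right], a[pivotIndex]);
        int store = left;
        for (int i = left; i < right; i++)
        {
            if (a[i] < pivot)
            {
                (a[i], a[store]) = (a[store], a[i]);
                store++;
            }
        }
        (a[store], a[right]) = (a[right], a[store]);
        return store;
    }
}
```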

Max, Min and Average are part of LINQ:

int[] ints = new int[]{3,4,5};
Console.WriteLine(ints.Max());
Console.WriteLine(ints.Min());
Console.WriteLine(ints.Average());

Median is easy (UPDATE: I have added the ordering):

ints.OrderBy(x=>x).Skip(ints.Count()/2).First();

BEWARE

All these operations iterate over the sequence. For example, ints.Count() can be a loop, so if you already have ints.Length, it is better to use it directly or store it in a variable.

Aliostad
  • `ints.ElementAt(ints.Count()/2)` – Cheng Chen Nov 09 '10 at 13:58
  • For the median to be right, you need to order the array first. – Itay Karo Nov 09 '10 at 13:59
  • Your implementation of `Median` also assumes that the input has an odd number of elements. It will fail for `{0, 1}` (it will give 0 instead of 0.5). – Mark Byers Nov 09 '10 at 14:00
  • @Mark - AFAIK Median will always return one of the source array elements – Itay Karo Nov 09 '10 at 14:04
  • @Itay: No, that's usually called the medoid. *A related concept, in which the outcome is forced to correspond to a member of the sample is the medoid.* http://en.wikipedia.org/wiki/Median The way you are calculating the medoid is biased as it will often return a value that is lower than the median, and never a value that is higher. – Mark Byers Nov 09 '10 at 14:04
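On the BEWARE note in this answer: calling Max(), Min() and Average() separately walks the sequence three times. If that matters, a single hand-written loop can collect all three in one pass. A minimal sketch, assuming int values; SinglePassStats is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassStats
{
    // Hypothetical helper: min, max and average of the values in one enumeration.
    public static (int Min, int Max, double Average) Compute(IEnumerable<int> source)
    {
        int min = int.MaxValue, max = int.MinValue, count = 0;
        long sum = 0; // long so that summing many ints does not overflow
        foreach (int x in source)
        {
            if (x < min) min = x;
            if (x > max) max = x;
            sum += x;
            count++;
        }
        if (count == 0)
            throw new InvalidOperationException("Sequence contains no elements.");
        return (min, max, (double)sum / count);
    }
}
```

Like the built-in Min/Max/Average, it throws on an empty sequence.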

Get the median using LINQ (works for an even or odd number of elements):

int count = persons.Count();
var orderedAges = persons.Select(x => x.Age).OrderBy(x => x);

// A declaration cannot be the body of an unbraced if/else, so declare median first.
double median;
if (count % 2 == 0)
    median = orderedAges.Skip((count / 2) - 1).Take(2).Average();
else
    median = orderedAges.ElementAt(count / 2);
Josh Withee
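The even/odd split above is what separates the median from simply picking one middle element (the medoid discussed in the comments on the previous answer). A minimal sketch comparing both on int values; MiddleValues and its methods are hypothetical names used only for illustration.

```csharp
using System.Linq;

public static class MiddleValues
{
    // Hypothetical helpers; each call re-sorts, which is fine for an illustration.
    public static int LowerMiddle(int[] values) =>
        values.OrderBy(x => x).ElementAt((values.Length - 1) / 2);

    public static int UpperMiddle(int[] values) =>
        values.OrderBy(x => x).ElementAt(values.Length / 2);

    // Median: average of the two middle elements (they coincide when the count is odd).
    public static double Median(int[] values) =>
        (LowerMiddle(values) + (double)UpperMiddle(values)) / 2.0;
}
```

For `{ 0, 1 }`, LowerMiddle returns 0, UpperMiddle returns 1 and Median returns 0.5; for an odd count all three agree.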

My Median LINQ extension, using Generic Math (.NET 7+)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

// Extension methods must be declared in a non-nested static class; the class name is arbitrary.
public static class GenericMathMedianExtensions
{
    public static TSource Median<TSource>(this IEnumerable<TSource> source)
        where TSource : struct, INumber<TSource>
        => Median<TSource, TSource>(source);

    public static TResult Median<TSource, TResult>(this IEnumerable<TSource> source)
        where TSource : struct, INumber<TSource>
        where TResult : struct, INumber<TResult>
    {
        var array = source.ToArray();
        var length = array.Length;
        if (length == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements.");
        }
        Array.Sort(array);
        var index = length / 2;
        var value = TResult.CreateChecked(array[index]);
        if (length % 2 == 1)
        {
            return value;
        }
        var sum = value + TResult.CreateChecked(array[index - 1]);
        // For an integer TResult this division truncates (e.g. Median<int, int> of {1, 2} is 1).
        return sum / TResult.CreateChecked(2);
    }
}
marsze