1

I have an array of ints in C# and I want to get 5% of the whole array, in the way that the new array includes the most frequent similar values. For an example, say I have an array with 100 entries that includes 40 siblings of 20 (15 to 25). What I want is to detect the 20 as the most frequent value (including it's siblings) as a new array and then 5 most frequent values inside the new array. I need to run the code on an ASP.net website and because of that, I need a fast algorithem. Could anyone help me with this please?

Rojan Gh.
  • 1,062
  • 1
  • 9
  • 32
  • 2
    What will your *real* array size be? How clustered will the data be, and do you know the actual ranges? – Jon Skeet Aug 20 '13 at 17:07
  • 1
    Side note: there is no real causation between "run the code on an ASP.net website" and "need a fast algorithm"... It often quite opposite - running slow/complicated code on server... – Alexei Levenkov Aug 20 '13 at 17:13
  • @JonSkeet The array may be as big as 1024*1024 in size and the ranges are between 0 and 255. In fact the array is a byte array. If it is important, I will change my question to fit that. The cluster may be 10. – Rojan Gh. Aug 20 '13 at 17:17
  • @AlexeiLevenkov So what one can do if a website should offer graphic services like setting the theme color of a page using the most frequent color inside a picture used inside that page dynamically? – Rojan Gh. Aug 20 '13 at 17:20
  • I don't know. I think the search term you are looking for is "histogram" which should give you starting point for finding right algorithm. – Alexei Levenkov Aug 20 '13 at 17:30
  • @dasblinkenlight The question says: "...includes 40 'siblings' of 20 (15 to 25)." and by that, I mean something like this {20, 22, 19, 24, 18}. – Rojan Gh. Aug 20 '13 at 17:47

2 Answers2

3

You can build a simple algorithm by grouping the values, ordering by count, and then taking them until you fill the required 5% array, like this:

// Build a set of {Value, Count} pairs using LINQ
var counts = data
    .GroupBy(v => v)
    .Select(g => new {
        Value = g => Key
    ,   Count = g.Count()
    }).OrderByDescending(p => p.Count)
    .Take(5);

EDIT :

The array may be as big as 1024*1024 in size and the ranges are between 0 and 255

Since the range is very small, you can use counting array instead of a group, like this:

int counts = new int[256];
foreach (var b in data) {
    counts[b]++;
}

Now you can run the Quick Select Algorithm to choose the fifth item. Here is an answer that provides a C# implementation of QuickSelect.

var fifth = QuickSelect(counts, 5);
var res = new List<KeyValuePair<int,int>>();
for (int i = 0 ; i != counts.Length && res.Length != 5 ; i++) {
    if (counts[i] >= fifth) {
        res.Add(new KeyValuePair<int,int>(i, counts[i]));
    }
}

You may want to replace the quick select algorithm with the median-of-medians algorithm, which has the same linear performance, but is not randomized.

Community
  • 1
  • 1
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
2
var numbersByOccurrence = from numbers in yourNumberArrayVariable
                          group numbers by numbers into g
                          select new { Number = g.Key, Count = g.Count() };

var limitedSize = numbersByOccurrence.OrderByDescending(n => n.Count).Take(5);

You now have an variable (you can cast as an array or list) of 5 objects with a Number and Count variable you can easily access.

wilso132
  • 829
  • 5
  • 10
  • Thanks for your answer. It is as good as the selected answer except I personally prefer lambda instead of Linq query. – Rojan Gh. Aug 21 '13 at 17:47