0

hi i need to find the biggest dense region in a List of values based on a given range

example:

var radius =5; //Therm edited
var list = new List<int>{0,1,2,3,4,5,12,15,16,22,23,24,26,27,28,29};

//the following dense regions exist in the list above
var region1 = new List<int> { 0, 1, 2, 3, 4, 5 };           // exist 6 times (0, 1, 2, 3, 4, 5)
var region2 = new List<int> { 12, 15, 16};                  // exist 3 times (12, 15, 16)
var region3 = new List<int> { 22, 23, 24, 26, 27};          // exist 1 times (22)
var region4 = new List<int> { 22, 23, 24, 26, 27, 28};      // exist 1 times (23)
var region5 = new List<int> { 22, 23, 24, 26, 27, 28, 29 }; // exist 3 times (24, 26, 27)
var region6 = new List<int> { 23, 24, 26, 27, 28, 29 };     // exist 1 times (28)
var region7 = new List<int> { 24, 26, 27, 28, 29 };         // exist 1 times (29)

//var result{22,23,24,26,27,28,29}

the solution doesn't really need to be fast because the max number of values is 21 is there an way to use fluent to achieve this?

i only know how to get the closest value

int closest = list.Aggregate((x,y) => Math.Abs(x-number) < Math.Abs(y-number) ? x : y);

and how to get values between 2 numbers

var between = list.Where(value=> min < value && value < max);

Edit

additional information's

Ok range is maybe the wrong therm radius would be a better word.

I define the dense region as the largest count of all values between currenvalue-range and currenvalue + range we get the dense region

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
WiiMaxx
  • 5,322
  • 8
  • 51
  • 89
  • What do you mean by *cluster* Is it a set of consequential items which fall in `2 * range` ? – Sergey Berezovskiy Aug 15 '13 at 13:24
  • @JoachimIsaksson the range will be given be an algorithm ther is no need to worry about that ^^ – WiiMaxx Aug 15 '13 at 13:24
  • @lazyberezovsky, so in other words, the largest sequence of numbers that are, in the example, no more than 10 apart? – Mike Perrenoud Aug 15 '13 at 13:25
  • @lazyberezovsky ok range is maybe the wrong therm radius would be a better word – WiiMaxx Aug 15 '13 at 13:25
  • This question is not defined well enough. 'Set' is a sufficient word to describe the result you are after, but again, the question is too poorly defined to provide an answer at this point... – MoonKnight Aug 15 '13 at 13:27
  • Is this the same as http://stackoverflow.com/q/10420464/10396? "Group list of ints by continuous sequence"? – AShelly Aug 15 '13 at 13:28
  • @TheSolution i define the cluster by range (radius) if the count of all values between currenvalue-range and currenvalue + range we get the cluster – WiiMaxx Aug 15 '13 at 13:28
  • @Killercam does may edit help to clear what i'm after? if you think you will be able to define it better i would appreciate if you could take some time to edit my question because my english knowledge is limited and my grammar is very bad – WiiMaxx Aug 15 '13 at 13:38
  • @AShelly yes something like this, but i need all numbers – WiiMaxx Aug 15 '13 at 13:40
  • This is not clustering. You are looking for "dense regions" or something like this. Have a look at **kernel density estimation** (although that is much more advanced than what you are looking for). – Has QUIT--Anony-Mousse Aug 15 '13 at 19:36
  • @Anony-Mousse you are absolutely right i will look at this KDE stuff and thanks for your Edit it's very kind :) – WiiMaxx Aug 16 '13 at 09:11

1 Answers1

4

A rather cryptic (but short) way would be:

int w = 5; // window size
var list = new List<int> { 0, 1, 2, 3, 4, 5, 12, 15, 16, 22, 
                           23, 24, 26, 27, 28, 29 };

var result = list.Select(x => list.Where(y => y >= x - w && y <= x + w))
                 .Aggregate((a, b) => (a.Count() > b.Count()) ? a : b);

Console.WriteLine(string.Join(",", result.ToArray()));

Prints

22,23,24,26,27,28,29

This code consists of 3 steps:

  • For a given x the snippet list.Where(y => y >= x - w && y <= x + w) gives all elements from the list that are in the cluster around x.
  • list.Select(x => ...) computes that cluster for every element of the list.
  • .Aggregate((a, b) => (a.Count() > b.Count()) ? a : b) takes the cluster of maximum size.
Vincent van der Weele
  • 12,927
  • 1
  • 33
  • 61
  • 1
    You can just use `Max` instead of aggregate. It will prevent you from needing to iterate twice as many of the source sequences as you actually need to. Also note this is the brute force method in general; you're computing all values and then finding the best. That's a lot less performant than you could get, so if this needs to scale, it won't. – Servy Aug 15 '13 at 14:29
  • @Servy we cannot use `Max` here directly, because we need to compare the `Count()` values. I agree that this is not an efficient solution, but the OP explicitly stated that efficiency is no issue. – Vincent van der Weele Aug 15 '13 at 14:34
  • @Heuster That is false. It's as simple as `.Max(seq => seq.Count())`. It's dramatically faster, easier to write, read, semantically states what you're trying to do, and is indeed possible. – Servy Aug 15 '13 at 14:35
  • @Servy that is false too. `.Max(seq => seq.Count())` gives the maximum size, not the enumerable having that size. – Vincent van der Weele Aug 15 '13 at 14:40
  • 1
    @Heuster Right, just use `MaxBy` then. Or if you're *really* opposed to doing that (which is by far the best solution) then you can just order by the count (descending) and take the first item. The effort spent doing the unnecessary sort will be far less than the effort spent iterating the complex queries multiple times. – Servy Aug 15 '13 at 14:41