I would like to split a List<double> into n buckets of equal width, for example:

List<double> list = new List<double>() { 
  0, 0.1, 1.1, 2.2, 3.3, 4.1, 5.6, 6.3, 7.1, 8.9, 9.8, 9.9, 10 
};

n = 5

I want to obtain something like this:

  bucket     values
---------------------------------
[0 ..  2] -> {0, 0.1, 1.1}
[2 ..  4] -> {2.2, 3.3}
...
[8 .. 10] -> {8.9, 9.8, 9.9, 10} 

The problem is if I do a GroupBy using:

return items
    .Select((item, inx) => new { item, inx })
    .GroupBy(x => Math.Floor(x.item / step))
    .Select(g => g.Select(x => x.item));

I always get an unwanted first or last bucket that contains only an extreme value (0 or 10 in the example above): either [10 .. 12], even though all the values lie in the [0 .. 10] range, or [0 .. 0], which is not a valid bucket range.

Any help?

Dmitry Bychenko
Skander
  • So you have a list *and* a dictionary? How are these two things related? What is `step`? I don't understand this question at all. – John Wu Mar 16 '21 at 15:40
  • could you please be more elaborate what you mean by "extreme values"? if you have any bugs that only happen in edge cases, you _should_ include at least one of those edge cases in your question. additionally, your question is pretty unclear. please try to bring more detail in your description – Franz Gleichmann Mar 16 '21 at 15:41
  • With what input do you get an unexpected output? What is that output and what did you expect? – Magnus Mar 16 '21 at 15:43
  • Are you looking for clustering? https://en.wikipedia.org/wiki/K-means_clustering. Can you cover the basic edge cases with your [mre]? Do we group {3.3, 4.1} together because there is less than 1 between those two? – Drag and Drop Mar 16 '21 at 15:45

1 Answer

Well, for an arbitrary list you have to compute the range [min..max] and then

  step = (max - min) / n;

Code:

  // Given

  List<double> list = new List<double>() {
    0, 0.1, 1.1, 2.2, 3.3, 4.1, 5.6, 6.3, 7.1, 8.9, 9.8, 9.9, 10
  };

  int n = 5; 

  // We compute step

  double min = list.Min();
  double max = list.Max();

  double step = (max - min) / n;

  // And, finally, group by:

  double[][] result = list
    .GroupBy(item => (int)Math.Clamp((item - min) / step, 0, n - 1))
    .OrderBy(group => group.Key)
    .Select(group => group.ToArray())
    .ToArray();

  // Let's have a look:

  string report = string.Join(Environment.NewLine, result
    .Select((array, i) => $"[{min + i * step} .. {min + i * step + step,2}) : {{{string.Join("; ", array)}}}"));

  Console.WriteLine(report);

Outcome:

[0 ..  2) : {0; 0.1; 1.1}
[2 ..  4) : {2.2; 3.3}
[4 ..  6) : {4.1; 5.6}
[6 ..  8) : {6.3; 7.1}
[8 .. 10) : {8.9; 9.8; 9.9; 10}
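
To see why the grouping key above is clamped, here is a minimal sketch comparing the raw bucket index with the clamped one. The class and method names (`BucketIndexDemo`, `RawIndex`, `ClampedIndex`) are hypothetical and only serve the illustration; they are not part of the code above.

```csharp
using System;

public static class BucketIndexDemo
{
    // Raw index: floor of the offset from min divided by the bucket width.
    // For item == max this yields n, i.e. one bucket too many.
    public static int RawIndex(double item, double min, double step) =>
        (int)Math.Floor((item - min) / step);

    // Clamped index: anything outside [0, n - 1] is folded into the
    // nearest valid bucket, so item == max lands in the last bucket.
    public static int ClampedIndex(double item, double min, double step, int n) =>
        Math.Clamp(RawIndex(item, min, step), 0, n - 1);
}
```

With min = 0, step = 2 and n = 5, `RawIndex(10, 0, 2)` is 5 (the unwanted [10 .. 12] bucket from the question), while `ClampedIndex(10, 0, 2, 5)` is 4.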

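Please note the Math.Clamp call: it keeps the group keys within the [0..n-1] range, so the maximum value (10 here) lands in the last bucket instead of getting a bucket of its own. The last bucket is therefore closed on the right, even though the report prints it with `)`. If you want a Dictionary<int, double[]> where Key is the index of the bucket: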

  Dictionary<int, double[]> buckets = list
    .GroupBy(item => (int)Math.Clamp((item - min) / step, 0, n - 1))
    .ToDictionary(group => group.Key, group => group.ToArray());
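
One thing to watch for: GroupBy only produces groups that contain at least one item, so a bucket with no values is simply absent, and an array built from the groups no longer lines up with the bucket indexes. If empty buckets should be kept, a plain loop over pre-allocated buckets is one option. The sketch below is an assumption-laden variant, not the code above: `Bucketing` and `ToBuckets` are hypothetical names, and putting everything into bucket 0 when all values are equal (step is 0) is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Bucketing
{
    // Hypothetical helper: splits values into n equal-width buckets
    // over [min, max], keeping empty buckets in the result.
    public static List<double>[] ToBuckets(IReadOnlyCollection<double> values, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));

        // Pre-allocate all n buckets so that the array index is the bucket index.
        var buckets = Enumerable.Range(0, n).Select(_ => new List<double>()).ToArray();
        if (values.Count == 0) return buckets;

        double min = values.Min();
        double max = values.Max();
        double step = (max - min) / n;

        foreach (double value in values)
        {
            // step == 0 means all values are equal; avoid dividing by zero.
            int index = step == 0
                ? 0
                : Math.Clamp((int)((value - min) / step), 0, n - 1);
            buckets[index].Add(value);
        }

        return buckets;
    }
}
```

For the list from the question, `Bucketing.ToBuckets(list, 5)` yields the same five groups as the outcome shown earlier; for sparser input, the empty buckets stay in place as empty lists.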
Dmitry Bychenko