5

I have an integer and a list of non-negative weights, how can I 'split' the integer into same number of 'buckets' with corresponding weights?

public int[] SplitIntoBuckets(int count, int[] weights) {
    // some magic algorithm
    Debug.Assert(solution.Sum() == count);
    return solution;
}

A trivial example would be count = 200 and weights = { 25, 25, 50 } with solution {50, 50, 100} (50+50+100 = 200). The inputs, however, does not have to be 'nice' numbers, another example without nice solution would be count = 150 and weights {753, 42, 95, 501}.
The sum of buckets must be always equal to the count input, the algorithm should distribute the input among buckets as closely to weights as possible. What is as 'close as possible' does not matter (for example it could be either lowest absolute, relative or squared error).
The closest questions I could find are Split evenly into buckets, however my buckets are not 'even' and Split randomly into buckets however the weights are chosen randomly to be 'nice' numbers.

Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
wondra
  • 3,271
  • 3
  • 29
  • 48

3 Answers3

3

I suggest rounding while tracking difference (diff) between exact double value (v) and rounded integer one (value):

public static int[] SplitIntoBuckets(int count, int[] weights) {
  if (null == weights)
    throw new ArgumentNullException(nameof(weights));
  else if (weights.Length <= 0)
    return new ArgumentOutOfRangeException(nameof(weights), "Empty weights");  

  double sum = weights.Sum(d => (double)d);

  if (sum == 0)
    throw new ArgumentException("Weights must not sum to 0", nameof(weights));

  Func<double, int> round = (double x) => (int)(x >= 0 ? x + 0.5 : x - 0.5);

  int[] result = new int[weights.Length];
  double diff = 0;

  for (int i = 0; i < weights.Length; ++i) {
    double v = count * (double)(weights[i]) / sum;
    int value = round(v);
    diff += v - value;

    if (diff >= 0.5) {
      value += 1;
      diff -= 1;
    }
    else if (diff <= -0.5) {
      value -= 1;
      diff += 1;
    }

    result[i] = value;
  }
    
  return result;
}

Demo:

string demo = sstring.Join(Environment.NewLine, Enumerable
  .Range(200, 15)
  .Select(n => $"{n} = {string.Join(", ", SplitIntoBuckets(n, new int[] { 25, 25, 50 }))}"));

Console.Write(demo);
    

Outcome:

200 = 50, 50, 100
201 = 50, 51, 100
202 = 51, 50, 101
203 = 51, 51, 101
204 = 51, 51, 102
205 = 51, 52, 102
206 = 52, 51, 103
207 = 52, 52, 103
208 = 52, 52, 104
209 = 52, 53, 104
210 = 53, 52, 105
211 = 53, 53, 105
212 = 53, 53, 106
213 = 53, 54, 106
214 = 54, 53, 107
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • I am sorry, I did not mention that all integers should be non-negative (total, weights, solution). Based on your solution, does following modification `if (Math.Abs(diff) >= 0.5) { var correction = (int)Math.Round(diff); if (value + correction >= 0) { value += correction; diff -= correction;}` give the similar quality of results while avoiding negative values in solutions? – wondra Jul 15 '20 at 13:33
  • @wondra: if both `total` and `weights` are non-negative then `solution` will be non-negative as well; you, probably, suspect `else if (diff <= -0.5) { value -= 1; diff += 1; }` fragment; please, note, that `diff <= -0.5` when `value` overestimated, i.e. we have `{0.6, 0.6, 0.7, ...}` floating point exact solution and we have `value == 1, diff = -0.4` for the first item, `value ==1, diff = -0.8` for the second one which we should correct: `value == 0, diff = 0.2` etc. so we have `{1, 0, ...}`. Note that we round `v` **up** (add `1`) and then repent and subtract `1` – Dmitry Bychenko Jul 15 '20 at 16:56
  • @wondra: I doubt if you want such modification: 1. it less readable 2. I don't believe in `if (value + correction >= 0)` - if `value == 0` then `correction >= 0` (`value` is an estimation of `v`, since `v >= 0`, `correction` can't be negative that aggravates underestimation `value <= v`) – Dmitry Bychenko Jul 15 '20 at 17:02
  • Thanks for the explanation, I think I understand it now better - `round` is actually Math.Ceiling which makes more sense now. – wondra Jul 16 '20 at 08:38
0

Notice that solution[i] is equal to:

round(weights[i] / weightSum * count)

There is an edge case where weights[i] / weightSum * count is an odd multiple of a half (x.5), which causes round to round up an extra time unnecessarily. An example of this would be count = 3, weights = { 1, 1 }. To counter this, we calculate the last bucket by subtracting the sum of the previous buckets from count. This would ensure the solution to add up to count no matter what.

public int[] SplitIntoBuckets(int count, int[] weights) {
    int[] solution = new int[weights.Length];
    int weightSum = weights.Sum();
    // for every weight except the last...
    int sum = 0;
    for (int i = 0 ; i < weights.Length - 1 ; i++) {
        solution[i] = (int)Math.Round((double)weights[i] / weightSum * count);
        sum += solution[i];
    }
    // calculate the last bucket by subtracting:
    solution[weights.Length - 1] = count - sum;
    return solution;
}
Sweeper
  • 213,210
  • 22
  • 193
  • 313
  • 1
    Damping all rounding errors into the last item can lead to non closely distributed solutions, e.g. `SplitIntoBuckets(202, new int[] {50, 50, 100})` returns `{50, 50, 102}`; however `202 * 100 / (50 + 50 + 100) == 101`. Floating point (exact) solution is `{50.5, 50.5, 101}` with the best (IMHO) integer approximation being `{51, 50, 101}` or `{50, 51, 101}` – Dmitry Bychenko Jul 15 '20 at 13:11
  • Well, OP said they don't care much about what "close as possible" means... :-) But your answer looks good! – Sweeper Jul 15 '20 at 13:14
  • 1
    I have to side with @DmitryBychenko , one couldotherwise argue that `reutrn new int[] { count, 0, 0, ... }` is a good solution, while I dont care what type of error is minimalized, at least somewhat good solution should be attempted. I have tried to run it a few times with random input and it *does* seem to accumulate error. Another problem is that last bucket could result in a negative value (but forgot to mention that in question, can really ask it later). – wondra Jul 15 '20 at 13:21
0

You can greedily round up using ceiling division on each iteration while checking it doesn't overflow count. If it does overflow, the part in that bucket is just the remainder.

This solution doesn't require any double or floating point arithmetic

public int[] SplitIntoBuckets(int count, int[] weights) {
    int sum = weights.Sum(d => d);
    int acc = 0;
    int[] result = new int[weights.Length];

    for(int i = 0; i < weights.Length; i++) {
        int part_round_up = (count * weight[i] + sum - 1) / sum;
    
        if(part_round_up + acc <= count) {
            result[i] = part_round_up;
        } else {
            result[i] = count - acc;
        }

        acc += result[i];
    }

    return result;
}