I propose the following query (PostgreSQL):
WITH ZeroToOne (m, n, y) AS (
SELECT m, n, random()
FROM inputs
CROSS JOIN generate_series(1, n)
), SumToM (m, n, y, x) AS (
SELECT m, n, y, y * m / sum(y) OVER (PARTITION BY m, n)
FROM zerotoone
), MissingToM (m, n, l) AS (
SELECT m, n, m - sum(floor(x))
FROM sumtom
GROUP BY m, n
)
SELECT m, n, y, x, l,
CASE
WHEN row_number() OVER (PARTITION BY m, n ORDER BY x - floor(x) DESC) > l
THEN floor(x)
ELSE ceil(x)
END AS v
FROM missingtom
NATURAL JOIN sumtom;
The only interesting values are m, n and v; I left the other values for explanation purposes.
I will step through the query with the following input cases as running examples:
SELECT * FROM inputs;
m | n
----+---
20 | 4
30 | 4
42 | 3
(3 rows)
The first CTE (ZeroToOne) computes n
random values in the range [0, 1] for each input case and call these values y
:
m | n | y
----+---+---------------------
20 | 4 | 0.374425032641739
20 | 4 | 0.644279096741229
20 | 4 | 0.626386553514749
20 | 4 | 0.320786282420158
30 | 4 | 0.848764919675887
30 | 4 | 0.268079651053995
30 | 4 | 0.250213726423681
30 | 4 | 0.497460773680359
42 | 3 | 0.571454062592238
42 | 3 | 0.00338772451505065
42 | 3 | 0.139226260595024
The second CTE (SumToM) multiplies each y
value by m
and divides the result by the sum of values for the input case. As a result, summing up all of the x
for an input couple (m, n) gives m
:
m | n | y | x
----+---+---------------------+-------------------
20 | 4 | 0.374425032641739 | 3.80924177094873
20 | 4 | 0.644279096741229 | 6.55462277759638
20 | 4 | 0.626386553514749 | 6.37259161753762
20 | 4 | 0.320786282420158 | 3.26354383391728
30 | 4 | 0.848764919675887 | 13.6565766414436
30 | 4 | 0.268079651053995 | 4.3133855037604
30 | 4 | 0.250213726423681 | 4.02592384820881
30 | 4 | 0.497460773680359 | 8.00411400658722
42 | 3 | 0.571454062592238 | 33.6117414945302
42 | 3 | 0.00338772451505065 | 0.199258922297338
42 | 3 | 0.139226260595024 | 8.18899958317244
It is obvious that m is greater than the sum of the integer parts of the x
values. It is also easy to see that the difference between the two sums (sum of the x values and sum of the integer parts of the x values) is less than n
. So the idea now is to count hom many numbers have to be rounded up and how many have to be rounded down. The l value of the third CTE (MissingToM) is the number of values to be rounded up:
m | n | l
----+---+---
20 | 4 | 2
30 | 4 | 1
42 | 3 | 1
To ensure that the distribution of numbers remains uniform, we round up the numbers which have the highest fractional part with the final query:
m | n | y | x | l | v
----+---+---------------------+-------------------+---+----
20 | 4 | 0.374425032641739 | 3.80924177094873 | 2 | 4
20 | 4 | 0.644279096741229 | 6.55462277759638 | 2 | 7
20 | 4 | 0.626386553514749 | 6.37259161753762 | 2 | 6
20 | 4 | 0.320786282420158 | 3.26354383391728 | 2 | 3
30 | 4 | 0.848764919675887 | 13.6565766414436 | 1 | 14
30 | 4 | 0.268079651053995 | 4.3133855037604 | 1 | 4
30 | 4 | 0.250213726423681 | 4.02592384820881 | 1 | 4
30 | 4 | 0.497460773680359 | 8.00411400658722 | 1 | 8
42 | 3 | 0.571454062592238 | 33.6117414945302 | 1 | 34
42 | 3 | 0.00338772451505065 | 0.199258922297338 | 1 | 0
42 | 3 | 0.139226260595024 | 8.18899958317244 | 1 | 8
As the query will fail if the same configuration (m, n) occurs several times in the inputs table, I add a primary key constraint on it:
ALTER TABLE inputs ADD PRIMARY KEY (m, n);