Generate random integers with two constraints (sum and local maximum) in Python

Question

I have a dataframe where I want to create random numbers in a new column. The random numbers must fulfill two constraints:

The random numbers must add up to a specified sum (in the example, the sum is 300)
For each observation, the random numbers must not exceed a value in the constraint column.

In the example below, the constraints are fulfilled because the sum is 300 and the random number does not exceed the constraint column.

Example:

GEOID	CONSTRAINT	RANDOM
010010000001	100	80
010010000002	50	40
010010000003	75	60
010010000004	75	60
010010000005	100	60

It seems having random numbers totaling a sum has been demonstrated but I do not see an example with a second constraint.

Edit for clarity: The new column must be integers. The minimum lower bound value is 0.

What happens if the numbers cannot add up to sum due to constraints? — Dani Mesejo, Sep 21 '21 at 15:37
@MichaelSzczesny are you suggesting floats instead? I suppose that would work if the floats could be transformed into integers with and not violate either of the constraints. — adin, Sep 21 '21 at 16:03
I expressed myself poorly. Let's try with an example: if all constraint equals 10 and the sum is 300 then there is no solution for your problem — Dani Mesejo, Sep 21 '21 at 16:07
@adin So you're confirming to Michael that the outcomes have to be integers? It's a yes/no question. — pjs, Sep 21 '21 at 16:21
@MichaelSzczesny I think so too, but by not actually saying "yes" the response remained ambiguous so I'd like to pin it down. There's also been no discussion of whether there's a lower bound. Can values go negative? — pjs, Sep 21 '21 at 16:25
I think something like: `res = rng.multinomial(300, df["CONSTRAINT"] / df["CONSTRAINT"].sum(), size=1)` may work well for integers — Dani Mesejo, Sep 21 '21 at 16:28
Does this answer your question? [Is there an efficient way to generate N random integers in a range that have a given sum or average?](https://stackoverflow.com/questions/61393463/is-there-an-efficient-way-to-generate-n-random-integers-in-a-range-that-have-a-g) — Peter O., Sep 21 '21 at 16:34
In particular, the first edit to this answer: https://stackoverflow.com/a/61525097/815724 — Peter O., Sep 21 '21 at 16:35
@adin According to your current wording, solutions [100, -50, 75, 75, 100], [100, 50, -25, 75, 100], or [100, 50, 75, -25, 100] would all meet your requirements. Is that correct, or are there lower bounds as well? — pjs, Sep 21 '21 at 16:36
@DaniMesejo - That's brilliant, I only get 2% violations of the second constraint. That's very well suited for rejection sampling with 98% success on first try. — Michael Szczesny, Sep 21 '21 at 16:39
@pjs thank you for bringing that up, I edited the original question for clarity that the lower bound is zero. — adin, Sep 21 '21 at 16:49
@DaniMesejo can you please put `res = rng.multinomial(300, df["CONSTRAINT"] / df["CONSTRAINT"].sum(), size=1)` in a formal answer and explain how it works? I tried 100 groups of 1000 attempts using that answer and found that on average, I rejected 23% of the results because one of the results was greater than a constraint. Since I can check for valid attempts, and each iteration was quick, this solution works for me. — adin, Sep 21 '21 at 19:17

Dani Mesejo · Accepted Answer · 2021-09-21T22:10:30.897

You could use the multinomial distribution to build an approximate answer:

def sample(total, constraints):
    import numpy as np
    rng = np.random.default_rng()
    samples = rng.multinomial(total, constraints / constraints.sum(), size=100)
    return next(val for val in samples if np.all(val < constraints))


df["RANDOM"] = sample(300, df["CONSTRAINT"].values)
print(df)

Output

             GEOID  CONSTRAINT  RANDOM
0  10010000001         100      81
1  10010000002          50      42
2  10010000003          75      57
3  10010000004          75      53
4  10010000005         100      67

Thanks goes to @Michael Szczesny for testing the solution.

The key to solve this, relies in (quote from numpy docs):

Its values, X_i = [X_0, X_1, ..., X_p], represent the number of times the outcome was i.

see more details in this blog post.

Generate random integers with two constraints (sum and local maximum) in Python

1 Answers1