How can I generate n numbers in the interval [a,b] whose sum doesn't exceed k in Python?

Question

Let me frame the question in a simple way:

I usually earn about k=3000€ per month. This month had n=26 working days (in July, as you can see in the following picture), and I generally earn somewhere in the range [100,120]€ each day.

Note: the total may deviate from k by +/- x€ if needed, but x should be as small as possible.

What I tried was generating n numbers within the [a,b] interval, but their sum should be very close to k:

import numpy as np
#rng = np.random.default_rng(123)
#arr1 = rng.uniform(100, 120, 26)
# randint's upper bound is exclusive, so 121 is needed to include 120
arr1 = np.random.randint(100, 121, 26)

#array([107, 115, 116, 105, 104, 110, 110, 107, 116, 110, 101, 112, 109,
#       111, 118, 102, 108, 113, 101, 112, 111, 116, 111, 109, 110, 107])

total = np.sum(arr1)
print(f'Sum of all the elements is {total}')
#Sum of all the elements is 2851

I have no idea how to satisfy this condition. The sum of the generated random numbers should fall in [k, k+i] with i as small as possible, e.g. [3000€, 3050€].

Edit 1: I compared the distribution of the values generated by the solutions from @Murali and @btilly by fitting a normal PDF to each and plotting it:

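import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

h = np.sort(arr1)  # sorted copy of the values produced by one of the answers
hmean = np.mean(h)
hstd = np.std(h)
pdf = stats.norm.pdf(h, hmean, hstd)  # normal PDF fitted to the sample's mean/std
#plt.hist(arr1)
plt.plot(h, pdf, '-o', alpha=0.4)  # using the sorted h as the x-values is crucial
plt.show()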

So one of them is clearly skewed, while the other follows a normal distribution.
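A fitted normal PDF is symmetric by construction, so skew only shows up in where the points sit along the curve. A single number such as the sample skewness can make the comparison more direct. The sketch below is a pure-Python helper (the name `skewness` is hypothetical); `scipy.stats.skew` with its default `bias=True` computes the same quantity.

```python
import statistics

def skewness(values):
    """Population (Fisher-Pearson) skewness: the mean of the cubed z-scores.
    About 0 for symmetric data, > 0 for a long right tail, < 0 for a long left tail."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mu) / sd) ** 3 for v in values) / len(values)

print(skewness([1, 2, 3]))      # symmetric sample
print(skewness([1, 1, 1, 10]))  # long right tail, positive
```

Applied to the arrays returned by each answer (e.g. `skewness(arr1)`), a value near 0 suggests symmetry, while a clearly nonzero value quantifies the skew seen in the plot.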

How close to 3000 should the sum be? What are the minimum and maximum acceptable values? — Colim, Jul 26 '22 at 22:19
I think there are too few details. Why do you need random numbers? How random should they be? Statistically, for the values you entered, the average sum would be 2860. But what is the main goal here? Because it kind of smells like an *XY Problem*. — CristiFati, Jul 26 '22 at 22:21
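The 2860 figure can be checked with a little arithmetic. A sketch, assuming the comment refers to a continuous uniform draw on [100,120]; the `np.random.randint(100,120,26)` call in the original question excludes 120, which lowers the expected total slightly:

```python
# Expected 26-day total under the two sampling choices in the question.
n = 26

# Continuous uniform on [100, 120] (e.g. rng.uniform): the mean is the midpoint.
uniform_mean = (100 + 120) / 2

# np.random.randint(100, 120) excludes 120, so it draws the integers 100..119.
randint_values = range(100, 120)
randint_mean = sum(randint_values) / len(randint_values)

print(n * uniform_mean)  # 2860.0
print(n * randint_mean)  # 2847.0
```

Either way the expected total sits well below 3000, which is why plain uniform draws in [100,120] rarely reach k.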

score 1 · Accepted Answer · 2022-07-26T22:53:11.637


You can use a Gaussian distribution with a given mean and standard deviation:

import numpy as np

mu = 3000/26

sigma = 5 ## standard deviation: about 68% of the values fall within mu +- 5, i.e. roughly [110.4, 120.4]

arr1 = np.random.normal(mu, sigma, 26)

print(np.sum(arr1))
# 3011.268333226019

You can also play with other distributions and see which fits your purpose.
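The Gaussian draw alone guarantees neither the [100,120] daily bounds nor a total inside [k, k+i]. One way to enforce both is to combine it with rejection sampling: redraw until a sample satisfies every constraint. This is a sketch under stated assumptions, not part of the answer above; `sample_days`, the window `slack=50` (from the question's [3000€, 3050€] example) and the default `sigma=2.5` are hypothetical choices, and values are rounded to whole euros.

```python
import random

def sample_days(n=26, k=3000, slack=50, lo=100, hi=120, sigma=2.5,
                max_tries=100_000, seed=None):
    """Hypothetical helper: rejection sampling on top of the Gaussian idea.
    Draws n rounded Gaussian values centred on (k + slack/2)/n and redraws
    until every value is in [lo, hi] and the total is in [k, k + slack]."""
    rng = random.Random(seed)
    mu = (k + slack / 2) / n
    for _ in range(max_tries):
        days = [round(rng.gauss(mu, sigma)) for _ in range(n)]
        if all(lo <= d <= hi for d in days) and k <= sum(days) <= k + slack:
            return days
    raise RuntimeError("no sample accepted; try a smaller sigma or a larger slack")

days = sample_days(seed=1)
print(days, sum(days))
```

A smaller `sigma` raises the acceptance rate but makes the days more alike; with the defaults above, roughly a quarter of the draws are accepted, so the loop finishes quickly.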

edited Jul 26 '22 at 22:53

answered Jul 26 '22 at 22:33

I really like your solution using a Gaussian distribution. I tried `print(np.ceil(arr1))` with different values of `sigma`. Here you probably ran the code many times, since there is no `np.random.seed(None)`. Any idea how to find the optimal value of `sigma` so that the sum is as close as possible to k? – Mario Jul 26 '22 at 23:23
I worked around the issue by slightly changing k from 3000 to 2950, which solved the problem. I also plotted the Gaussian distribution to explore it further. Feel free to extend your solution to finding the optimal parameters. – Mario Jul 26 '22 at 23:43
Since you don't have any bias and every day is considered independent, the distribution of the sum (or the average) will look Gaussian for sufficiently large n (central limit theorem). The probability of finding a value in [mu - sigma, mu + sigma] is about 68%; similarly, for [mu - 2*sigma, mu + 2*sigma] it is nearly 95%, so choose sigma accordingly. For example, in your question you said you mostly need values in the range [100,120]; if you choose sigma = 5, then 68% of the time your values will be in [110,120] (assuming the mean is 115). If you choose sigma = 2.5, then 95% of the time your values will be in [110,120]. – Jul 27 '22 at 04:48
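The 68%/95% figures and the choice of sigma can be computed directly instead of read from the rule of thumb. A minimal sketch using only the standard library (`coverage` and `sigma_for` are hypothetical helper names): the first gives the probability of landing within k standard deviations, the second inverts it to find the largest sigma that keeps a given share of values inside mu ± half_width.

```python
import math
from statistics import NormalDist

def coverage(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

def sigma_for(half_width, prob):
    """Largest sigma for which mu +- half_width still contains `prob` of the values."""
    z = NormalDist().inv_cdf((1 + prob) / 2)
    return half_width / z

print(round(coverage(1), 4))         # 0.6827
print(round(coverage(2), 4))         # 0.9545
print(round(sigma_for(5, 0.95), 2))  # 2.55
```

So for "95% of the days within ±5€ of the mean", sigma ≈ 2.55 is the exact value behind the comment's sigma = 2.5 rule of thumb.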
Also, the number of events is very small, so you will still see some error if each day is treated as an independent trial. For sufficiently large n, the average converges to the mean. You can check this by running it 10 times. – Jul 27 '22 at 04:52
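The remaining error can be quantified: the sum of n independent N(mu, sigma²) values is N(n·mu, n·sigma²), so the monthly total spreads by about sigma·sqrt(n) ≈ 25.5€ for sigma = 5 and n = 26, even though each month's average gets closer to mu as n grows. A small seeded simulation (a sketch; the trial count and seed are arbitrary choices) shows this:

```python
import math
import random
import statistics

n, sigma = 26, 5.0
mu = 3000 / n
rng = random.Random(42)  # fixed seed so reruns are reproducible

# 20,000 simulated months of n independent Gaussian days each
totals = [sum(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(20_000)]

print(statistics.fmean(totals))                        # close to 3000
print(statistics.stdev(totals), sigma * math.sqrt(n))  # both close to 25.5
```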

score 0 · Answer 2 · answered Jul 26 '22 at 22:21

Disclaimer:

As far as I know, there isn't any built-in way to do this. Below is a possible solution.

Possible Solution:

You could run it, then find the difference between 3,000 and the sum that the program printed. Divide that difference by the number of days and add the result to each day.

Example:

For example, suppose the sum is 2896. Subtracting it from 3,000 gives 104, and dividing that by the number of days (26) gives 4. Add 4 to every number.

Important Notes:

You would have to double-check that the numbers aren't above the maximum allowed. Also, if the printed sum happens to be more than 3,000, you would do the same with subtraction instead; in that case, double-check that the numbers aren't below the minimum allowed.
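The shift-and-check idea can be sketched as follows. This is an illustration under assumptions, not the answerer's code: `shift_to_target` is a hypothetical name, the gap is spread in whole euros (a remainder is distributed one euro at a time, since the difference rarely divides evenly), and out-of-range values are clamped rather than just flagged. The example reuses the sample array from the question.

```python
def shift_to_target(days, target=3000, lo=100, hi=120):
    """Hypothetical helper: spread (target - sum) over the days as evenly as
    whole euros allow, then clamp each value to [lo, hi]. Handles both a
    shortfall (adds) and an excess (subtracts)."""
    q, r = divmod(target - sum(days), len(days))
    # q is the per-day shift; the first r days get one extra euro so the
    # shifted total equals target exactly (divmod also handles negative gaps)
    shifted = [d + q + (1 if i < r else 0) for i, d in enumerate(days)]
    # clamping enforces the bounds but can pull the total away from target
    return [min(max(d, lo), hi) for d in shifted]


july = [107, 115, 116, 105, 104, 110, 110, 107, 116, 110, 101, 112, 109,
        111, 118, 102, 108, 113, 101, 112, 111, 116, 111, 109, 110, 107]
adjusted = shift_to_target(july)
print(sum(july), sum(adjusted))  # 2851 2990
```

Here the gap of 149€ pushes five days above 120€, and clamping them loses 10€, which is exactly the caveat in the notes above: a uniform shift and the per-day bounds can conflict.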

btilly · Answer 3 · 2022-07-27T21:20:28.557

This is yet another problem where the solution in "Generating a random string with matched brackets" applies.

import random

class DPPath:
    def __init__ (self):
        self.count = 0
        self.next = None

    def add_option(self, transition, tail):
        if self.next is None:
            self.next = {}
        self.next[transition] = tail
        self.count += tail.count

    def random (self):
        if 0 == self.count:
            return None
        # Python 3's random.randrange is uniform even for huge ranges (it is
        # built on getrandbits), so it can index the count directly.
        return self.find(random.randrange(self.count))

    def find (self, pos):
        result = self._find(pos)
        result.pop() # Remove the "total sum transition"
        return result

    def _find (self, pos):
        if self.next is None:
            return []

        for transition, tail in self.next.items():
            if pos < tail.count:
                result = tail._find(pos)
                result.append(transition)
                return result
            else:
                pos -= tail.count

        raise IndexError(f"find item {pos + self.count} out of range for {self.count}")

def sum_options (days, min_n, max_n, min_total, max_total):
    # Record that there is one empty sum.
    base_dp = DPPath()
    base_dp.count = 1

    dps = {0: base_dp}
    for day in range(days):
        prev_dps = {}
        for s, dp in dps.items():
            for i in range(min_n, max_n+1):
                if s + i not in prev_dps:
                    prev_dps[s+i] = DPPath()
                prev_dps[s+i].add_option(i, dp)
        dps = prev_dps

    # And now we want a dp answer to all in range.
    final_dp = DPPath()
    for s in range(min_total, max_total+1):
        if s in dps:
            final_dp.add_option(s, dps[s])
    return final_dp

print(sum_options(26, 100, 120, 3000, 3050).random())
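For comparison, the same counting idea (count the valid completions, then pick each value with probability proportional to its count) can be written more compactly with memoization. This is a sketch, not btilly's code; `make_sampler`, `count` and `sample` are hypothetical names. Every sequence of whole-euro values in [lo, hi] whose total lies in [min_total, max_total] is equally likely.

```python
import random
from functools import lru_cache

def make_sampler(days, lo, hi, min_total, max_total):
    """Returns count(d, total), the number of ways to fill the remaining d days
    so the final sum lands in [min_total, max_total], and sample(rng), which
    draws one such sequence uniformly at random."""

    @lru_cache(maxsize=None)
    def count(d, total):
        if d == 0:
            return 1 if min_total <= total <= max_total else 0
        # prune states that can no longer reach the target window
        if total + d * lo > max_total or total + d * hi < min_total:
            return 0
        return sum(count(d - 1, total + v) for v in range(lo, hi + 1))

    def sample(rng):
        if count(days, 0) == 0:
            return None
        seq, total = [], 0
        for d in range(days, 0, -1):
            # choose the next value with probability proportional to the
            # number of valid completions it leaves
            r = rng.randrange(count(d, total))
            for v in range(lo, hi + 1):
                c = count(d - 1, total + v)
                if r < c:
                    break
                r -= c
            seq.append(v)
            total += v
        return seq

    return count, sample


count, sample = make_sampler(26, 100, 120, 3000, 3050)
days = sample(random.Random(7))
print(sum(days), days)
```

The counts are exact Python integers, so `randrange` can index them uniformly even though there are astronomically many valid months.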

Thanks for your input. @Peter O. pointed to a Java-based [post](https://stackoverflow.com/questions/61393463/is-there-an-efficient-way-to-generate-n-random-integers-in-a-range-that-have-a-g) whose title seemed to fit the current problem, and I *accepted* it as a duplicate, but I later realized it's slightly different, as the mapping in this [answer](https://stackoverflow.com/a/61580353/10452700) shows; also, the title clearly asks for *Python*. Now it looks like a duplicate. Maybe you can fix that or vote to *reopen* it. There could be a better approach, too. — Mario, Jul 27 '22 at 07:15
