2

I would like to choose a random integer between a and b (both included), weighted so that values near c are the most likely.

c is a value between a and b.

What is the most efficient way to apply the weight factor c to random.randint?

The closest I got was this question, but there is a big difference:

I have only one single statistical weight c, not a statistical probability for every value between a and b.

Example:

import random

a = 890
b = 3200

c = 2600

print(random.randint(a,b))

>>>> supposed to result most frequently in a value around 2600

I don't really care about the distribution between a and b, as long as there is a weight on c. However, a Gaussian distribution would be appreciated.

Please note: this question does not address the numpy.random module as in this question.

maxischl
  • 3
    The problem is that *supposed to result most frequently in a value around 2600* is not specific enough: we can tweak a uniform distribution or a Gaussian one, and for the latter, we can use a large or small standard deviation. What do you want exactly? – Serge Ballesta Mar 26 '20 at 15:01
  • Sounds like they want `c` as the mode. – Masklinn Mar 26 '20 at 15:06
  • @SergeBallesta: I made an edit, mentioned the distribution type in a new paragraph. – maxischl Mar 26 '20 at 15:08
  • 1
    Does this answer your question? [How to generate a random normal distribution of integers](https://stackoverflow.com/questions/37411633/how-to-generate-a-random-normal-distribution-of-integers) – dumbPy Mar 26 '20 at 15:19

2 Answers

3

Sounds like the triangular distribution might fit your needs. The values a and b are the min and max, respectively, and c corresponds to the mode (most likely outcome) of the distribution.

There is a triangular generator in numpy.random. It generates floats, but you can round the results and then convert them to integers. Strictly speaking, plain rounding is slightly biased against the min and max: each endpoint collects values from an interval only half as wide as the other integers get, so it is underrepresented. Statisticians adjust for this conversion from reals to ints using a continuity correction: subtract 1/2 from the min and add 1/2 to the max. This is most likely to be relevant if you're dealing with small ranges, as in the tiny example below.

import numpy as np
import matplotlib.pyplot as plt

# replace with your actual values
a = 1
b = 5
c = 2

# Without continuity correction
plt.hist(np.round(np.random.triangular(
          left = a,
          mode = c,
          right = b, 
          size = 100000)
        ).astype(int),
        range = (0.5, 5.5), bins = 50, density = True)
plt.show()

# With continuity correction
plt.hist(np.round(np.random.triangular(
          left = a - 0.5,
          mode = c,
          right = b + 0.5, 
          size = 100000)
        ).astype(int),
        range = (0.5, 5.5), bins = 50, density = True)
plt.show()

Here are the results with your actual parameterization:

# Actual target case
a = 890
b = 3200
c = 2600
plt.hist(np.round(np.random.triangular(
          left = a - 0.5,
          mode = c,
          right = b + 0.5, 
          size = 100000)
        ).astype(int),
        range = (500, 3500), bins = 300, density = True)
plt.show()

Histogram of generated triangular distribution

Note that unlike the normal distribution suggested in comments, this is guaranteed to stay within the range [a, b].
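Since the question asks for something that does not rely on numpy.random, the same idea can probably be expressed with the standard library's `random.triangular(low, high, mode)`. The following is only a minimal sketch: the helper name `triangular_int` is hypothetical, and the clamp is a defensive assumption for the rare case where a float lands exactly on a boundary.

```python
import random

def triangular_int(a, b, c):
    """Return a random integer in [a, b] whose mode is near c (hypothetical helper)."""
    # Continuity correction: widen the range by 1/2 on each side so that
    # a and b get a full-width rounding interval like every other integer.
    x = random.triangular(a - 0.5, b + 0.5, c)
    # round() rounds exact halves to the nearest even integer, so clamp
    # to be sure the result never steps outside [a, b].
    return min(b, max(a, round(x)))

# Values from the question
samples = [triangular_int(890, 3200, 2600) for _ in range(10000)]
print(min(samples), max(samples))
```

Plotting `samples` with the same `plt.hist` call as above should give a comparable triangular shape.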

pjs
2

You use

random.choices(range(a,b+1), weights= [....], k=1)  # or cum_weights

with a k of 1, a population of range(a,b+1), and the weights you want.

See: https://docs.python.org/3/library/random.html#random.choices


You would have to calculate a possible (arbitrary) weighting, for example:

import random
from collections import defaultdict
a = 8
b = 32

c = 26

# hacked distribution
w = [(i-a)**2 if i <= c else (b-i+a)**2 for i in range(a,b+1)]

d=defaultdict(int)
for i in range(a,b+1):
    d[i]=0

# test for 10k numbers
for num in random.choices(range(a,b+1), weights = w, k=10000):
    d[num] += 1

print(w)
print(d)

The result is still random; one run got me:

# hacked distribution
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 
  256, 289, 324, 169, 144, 121, 100, 81, 64]

# test for 10k numbers
{8: 0, 9: 8, 10: 7, 11: 37, 12: 61, 13: 94, 14: 149, 15: 175, 16: 229, 
 17: 283, 18: 374, 19: 450, 20: 493, 21: 628, 22: 672, 23: 820, 24: 907, 
 25: 1038, 26: 1183, 27: 564, 28: 537, 29: 435, 30: 325, 31: 293, 32: 238}
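If a Gaussian-like shape is preferred over the hacked weighting, the weights could instead be taken from a bell curve centred on c. This is only a sketch under assumptions: the spread `sigma = 300` is an arbitrary value that does not come from the question, and the a, b, c values are the ones from the question rather than the small ones used above.

```python
import math
import random
from collections import Counter

a, b, c = 890, 3200, 2600
sigma = 300  # assumed spread; pick whatever width suits your data

population = range(a, b + 1)
# Unnormalized Gaussian bell centred on c. random.choices only needs
# relative weights, so no normalization is required.
weights = [math.exp(-((i - c) ** 2) / (2 * sigma ** 2)) for i in population]

samples = random.choices(population, weights=weights, k=10000)
counts = Counter(samples)
print(counts.most_common(5))
```

Because the population is limited to range(a, b+1), every result stays within [a, b]; the bell is simply cut off at both ends.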
Patrick Artner
  • 2
    They don't have a bunch of weights though, just a mode. So selecting and scaling a statistical distribution with the expected properties is probably the solution. – Masklinn Mar 26 '20 at 15:04
  • As Masklinn already mentioned, I do not have weights for every possible state in my range - hence, I cannot set weights for random.choices. Or am I wrong? – maxischl Mar 26 '20 at 15:09
  • @mask He would have to calculate a distribution then, see edit. – Patrick Artner Mar 26 '20 at 15:23