I would like to choose a range, for example 60 to 80, and generate a random number from it. However, I'd like values between 65 and 72 to have a higher probability, and the remaining ranges (60-64 and 73-80) to have a lower one.

An example:

The ranges 60-64 and 73-80 together have a 35% chance of being chosen, and 65-72 has a 65% chance.

The elements in the subranges are equally likely. I'm generating integers.

A scalable solution would also be interesting, so that it could be extended to larger ranges, for example 1000-2000 biased toward 1400-1600.

Could anyone help with some ideas?

Thanks in advance to anyone willing to contribute!

Emerson Oliveira
  • Hi, what is a higher probability? Do you have some specifications for it or is normal distribution ok? If the latter, just sample from normal distribution using numpy or scipy. – My Work Jan 30 '21 at 15:55
  • Are you generating integers or floats? – Chris Johnson Jan 30 '21 at 15:55
  • Does this solve your problem? https://stackoverflow.com/questions/4265988/generate-random-numbers-with-a-given-numerical-distribution – Epsi95 Jan 30 '21 at 15:55
  • Hi @MyWork, a higher probability would be something like a 65% chance of a value between 65-72 being chosen, with just 35% for the other ranges. I'll add that to my question to clarify. – Emerson Oliveira Jan 30 '21 at 16:00
  • @ChrisJohnson I'm generating integers. But if you have a solution that uses floats and later rounds to integers, I think it would solve the problem as well. Anyway, I'll add that to the question too. – Emerson Oliveira Jan 30 '21 at 16:01
  • @Epsi95 yes, that's roughly what I'm looking for. The problem is that it doesn't look feasible for my case, because it requires me to set the probabilities manually in an array, and I'm dealing with a range of 20 numbers or more. That solution is good for a small range of numbers, as the answer itself shows with a probability array of length 5. – Emerson Oliveira Jan 30 '21 at 16:04
  • Your description is pretty vague, which makes me think you don't know exactly what you want. One choice which is popular for simulation modeling when the specs are vague is the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution). Use https://numpy.org/doc/stable/reference/random/generated/numpy.random.triangular.html to generate. – pjs Jan 30 '21 at 16:14
  • yes @Epsi95 all the elements `60-65` and `72-80` are equally likely – Emerson Oliveira Jan 30 '21 at 16:32
  • @EmersonOliveira -- Do you want a "podium"-shaped distribution, in which values in interval `[60,64]` and values in the interval `[73,80]` should have flat pdf=`0.175`, and the values in the interval `[65,72]` should have flat pdf=`0.65`? Or, do you want a Normal distribution with those additional constraints? – fountainhead Jan 30 '21 at 16:32
  • I've edited the question with clearer information. – Emerson Oliveira Jan 30 '21 at 16:42
  • Yes @fountainhead, it should be more like a "podium"-shaped distribution. It took me a minute to catch the reference to "podium" :D How could I do it with Python? – Emerson Oliveira Jan 30 '21 at 16:45
  • This is a crude method, though: `a = numpy.array([0.65]*16 + [0.35]*5); a = a/sum(a); numpy.random.choice(numpy.arange(60, 81), p=a)`, where `60-75` is for 65% probability and the remaining values for 35%, considering all values in each range equally likely @EmersonOliveira – Epsi95 Jan 30 '21 at 16:49
  • the distribution looks like https://www.imgurupload.com/uploads/20210130/2166c10e4b94c3b2995508734c54df7d4e807494.png for 10000 runs – Epsi95 Jan 30 '21 at 16:50

2 Answers

For equally likely outcomes in the subranges, the following will do the trick:

import random

THRESHOLD = [0.65, 0.65 + 0.35 * 5 / 13]

def my_distribution():
    u = random.random()
    if u <= THRESHOLD[0]:
        return random.randint(65, 72)
    elif u <= THRESHOLD[1]:
        return random.randint(60, 64)
    else:
        return random.randint(73, 80)

This uses a uniform random number to decide which subrange you're in, then generates values equally likely within that subrange.

The THRESHOLD values act like a cumulative distribution function, arranged so that the most likely outcome is checked first. A Uniform(0,1) value u falls at or below THRESHOLD[0] 65% of the time, which selects the range [65, 72]. The remaining 35% is split in proportion to the number of values in each outer range: 5 of the 13 remaining values are in [60, 64], so that range gets 0.35 * 5/13, and [73, 80] gets the other 0.35 * 8/13.

The results look like this:

[Figure: histogram of the podium-shaped distribution]
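
For larger ranges such as 1000-2000 biased toward 1400-1600, the same two-stage idea can be parameterized instead of hard-coding the thresholds. The following is only a sketch under that assumption: `make_podium_sampler` and its parameter names are hypothetical and not part of the answer above, and it uses the standard library's `random.choices` to pick the subrange rather than comparing against explicit thresholds.

```python
import random


def make_podium_sampler(low, high, mid_low, mid_high, mid_prob):
    """Return a function that draws one integer from [low, high].

    Hypothetical helper: values in [mid_low, mid_high] share probability
    mid_prob, and all other values share the remaining 1 - mid_prob equally.
    """
    if not low <= mid_low <= mid_high <= high:
        raise ValueError("need low <= mid_low <= mid_high <= high")
    n_left = mid_low - low        # number of values left of the middle
    n_right = high - mid_high     # number of values right of the middle
    n_outer = n_left + n_right
    if n_outer == 0:
        raise ValueError("the middle subrange must not cover the whole range")

    outer_prob = 1.0 - mid_prob
    candidates = [
        ((mid_low, mid_high), mid_prob),
        ((low, mid_low - 1), outer_prob * n_left / n_outer),
        ((mid_high + 1, high), outer_prob * n_right / n_outer),
    ]
    # Drop subranges that contain no values (for example when mid_low == low).
    subranges = [r for r, _ in candidates if r[0] <= r[1]]
    weights = [w for r, w in candidates if r[0] <= r[1]]

    def sample():
        # Stage 1: pick a subrange; stage 2: pick uniformly inside it.
        first, last = random.choices(subranges, weights=weights)[0]
        return random.randint(first, last)

    return sample


draw = make_podium_sampler(1000, 2000, 1400, 1600, 0.65)
print(draw())
```

Splitting the outer probability in proportion to the subrange sizes keeps every outer value equally likely, which matches the 5/13 and 8/13 split used for the 60-80 case.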

pjs
  • Nice! That's basically what I'm looking for, thanks! Could you please explain why `0.65 + 0.35 * 5 / 13`? I didn't understand that part. – Emerson Oliveira Jan 30 '21 at 17:01
  • After handling the main 65% for the range 65–72, you have **0.35** probability left. This probability is split between two ranges: 60–64 (**5** numbers) and 73–80 (**8** numbers). The first range gets **0.35 * 5 / (8 + 5)** of the probability. – Yang Yushi Jan 30 '21 at 17:47

Here's a numpy based solution:

import numpy as np

# Some params
left_start   = 60    # Start of the left interval   [60, 64]
middle_start = 65    # Start of the middle interval [65, 72]
right_start  = 73    # Start of the right interval  [73, 80]
right_end    = 80    # End of the right interval    [73, 80]
count        = 1000  # Number of values to generate
middle_wt    = 0.65  # Probability of selecting the middle range

middle       = np.arange(middle_start, right_start)
rest         = np.r_[left_start:middle_start, right_start:(right_end+1)]
rng1 = np.random.default_rng(None) # Generator for randomly choosing range.
rng2 = np.random.default_rng(None) # Generator for generating values in the ranges.
# Now generate a random list of 0s and 1s to indicate choice between
# 'middle' and 'rest'. For this number generation we will set middle_wt as
# the weight/probability for 0 and (1-middle_wt) as the weight/probability for 1.
# (0 indicates middle range and 1 indicates the rest.)
range_choices   = rng1.choice([0,1], replace=True, size=count, p=[middle_wt, (1-middle_wt)])
# Now generate 'count' values for the middle range
middle_choices  = rng2.choice(middle, replace=True, size=count)
# Now generate 'count' values for the 'rest' of the range (non-middle)
rest_choices    = rng2.choice(rest, replace=True, size=count)

result          = np.choose(range_choices, (middle_choices,rest_choices))
print(np.sum((65 <= result) & (result <= 72)))  # Count of values in the middle range

Note: In the above code, p=[middle_wt, (1-middle_wt)] is a list of weights. The middle_wt is the weight for the middle range [65,72], and the (1-middle_wt) is the weight for the rest.

Output:

649 # Indicates that 649 out of the 1000 values of result are in the middle range [65,72]
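
An alternative that avoids typing the probabilities by hand is to compute one probability per value and pass the array to `Generator.choice`. This is a sketch only: `podium_probabilities` is a hypothetical helper name, not part of the answer above, and it uses a single `Generator`, which appears to be sufficient here.

```python
import numpy as np


def podium_probabilities(low, high, mid_low, mid_high, mid_prob):
    """Return (values, probs) for a podium-shaped distribution on [low, high].

    Hypothetical helper: values in [mid_low, mid_high] share mid_prob equally,
    and the remaining values share 1 - mid_prob equally.
    """
    values = np.arange(low, high + 1)
    in_middle = (values >= mid_low) & (values <= mid_high)
    n_mid = int(in_middle.sum())
    n_outer = values.size - n_mid
    if n_mid == 0 or n_outer == 0:
        raise ValueError("both the middle and the outer part must be non-empty")
    # Per-value probability: flat inside the middle, flat (and lower) outside.
    probs = np.where(in_middle, mid_prob / n_mid, (1.0 - mid_prob) / n_outer)
    return values, probs


rng = np.random.default_rng()  # One generator is enough for all draws.
values, probs = podium_probabilities(1000, 2000, 1400, 1600, 0.65)
print(rng.choice(values, size=5, p=probs))
```

For the 60-80 case this gives each middle value a probability of 0.65/8 and each outer value 0.35/13, and only the five arguments change when the range grows.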

fountainhead
  • Thanks for sharing that solution! The count variable wouldn't even be necessary, since I'm looking for only one output and size could be fixed to 1, but it's nice to have if I eventually need to apply the same calculation to more outputs. Thanks! – Emerson Oliveira Jan 30 '21 at 18:00
  • If possible, could you comment the lines where the `rng*` variables are involved? I don't think I fully understood them. – Emerson Oliveira Jan 30 '21 at 18:07