
I would like to simulate something on the subject of photon-photon interaction, in particular Halpern scattering. Here is the German Wikipedia entry on it: Halpern-Streuung. There, the differential cross section has an angular dependence of (3+(cos(theta))^2)^2.

I would like to have a generator of random numbers between 0 and 2*Pi that follows the density function ((3+(cos(theta))^2)^2)*(1/(99*Pi/4)), so that values around 0, Pi and 2*Pi occur somewhat more often than values around Pi/2 and 3*Pi/2.

I have already found that there is a function for randomly outputting discrete values with user-defined probabilities: numpy.random.choice(numpy.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2, 0.4, 0.2]). I could work with that in a pinch, should there be nothing else. But what I actually want here is a continuous probability distribution.

I know that even if there is a Python command where you can enter a mathematical distribution function, it can strictly speaking only produce discrete distributions of values, since irrational numbers cannot be represented exactly with finitely many bits. But still, such a command would be more elegant for a continuous function.
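For reference, the numpy.random.choice idea mentioned above can already be pushed quite far by sampling from a fine grid over [0, 2*pi); a sketch (the grid size of 10000 points is an arbitrary choice):

```python
import numpy as np

# Fine grid over [0, 2*pi); with enough points the discretization
# error becomes negligible for most practical purposes.
theta = np.linspace(0, 2 * np.pi, 10000, endpoint=False)
weights = (3 + np.cos(theta) ** 2) ** 2
p = weights / weights.sum()  # normalize to a probability vector

# Draw samples from the discretized density.
samples = np.random.choice(theta, size=100000, p=p)
```

This is still a discrete distribution, but with a 10000-point grid the granularity is far below any physically relevant resolution.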

Peter O.
Severin
    If you have the inverse of the CDF, you can extract a random number in `[0,1)` and use that to get a sample in the `[0, 2pi)` range. More info [here](https://en.wikipedia.org/wiki/Inverse_transform_sampling) – Pietro Mar 30 '21 at 16:32
  • 2
    You could create a custom distribution by extending `scipy.stats.rv_continuous` and then use that to get random variates: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html – Pranav Hosangadi Mar 30 '21 at 16:34
  • 1
    Does this answer your question? [How to generate random numbers with predefined probability distribution?](https://stackoverflow.com/questions/51050658/how-to-generate-random-numbers-with-predefined-probability-distribution) – Peter O. Mar 30 '21 at 17:26
  • Thanks to all, I think I will have a look at the inverse transform sampling. The function `scipy.stats.rv_continuous` is only for certain probability distributions, not arbitrary ones, right? And Markov chain Monte Carlo sampling might be an option too. Thanks to you. – Severin Mar 31 '21 at 09:56
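Contrary to the doubt in the last comment, `scipy.stats.rv_continuous` does accept an arbitrary density, as the second comment suggests: subclass it and override `_pdf`. A minimal sketch (the class name `Halpern` is my choice, and the density assumes the normalization from the question):

```python
import numpy as np
from scipy import stats

class Halpern(stats.rv_continuous):
    """Angular distribution with density ~ (3 + cos(x)^2)^2 on [0, 2*pi)."""
    def _pdf(self, x):
        # The factor 4/(99*pi) normalizes the density over [0, 2*pi).
        return (3 + np.cos(x) ** 2) ** 2 * 4 / (99 * np.pi)

dist = Halpern(a=0, b=2 * np.pi, name="halpern")
# Note: with only _pdf defined, scipy inverts the CDF numerically
# for every draw, so rvs() is slow for large sample sizes.
samples = dist.rvs(size=100)
```

This is convenient because the subclass also provides `cdf`, `ppf`, moments, etc. for free, but for bulk sampling a hand-rolled method is much faster.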

2 Answers


Assuming the density function you have is proportional to a probability density function (PDF), you can use the rejection sampling method: draw a point in a bounding box until the point falls under the density function. It works for any bounded density function, call it pdf, with a closed and bounded domain, as long as you know what the domain and bound are (the bound is the maximum value of pdf on the domain). In the rejection sampling method, pdf need not integrate to 1. In this case, pdf is ((3+(math.cos(x))**2)**2)*(1/(99*math.pi/4)) and the bound is 64/(99*math.pi), and the algorithm works as follows:

import math
import random

def sample():
    mn = 0  # Lowest value of domain
    mx = 2 * math.pi  # Highest value of domain
    bound = 64 / (99 * math.pi)  # Upper bound of PDF value
    while True:  # Repeat until a value is returned
        # Choose an X inside the desired sampling domain.
        x = random.uniform(mn, mx)
        # Choose a Y between 0 and the maximum PDF value.
        y = random.uniform(0, bound)
        # Calculate the PDF at X.
        pdf = ((3 + math.cos(x) ** 2) ** 2) * (1 / (99 * math.pi / 4))
        # Does (x, y) fall under the PDF?
        if y < pdf:
            # Yes, so return x.
            return x
        # No, so loop.

See also the section "Sampling from an Arbitrary Distribution" in my article on randomization.


The following check demonstrates the method's correctness by estimating the probability that the returned sample is less than π/8. For a correct sampler, the estimate should be close to 0.0788:

print(sum(1 if sample()<math.pi/8 else 0 for _ in range(1000000))/1000000)
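The accept/reject loop above can also be vectorized with NumPy by drawing candidates in batches, which is much faster for large sample counts. A sketch (not part of the original answer; the function name `sample_batch` and the batch-sizing heuristic are my additions):

```python
import numpy as np

def sample_batch(n, rng=None):
    """Vectorized rejection sampling: draw candidate batches until
    n values have been accepted."""
    rng = np.random.default_rng() if rng is None else rng
    bound = 64 / (99 * np.pi)
    out = []
    need = n
    while need > 0:
        # The acceptance rate is 99/128 ~ 0.77, so oversample a bit.
        m = int(need / 0.7) + 1
        x = rng.uniform(0, 2 * np.pi, m)
        y = rng.uniform(0, bound, m)
        pdf = (3 + np.cos(x) ** 2) ** 2 * 4 / (99 * np.pi)
        accepted = x[y < pdf]
        take = accepted[:need]
        out.append(take)
        need -= len(take)
    return np.concatenate(out)
```

The per-sample Python loop disappears; generating 10^5 samples takes milliseconds instead of seconds.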
Peter O.
  • If I understand it correctly, you create a random value with a uniform distribution and then discard this value with a certain probability according to the distribution. – Severin Mar 31 '21 at 09:45
  • I just have some doubts whether you arrive at the correct distribution function this way. For example, the probability of getting a value between 0 and Pi/8 (just an example) with this distribution should be 0.0788... (the integral from 0 to Pi/8 over P(x), where P(x) is the distribution function). – Severin Mar 31 '21 at 09:45
  • The uniform distribution in x gives a value between 0 and Pi/8 with probability 0.0625. If the x-value lies in this range, the generated y-value will be greater than the function value at x with probability 0.00199 (the integral from 0 to Pi/8 over Pmax-P(x), where Pmax is 64/(99*pi)). That means your variant would output a value between 0 and Pi/8 with a probability of 0.0625*(1-0.00199)=0.0624 instead of 0.0788. Or have I made a mistake in my thinking? – Severin Mar 31 '21 at 09:46
  • Even if I am right about that, thank you for your help. The idea is not bad in principle. – Severin Mar 31 '21 at 09:46
  • No, that's not how it works. Think of a box that covers the density function (PDF). This box is `2*pi` units wide and `64/(99*pi)` units high. The PDF is fully contained in the box, and covers 99/128 of the area of the box. Throw a dart and assume it lands in the box. Then `x` and `y` are the dart's coordinates. Now we check if the dart also lies in the PDF. To do this, we calculate the "height" of the PDF at `x`, call the height `pdf`, then do a check `if y < pdf`. ... – Peter O. Mar 31 '21 at 10:03
  • [...] This particular check succeeds at a 99/128 chance, since the PDF's area covers 99/128 of the box. If the dart lands on the PDF, we return x; otherwise we start over. As a result, we return a number close to x with probability proportional to the PDF at x. – Peter O. Mar 31 '21 at 10:51
  • Yes, it occurred to me fairly soon afterwards that my calculation was wrong. Values get discarded, so from that point of view my reasoning doesn't hold at all. Excuse me. The idea is not bad at all. Either that or the inverse transform sampling method. I'm still wavering, as if after 3 beers in the morning - just kidding! I don't waver after three beers. – Severin Apr 01 '21 at 09:19
  • That's the thing about being spoilt for choice when you are presented with several options at once. But thank you for that. – Severin Apr 01 '21 at 09:23
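The 99/128 acceptance rate discussed in these comments can be checked empirically with a short sketch (the trial count is arbitrary):

```python
import math
import random

trials = 200000
accepted = 0
bound = 64 / (99 * math.pi)
for _ in range(trials):
    # One dart throw into the bounding box.
    x = random.uniform(0, 2 * math.pi)
    y = random.uniform(0, bound)
    pdf = (3 + math.cos(x) ** 2) ** 2 * 4 / (99 * math.pi)
    if y < pdf:
        accepted += 1

rate = accepted / trials
# 99/128 ~ 0.7734; the empirical rate should land close to it,
# since the PDF (area 1) covers 99/128 of the box (area 128/99).
```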

I had two suggestions in mind: the inverse transform sampling method and the "deletion method" (I'll just call it that). For the inverse transform sampling method there is an inverse function to my distribution, but I run into problems in several places with the math functions because of their domains, e.g. math.sqrt(-1). You would have to work around this with if statements. That's why I decided to use Peter's suggestion.

And if you collect values in a loop and plot them in a histogram, it looks quite good. Here with 40000 values and 100 bins:

(histogram of the 40000 sampled values in 100 bins)

Here is the whole code for anyone who is interested:

import math
import random

import numpy as np
import matplotlib.pyplot as plt

N = 40000
bins = 100

def deletion_method():
    mn = 0  # Lowest value of domain
    mx = 2 * math.pi  # Highest value of domain
    bound = 64 / (99 * math.pi)  # Upper bound of PDF value
    while True:
        # Choose an X inside the desired sampling domain.
        xrad = random.uniform(mn, mx)
        # Choose a Y between 0 and the maximum PDF value.
        y = random.uniform(0, bound)
        # Calculate the PDF at X.
        P = ((3 + math.cos(xrad) ** 2) ** 2) * (1 / (99 * math.pi / 4))
        # Accept X if (xrad, y) falls under the PDF.
        if y < P:
            return xrad

values = np.array([deletion_method() for _ in range(N)])

plt.hist(values, bins)
plt.show()
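For comparison, the inverse-transform route mentioned above can also be done numerically instead of in closed form, which sidesteps the domain problems with math.sqrt entirely: tabulate the CDF on a grid and invert it by interpolation. A sketch (the function name and grid size are my choices):

```python
import numpy as np

# Tabulate the CDF on a grid via the trapezoid rule.
grid = np.linspace(0, 2 * np.pi, 2001)
pdf = (3 + np.cos(grid) ** 2) ** 2 * 4 / (99 * np.pi)
cdf = np.concatenate(
    ([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid)))
)
cdf /= cdf[-1]  # guard against trapezoid-rule rounding

def inverse_transform_sample(n, rng=None):
    """Map uniform samples through the (interpolated) inverse CDF."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(0, 1, n)
    return np.interp(u, cdf, grid)
```

Because the density is strictly positive, the tabulated CDF is strictly increasing, so np.interp inverts it safely.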
Peter O.
Severin