scipy rv_continuous very slow

Question

I am using a custom function f(x) to define a custom distribution using copy's rv_continuous class. My code is

class my_pdf_gen(rv_continuous):
    def _pdf(self, x, integral):
        return f(x)/integral

where integral ensure the normalisation. I am able to create an instance of it with

my_pdf = my_pdf_gen(my_int,a = a, b = b, name = 'my pdf')

with a,b the upper and lower limit of the value's range, and my_int= scipy.integrate.quad(f, a, b)[0]. I am also able to create a random sample of data using my_pdf.rvs(my_int, size = 5), but this is very slow. (Up to 6 seconds when size=9).

I read that one should also overwrite some other methods in the class (like _ppf), but from the examples I found it isn't clear to me how to achieve it in my case.

Thanks a lot!

@SeverinPappadeux since it may was a bit tricky to evaluate I stored it in a value and call it each time — Lord Nexprex, Jun 21 '20 at 17:11
ok, so next to reimplement would be _ppf, which is actually inverse CDF, sampling is basically `return ppf(randomU01);`. I would also advice to look at what you have and see that everything is vectorized (it could take a numpy vector of values and return vector of results). Otherwise you would see multiple calls of your functions — Severin Pappadeux, Jun 21 '20 at 19:04

score 1 · Answer 1 · answered Jun 22 '20 at 06:04

It's expected to be slow since the generic implementation does root-solving for cdf, which itself uses numerical integration.

So your best bet is to provide a _ppf or _rvs implementation. How to do this greatly depends on the details of f(x). If you cannot solve f(x) = r analytically, consider tabulating / inverse interpolation or rejection sampling.

score 0 · Accepted Answer · answered Jun 29 '20 at 09:55

I solved the problem by changing approach and using Monte Carlo's rejection sampler method

def rejection_sampler(p,xbounds,pmax):
    while True:
        x = np.random.rand(1)*(xbounds[1]-xbounds[0])+xbounds[0]
        y = np.random.rand(1)*pmax
        if y<=p(x):
            return x

where p is the probability density function, xbounds is a tuple containing the upper and lower limits of of the pdf and pmax is the maximum value of the pdf on the domain.

Monte Carlo's rejection sampler was suggested here: python: random sampling from self-defined probability function

scipy rv_continuous very slow

2 Answers2

Linked