1

I'm trying to create a subclass of rv_continuous with a custom distribution for which I can calculate the pdf through a number of functions.

Here's what I've done so far

import numpy as np
from scipy.stats import rv_continuous

ancillary functions

def func1(xx, a_, b_, rho, m, sigma):
    return a_ + b_*(rho*(xx-m) + np.sqrt((xx-m)*(xx-m) + sigma*sigma))

def func2(xx, a_, b_, rho, m, sigma):
    sig2 = sigma*sigma
    return b_*(rho*np.sqrt((xx-m)*(xx-m)+sig2)+xx-m)/(np.sqrt((xx-m)*(xx-m)+sig2))

def func3(xx, a_, b_, rho, m, sigma):
    sig2 = sigma*sigma
    return b_*sig2/(np.sqrt((xx-m)*(xx-m)+sig2)*((xx-m)*(xx-m)+sig2))

def func4(xx, a_, b_, rho, m, sigma):
    w = func1(xx, a_, b_, rho, m, sigma)
    w1 = func2(xx, a_, b_, rho, m, sigma)
    w2 = func3(xx, a_, b_, rho, m, sigma)
    return (1.-0.5*xx*w1/w)*(1.0-0.5*xx*w1/w) - 0.25*w1*w1*(0.25 + 1./w) + 0.5*w2

def func5(xx, a_, b_, rho, m, sigma):
    vsqrt = np.sqrt(func1(xx, a_, b_, rho, m, sigma))
    return -xx/vsqrt - 0.5*vsqrt

density function eventually

def density(xx, a_, b_, rho, m, sigma):
    dm = func5(xx, a_, b_, rho, m, sigma)
    return func4(xx, a_, b_, rho, m, sigma)*np.exp(-0.5*dm*dm)/np.sqrt(2.*np.pi*func1(xx, a_, b_, rho, m, sigma))

a set of parameters

Params = 1.0073, 0.3401026, -0.8, 0.000830, 0.5109564

check the pdf from function

xmin, xmax, nbPoints = -10., 10., 2000
x_real = np.linspace(xmin, xmax, nbPoints)

den_from_func = density(x_real, *Params)

now construct my distribution class

class density_gen(rv_continuous):
    def _pdf(self, x, a_hat, b_hat, rho, m, sigma):
        return density(x, a_hat, b_hat, rho, m, sigma)

instantiation

my_density = density_gen(name='density_gen')

my_density.a, my_density.b, my_density.numargs

As I've specified _pdf I should have a working distribution instance

this works

pdf = my_density._pdf(x_real, *Params)

cdf works too albeit it's extremely slow

cdf = my_density._cdf(x_real, *Params)
my_density._cdf(0.1, *Params)

but for all the other methods I get nans, for instance

my_density.mean(*Params)    
my_density.ppf(0.01, *Params)

What I'm doing wrong here?

AleVis
  • 187
  • 1
  • 1
  • 10
  • Can you generate random numbers with `rvs`? If not, then this may be a limitation in SciPy, since I can generate random numbers just fine using your PDF in my own implementation of a PDF sampler. Is the PDF you give, with that set of parameters (`Params`) and bounds (`xmin`, `max`), the only PDF your application will use? – Peter O. Jun 22 '20 at 16:46
  • No I can not. Tried my_density.rvs(100000, *Params) and got ValueError: Domain error in arguments. With my_density._rvs(100000, *Params) gives AttributeError: 'density_gen' object has no attribute '_size'. I can generate samples from the pdf with my own sampler too and then get a best fit with Scipy's distributions but I'd really like to have a working rv_continuous instance. I generate many sets of Params daily (possibly intraday) and for many cases, the pdfs can vary widely. Bounds too can change albeit to a lesser extent. So speed is a concern here. Thanks! – AleVis Jun 23 '20 at 07:53

1 Answers1

1

It appears you need to add the _argcheck method to density_gen, since your distribution uses custom parameters:

class density_gen(rv_continuous):

    def _argcheck(self, *Params):
        return True

    def _pdf(self, x, a_hat, b_hat, rho, m, sigma):
        return density(x, a_hat, b_hat, rho, m, sigma)

my_density = density_gen(name='density_gen')
pdf = my_density._pdf(x_real, *Params)
print(my_density.rvs(size=5, *Params))
print(my_density.mean(*Params))  
print(my_density.ppf(0.01, *Params))

However, rvs, mean, and so on will then be very slow, presumably because the method needs to integrate the PDF every time it needs to generate a random number or calculate a statistic. If speed is at a premium, you will thus need to add to density_gen an _rvs method that uses its own sampler. An example of this is my own DensityInversionSampler, which generates random numbers by numerical inversion, when given only the PDF and the sampling domain.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • Understand, definitely would need to add more _methods if I wanted to use this in production. – AleVis Jun 24 '20 at 08:39