I have been working from the following link: Fitting empirical distribution to theoretical ones with Scipy (Python)?
I applied my data to the code from the link and found that the distribution that best fits my data is the Non-Central Student's T distribution. I couldn't find this distribution in the pymc3 package, so I had a look at scipy to understand how the distribution is formed. I created a custom distribution, and I have a few questions:
- Is my approach to creating the distribution right?
- How can I implement the custom distribution into a model?
- Regarding the prior distributions, do I use the same approach as for Normal priors (mu and sigma), combined with HalfNormal priors for the degrees of freedom and the noncentrality parameter?
My custom distribution:
import numpy as np
import theano.tensor as tt
from scipy import stats
from scipy.special import hyp1f1, nctdtr
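# note: hyp1f1 and nctdtr are SciPy routines that operate on NumPy arrays,
# not on theano tensors, which matters for logp/logcdf below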
import warnings
from pymc3.theanof import floatX
from pymc3.distributions.dist_math import bound, gammaln
from pymc3.distributions.continuous import assert_negative_support, get_tau_sigma
from pymc3.distributions.distribution import Continuous, draw_values, generate_samples
class NonCentralStudentT(Continuous):
"""
Parameters
----------
    nu: float
        Degrees of freedom, also known as normality parameter (nu > 0).
    nc: float
        Noncentrality parameter.
    mu: float
        Location parameter.
    sigma: float
        Scale parameter (sigma > 0). Converges to the standard deviation as nu increases (only required if lam is not specified).
    lam: float
        Scale parameter (lam > 0). Converges to the precision as nu increases (only required if sigma is not specified).
"""
def __init__(self, nu, nc, mu=0, lam=None, sigma=None, sd=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
if sd is not None:
sigma = sd
warnings.warn("sd is deprecated, use sigma instead", DeprecationWarning)
self.nu = nu = tt.as_tensor_variable(floatX(nu))
self.nc = nc = tt.as_tensor_variable(floatX(nc))
lam, sigma = get_tau_sigma(tau=lam, sigma=sigma)
self.lam = lam = tt.as_tensor_variable(lam)
self.sigma = self.sd = sigma = tt.as_tensor_variable(sigma)
self.mean = self.median = self.mode = self.mu = mu = tt.as_tensor_variable(mu)
self.variance = tt.switch((nu > 2) * 1, (1 / self.lam) * (nu / (nu - 2)), np.inf)
assert_negative_support(lam, 'lam (sigma)', 'NonCentralStudentT')
assert_negative_support(nu, 'nu', 'NonCentralStudentT')
assert_negative_support(nc, 'nc', 'NonCentralStudentT')
def random(self, point=None, size=None):
"""
Draw random values from Non-Central Student's T distribution.
Parameters
----------
point: dict, optional
Dict of variable values on which random values are to be
conditioned (uses default point if not specified).
size: int, optional
Desired size of random sample (returns one sample if not
specified).
Returns
-------
array
"""
nu, nc, mu, lam = draw_values([self.nu, self.nc, self.mu, self.lam], point=point, size=size)
return generate_samples(stats.nct.rvs, nu, nc, loc=mu, scale=lam ** -0.5, dist_shape=self.shape, size=size)
def logp(self, value):
"""
Calculate log-probability of Non-Central Student's T distribution at specified value.
Parameters
----------
value: numeric
Value(s) for which log-probability is calculated. If the log probabilities for multiple
values are desired the values must be provided in a numpy array or theano tensor
Returns
-------
TensorVariable
"""
nu = self.nu
nc = self.nc
mu = self.mu
lam = self.lam
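        # NB: the lines below port scipy.stats.nct._pdf, which is the *standardized*
        # density; mu and lam are read above but never applied, so value would still
        # need standardizing, e.g. as (value - mu) * tt.sqrt(lam)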
n = nu * 1.0
nc = nc * 1.0
x2 = value * value
ncx2 = nc * nc * x2
fac1 = n + x2
trm1 = n / 2. * tt.log(n) + gammaln(n + 1)
trm1 -= n * tt.log(2) + nc * nc / 2. + (n / 2.) * tt.log(fac1) + gammaln(n / 2.)
Px = tt.exp(trm1)
valF = ncx2 / (2 * fac1)
trm1 = tt.sqrt(2) * nc * value * hyp1f1(n / 2 + 1, 1.5, valF)
        trm1 /= fac1 * tt.gamma((n + 1) / 2)
trm2 = hyp1f1((n + 1) / 2, 0.5, valF)
        trm2 /= tt.sqrt(fac1) * tt.gamma(n / 2 + 1)
Px *= trm1 + trm2
        # bound() expects the *log*-density; Px above is the density itself
        return bound(tt.log(Px), lam > 0, nu > 0, nc > 0)
def logcdf(self, value):
"""
Compute the log of the cumulative distribution function for Non-Central Student's T distribution
at the specified value.
Parameters
----------
value: numeric
Value(s) for which log CDF is calculated. If the log CDF for multiple
values are desired the values must be provided in a numpy array or theano tensor.
Returns
-------
TensorVariable
"""
nu = self.nu
nc = self.nc
        # nctdtr is a SciPy routine (it will not accept theano tensors), and it
        # returns the CDF rather than its log, so a log is still needed here
        return tt.log(nctdtr(nu, nc, value))
My custom model:
with pm.Model() as model:
# Prior Distributions for unknown model parameters:
    mu = pm.Normal('mu', 0, 10)
    sigma = pm.Normal('sigma', 0, 10)
    nc = pm.HalfNormal('nc', sigma=10)
    nu = pm.HalfNormal('nu', sigma=1)
    # Likelihood (sampling distribution) of observations:
    # => plug the custom distribution in here; the Beta below is only a placeholder
    observed_data = pm.Beta('observed_data', alpha=alpha, beta=beta, observed=data)
# draw 5000 posterior samples
trace = pm.sample(draws=5000, tune=2000, chains=3, cores=1)
# Obtaining Posterior Predictive Sampling:
post_pred = pm.sample_posterior_predictive(trace, samples=3000)
print(post_pred['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace))
print(pm.stats.summary(data=post_pred))
Edit 1:
I redesigned the custom model to include the custom distribution; however, I keep getting errors from the equations used to compute the likelihood, or sometimes theano locks up and the code just freezes. Find my code below:
with pm.Model() as model:
# Prior Distributions for unknown model parameters:
mu = pm.Normal('mu', mu=0, sigma=1)
sd = pm.HalfNormal('sd', sigma=1)
nc = pm.HalfNormal('nc', sigma=10)
nu = pm.HalfNormal('nu', sigma=1)
# Custom distribution:
# observed_data = pm.DensityDist('observed_data', NonCentralStudentT, observed=data_list)
    # Likelihood (sampling distribution) of observations:
observed_data = NonCentralStudentT('observed_data', mu=mu, sd=sd, nc=nc, nu=nu, observed=data_list)
# draw 5000 posterior samples
trace_S = pm.sample(draws=5000, tune=2000, chains=3, cores=1)
# Obtaining Posterior Predictive Sampling:
post_pred_S = pm.sample_posterior_predictive(trace_S, samples=3000)
print(post_pred_S['observed_data'].shape)
print('\nSummary: ')
print(pm.stats.summary(data=trace_S))
print(pm.stats.summary(data=post_pred_S))
Edit 2:
I have been looking online for a way to convert the function to theano; the only definition of the function I could find is in the following GitHub link: hyp1f1 function GitHub
- Will this be enough to convert the function into theano?
- Also, is it okay to use NumPy arrays with theano? (See the sketch below for what I mean.)
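In the meantime, one workaround I have come across (a minimal sketch, not a vetted implementation) is to wrap the SciPy function in a theano Op using as_op, so plain NumPy arrays are used inside the Op while the model graph stays symbolic:

import numpy as np
import theano.tensor as tt
from theano.compile.ops import as_op
from scipy.special import hyp1f1

# wrap scipy's hyp1f1 as a theano Op; itypes/otypes declare the symbolic
# input/output types, while the function body works on plain NumPy arrays
@as_op(itypes=[tt.dvector, tt.dvector, tt.dvector], otypes=[tt.dvector])
def hyp1f1_op(a, b, x):
    return np.asarray(hyp1f1(a, b, x), dtype=np.float64)

The catch, as far as I understand, is that an as_op Op defines no gradient, so gradient-based samplers such as NUTS cannot be used on any parameter that flows through it (Metropolis still works).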
Also, I thought of another way but I am not sure if this can be implemented, I looked into the nct function in scipy and they wrote the following,
If Y is a standard normal random variable and V is an independent chi-square random variable (chi2) with k degrees of freedom, then
X = (Y + c) / sqrt(V / k)
has a non-central Student's t distribution on the real line. The degrees of freedom parameter k (denoted df in the implementation) satisfies k > 0 and the noncentrality parameter c (denoted nc in the implementation) is a real number.
The probability density above is defined in the "standardized" form. To shift and/or scale the distribution, use the loc and scale parameters. Specifically, nct.pdf(x, df, nc, loc, scale) is identically equivalent to nct.pdf(y, df, nc) / scale with y = (x - loc) / scale.
So, I thought of using only Normal and chi2 random variables as priors and plugging them, together with the degrees-of-freedom variable already mentioned in the code, into the equation quoted from SciPy. Will that be enough to get the distribution?
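As a quick sanity check on that construction, I put together this NumPy-only sketch (the sample size and parameter values are arbitrary); the composition does reproduce scipy's nct in the standardized case, although by itself it only generates draws rather than giving a likelihood that observed data can be attached to:

import numpy as np
from scipy import stats

df, c = 5.0, 1.5                           # degrees of freedom, noncentrality
y = np.random.normal(size=100000)          # Y ~ N(0, 1)
v = np.random.chisquare(df, size=100000)   # V ~ chi2(df)
x = (y + c) / np.sqrt(v / df)              # X = (Y + c) / sqrt(V / df)
# the empirical mean should be close to scipy's analytic nct mean
print(x.mean(), stats.nct.mean(df, c))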
Edit 3:
I managed to run the code in the link about fitting empirical distributions and found that the second-best fit was the Student's T distribution, so I will be using that instead. Thank you for your help. I just have a side question: I ran my model with the Student's T distribution, but I got these warnings:
There were 52 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.7037574708196309, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 10% for some parameters.
I am just confused about these warnings. Do you have any idea what they mean? I know they won't invalidate my code, but can I reduce the divergences? And regarding the effective samples, do I need to increase the number of draws in the trace code?
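From what I have read so far, the knobs the warnings point at can be set directly in pm.sample inside the existing model context (the values here are just guesses on my part):

# raise target_accept (default 0.8) to reduce divergences, give the sampler
# more tuning steps, and draw more samples to raise the effective sample size
trace = pm.sample(draws=10000, tune=4000, chains=3, cores=1, target_accept=0.95)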