Density of multivariate t distribution in Python for large number of observations

Question

I am trying to evaluate the density of multivariate t distribution of a 13-d vector. Using the dmvt function from the mvtnorm package in R, the result I get is

[1] 1.009831e-13

When i tried to write the function by myself in Python (thanks to the suggestions in this post: multivariate student t-distribution with python), I realized that the gamma function was taking very high values (given the fact that I have n=7512 observations), making my function going out of range.

I tried to modify the algorithm, using the math.lgamma() and np.linalg.slogdet() functions to transform it to the log scale, but the result I got was

 8.97669876e-15

This is the function that I used in python is the following:

def dmvt(x,mu,Sigma,df,d):
    '''
    Multivariate t-student density:
    output:
        the density of the given element
    input:
        x = parameter (d dimensional numpy array or scalar)
        mu = mean (d dimensional numpy array or scalar)
        Sigma = scale matrix (dxd numpy array)
        df = degrees of freedom
        d: dimension
    '''
    Num = math.lgamma( 1. *(d+df)/2 ) - math.lgamma( 1.*df/2 )
    (sign, logdet) = np.linalg.slogdet(Sigma)
    Denom =1/2*logdet + d/2*( np.log(pi)+np.log(df) ) + 1.*( (d+df)/2 )*np.log(1 + (1./df)*np.dot(np.dot((x - mu),np.linalg.inv(Sigma)), (x - mu))) 
    d = 1. * (Num - Denom) 
    return np.exp(d)

Any ideas why this functions does not produce the same results as the R equivalent?

Using as x = (0,0) produces similar results (up to a point, die to rounding) but with x = (1,1)1 I get a significant difference!

Try running your function on some less extreme inputs, eg what is dvmt(0) in 2 dimensions. This will tell you whether you're just running into roundoff error (8e-15 is zero given the limits of double-precision FP arithmetic) or if your code has a bug. — Hong Ooi, Dec 07 '16 at 10:32
@HongOoi thank you very much for your comment! Using the following input: `x= (0,0) mu = (0,0) sigma = diag(2)` I got `0.1591549` in R and `0.159154943092` in Python, which seem to be the same until a point But when I use x = (1,1) the R result is `0.03062938` and `0.0530516476973` in Python — Leonidas Souliotis, Dec 07 '16 at 14:40

score 0 · Answer 1 · answered Feb 20 '17 at 09:15

I finally managed to 'translate' the code from the mvtnorm package in R and the following script works without numerical underflows.

import numpy as np
import scipy.stats
import math
from math import lgamma
from numpy import matrix
from numpy import linalg
from numpy.linalg import slogdet
import scipy.special
from scipy.special import gammaln

mu = np.array([3,3])
x = np.array([1, 1])
Sigma = np.array([[1, 0], [0, 1]])
p=2
df=1

def dmvt(x, mu, Sigma, df, log):
    '''
    Multivariate t-student density. Returns the density
    of the function at points specified by x.

    input:
        x = parameter (n x d numpy array)
        mu = mean (d dimensional numpy array)
        Sigma = scale matrix (d x d numpy array)
        df = degrees of freedom
        log = log scale or not

    '''
    p = Sigma.shape[0] # Dimensionality
    dec = np.linalg.cholesky(Sigma)
    R_x_m = np.linalg.solve(dec,np.matrix.transpose(x)-mu)
    rss = np.power(R_x_m,2).sum(axis=0)
    logretval = lgamma(1.0*(p + df)/2) - (lgamma(1.0*df/2) + np.sum(np.log(dec.diagonal())) \
       + p/2 * np.log(math.pi * df)) - 0.5 * (df + p) * math.log1p((rss/df) )
    if log == False:    
        return(np.exp(logretval))
    else:
         return(logretval)


print(dmvt(x,mu,Sigma,df,True))
print(dmvt(x,mu,Sigma,df,False))

Density of multivariate t distribution in Python for large number of observations

1 Answers1