I am trying to estimate the probability of some samples assuming that they follow a multivariate gaussian distribution. I start by using my data set to estimate the covariance matrix and the mean vector with numpy and then I use scipy to obtain the probability values of some samples.
The problem I am facing is that the probability values that I get are much higher than 1. Here is the code:
import string
import numpy as np
from scipy.stats import multivariate_normal
# sample_list contains my samples
x = np.array(sample_list).T
cov = np.cov(x)
mean_vec = np.mean(x, axis=1)
print "Covariance"
print cov
print "Mean"
print mean_vec
print "Determinant"
print np.linalg.det(cov)
rv = multivariate_normal(mean=mean_vec, cov=cov)
x = np.random.multivariate_normal(mean_vec, cov)
print "Random sample"
print x
print "pdf(x)"
print rv.pdf(x)
And here is one result example:
Covariance
[[ 0.00114548 0.00126964 0.00163885 -0.00097021]
[ 0.00126964 0.00147635 0.00181914 -0.00051609]
[ 0.00163885 0.00181914 0.0023556 -0.00165483]
[-0.00097021 -0.00051609 -0.00165483 0.01451006]]
Mean
[ 0.04382192 0.05937116 0.05526359 0.36803589]
Determinant
1.23897068383e-15
Random sample
[ 0.04897992 0.0540464 0.06269819 0.30528633]
pdf(x)
150525.528322
As you can see, the value of the pdf is way high. Am I doing something wrong in the code? Or maybe there is something about the math that I don't understand.
Thank you!