I'm trying to fit an MvNormal distribution to a matrix of spectral data, but I am getting the following error:

Distributions.fit(MvNormal, myMatrix)

> ERROR: PosDefException: matrix is not positive definite; Cholesky factorization failed.

myMatrix consists of readings of absorbance (one column per item) at many consecutive wavenumbers (rows). It is not a square matrix and hence it cannot be positive definite.

From what I have seen online, lots of people report error messages of various methods failing at the Cholesky factorisation step in Julia. I understand that this comes from it being more strict than other languages at checking that the PD criterion is met when needed.

Other people (see this post, for example) have solved this issue by making slight changes to the sigma parameter when defining a distribution. However, since I am not creating a distribution from its parameters but fitting it to a matrix, I am unsure what to do.

I would deeply appreciate any suggestions on

  • how I could make my matrix appropriate for Cholesky factorisation, or
  • how to fit a multivariate normal distribution to my data using any other method or package.
Ivan Casas

1 Answer

At the time of writing this, I believe this error message is highly misleading, or possibly a bug.

fit isn't expecting myMatrix to be positive definite; in fact, it doesn't even need to be a square matrix.

fit expects the second argument to be an n by S matrix, where n is the number of dimensions and S is the number of samples you are fitting against. The error about positive definiteness comes from the math performed during fitting: the covariance matrix estimated from the samples has rank at most S - 1, so when S is not greater than n the system is underdetermined, the estimate is singular, and its Cholesky factorization fails. Under normal circumstances, we would expect S to be much greater than n.
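To make the rank argument concrete, here is a minimal sketch that builds a covariance estimate from fewer samples than dimensions and inspects it. The sizes n = 10 and S = 4 and the random data are made up for illustration, and cov from Statistics is used as a stand-in for whatever fit computes internally.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(0)

n, S = 10, 4            # hypothetical sizes: more dimensions than samples
X = randn(n, S)         # n by S matrix, one sample per column

# Covariance across samples; dims=2 treats each column as one observation.
Σ = cov(X, dims=2)      # n by n matrix

println("size of Σ: ", size(Σ))
println("rank of Σ: ", rank(Σ), " (cannot exceed S - 1 = ", S - 1, ")")
println("positive definite? ", isposdef(Σ))
```

Because the rank is bounded by S - 1, an n by n estimate with n greater than or equal to S cannot have full rank, which is the situation a Cholesky factorization rejects.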

Example:

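using Distributions

# Construct samples:
C = [0.2 0.1; 0.1 0.3]  # covariance matrix; must be symmetric positive definite
μ = [2., 3.]            # mean vector (named μ to avoid shadowing the mean function)
d = MvNormal(μ, C)
samples = rand(d, 100) # This is your input data, in this case a 2x100 matrix.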

# Fitting:
d_fit = Distributions.fit(MvNormal, samples)

Comparing d and d_fit, I see a pretty good match, and it gets better with the number of samples.

In summary: You probably just need more (unique) samples to fit against.
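If collecting more samples is not an option, one possible workaround (not something fit does for you, and a swapped-in technique rather than a true maximum-likelihood fit) is to estimate the mean and covariance by hand and add a small multiple of the identity to the covariance before constructing the distribution, in the spirit of the "slight changes to sigma" mentioned in the question. The sketch below assumes made-up data of 50 dimensions and 5 samples, and the scale factor 1e-6 is an arbitrary choice for illustration.

```julia
using Distributions, LinearAlgebra, Statistics, Random

Random.seed!(1)

X = randn(50, 5)                  # hypothetical data: 50 dimensions, 5 samples

μ = vec(mean(X, dims=2))          # per-dimension mean, as a plain vector
Σ = cov(X, dims=2)                # 50 by 50 sample covariance, rank deficient

# Arbitrary small ridge, scaled by the average variance (an assumption,
# not a recommended value); it lifts the zero eigenvalues above zero.
ε = 1e-6 * tr(Σ) / size(Σ, 1)
Σ_reg = Matrix(Symmetric(Σ + ε * I))

d_reg = MvNormal(μ, Σ_reg)        # Cholesky now has a positive definite input
println("dimension of fitted distribution: ", length(d_reg))
```

The resulting distribution is dominated by the ridge in every direction the data does not span, so densities computed from it should be interpreted with care.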

Mikael Öhman