
I have high-dimensional (100 dimensions) data. I want to get the eigenvalues of the covariance matrix of the data.

Cov = numpy.cov(data)
EVs = numpy.linalg.eigvals(Cov) 

I get a vector containing some eigenvalues which are complex numbers. This is mathematically impossible: a covariance matrix is real and symmetric, so its eigenvalues must all be real. Granted, the imaginary parts of the complex numbers are very small, but they still cause issues later on. Is this a numerical issue? If so, does the issue lie with the cov function, the eigvals function, or both?
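
For reference, a minimal sketch with synthetic rank-deficient data (purely illustrative, not my actual data) that can reproduce the symptom:

import numpy as np

np.random.seed(0)
data = np.random.randn(100, 20)   # 100 variables, only 20 observations

Cov = np.cov(data)                # 100x100 but rank <= 19
EVs = np.linalg.eigvals(Cov)      # general (non-symmetric) eigensolver

# the result may come back as complex128, with tiny imaginary parts
# attached to the eigenvalues that should be exactly zero
print(EVs.dtype)
if np.iscomplexobj(EVs):
    print(np.abs(EVs.imag).max())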

To give more color on that, I did the same calculation in Mathematica, which of course gives a correct result. It turns out there are some eigenvalues which are very close to zero but not quite zero, and numpy gets all of these wrong: their magnitudes are off, and it turns some of them into complex numbers.

salvador
  • use `numpy.linalg.eigh` for hermitian matrices – Stelios May 21 '16 at 18:16
  • @Stelios This returns negative eigenvalues, so it's also wrong. I already have the correct results; my question is more about where numpy goes wrong. The two lines of code in my original question should never result in anything with imaginary parts different from exactly zero, and eigh should never return negative eigenvalues, no matter how small, when given a covariance matrix. So where is the issue: cov, eigh, eigvals, or all of them? – salvador May 21 '16 at 18:41
  • The problem must lie with the `eigh` and `eigvals` functions. Doing a singular value decomposition `u,s,v = numpy.linalg.svd(cov)` gives correct results (see the sketch after these comments). Still wondering why `eigh` and `eigvals` have numerical issues. – salvador May 21 '16 at 18:44
  • This has to be a numerical issue. I suspect that the negative eigenvalues are very small (of order 10**-15), in which case you can manually set them to zero. It would help if you could provide a reproducible example. – Stelios May 21 '16 at 18:46
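
To make the comments above concrete, here is a sketch (synthetic singular covariance matrix, illustrative only) comparing the three routines mentioned:

import numpy as np

np.random.seed(1)
x = np.random.randn(100, 10)
cov = np.cov(x)                        # 100x100 but rank <= 9, so singular

ev_general = np.linalg.eigvals(cov)    # may carry tiny spurious imaginary parts
ev_sym = np.linalg.eigvalsh(cov)       # always real, but tiny negatives are possible
u, s, v = np.linalg.svd(cov)           # singular values are real and non-negative

print(ev_sym.min())                    # typically of order -1e-15
print(s.min())                         # >= 0 by construction

# per the comment above: clip the tiny negative eigenvalues to zero manually
ev_clipped = np.clip(ev_sym, 0.0, None)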

1 Answer


I was facing a similar issue: np.linalg.eigvals was returning a complex vector in which the imaginary part was quasi-zero everywhere.

Using np.linalg.eigvalsh instead fixed it for me.

I don't know the exact reason, but most probably it is a numerical issue, and eigvalsh seems to handle it whereas eigvals doesn't. Note that the ordering of the returned eigenvalues may differ between the two functions.


The following snippet illustrates the fix:

import numpy as np
from numpy.linalg import eigvalsh, eigvals

D = 10
MUL = 100
EPS = 1e-8

x = np.random.rand(1, D) * MUL
x -= x.mean()
S = np.matmul(x.T, x)  # rank-1 outer product: D-1 eigenvalues are numerically zero
# adding epsilon*I avoids negative eigenvalues due to numerical error,
# since the matrix is actually positive semidefinite (useful for Cholesky etc.)
S += np.eye(D, dtype=np.float64) * EPS

print(np.sort(eigvalsh(S)))  # real eigenvalues, as expected for a symmetric matrix
print(np.sort(eigvals(S)))   # may come back complex with tiny imaginary parts
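
A quick sanity check (reusing `S` from the snippet above) that the two routines agree up to numerical noise:

assert np.allclose(np.sort(eigvalsh(S)),
                   np.sort(eigvals(S).real),
                   atol=1e-8)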
fr_andres