I have a 7000 x 7000 correlation matrix and I need to run PCA on it in R. I used

CorPCA <- princomp(covmat=xCor)

where xCor is the correlation matrix, but it fails with the error

"covariance matrix is not non-negative definite"

I think this is because the matrix contains some negative correlations. Which built-in R function can I use to get the PCA result?

PKumar
Louisyan
  • Doing PCA of a correlation matrix seems like a strange idea. I don't think you should do this. – Roland Feb 17 '14 at 15:48
  • On [stats.stackexchange.com](http://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance) is some discussion about this issue. – Georg Schnabel Feb 17 '14 at 18:32
  • You can use prcomp(), which uses SVD, instead of princomp(), which uses the eigenvalues of the covariance matrix. – Carlos AG Aug 19 '18 at 14:52

2 Answers


One way to do PCA is to perform an eigenvalue decomposition of the covariance matrix (see Wikipedia).

The advantage of the eigenvalue decomposition is that you see which directions (eigenvectors) are significant, i.e. have a noticeable variation, as expressed by the associated eigenvalues. You can also tell whether the covariance matrix is positive definite (all eigenvalues greater than zero), non-negative definite (some eigenvalues equal to zero, which is okay) or indefinite (some eigenvalues negative, which is not okay).

Sometimes numerical inaccuracies make a non-negative definite matrix look indefinite. In that case you observe negative eigenvalues that are almost zero, and you can set them to zero (or to a tiny positive value) to restore the non-negative definiteness of the covariance matrix.

You can still interpret the result: the eigenvectors carrying the significant information are associated with the largest eigenvalues. If the sorted list of eigenvalues declines quickly, many directions do not contribute significantly and can be dropped.
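The three cases can be told apart with a few lines of R. This is only a sketch: the helper name `classify_definiteness` and its relative tolerance `tol` are made up for illustration and are not part of base R, and the cutoff that separates "almost zero" from "really negative" is a judgment call.

```r
# Classify a symmetric matrix by the signs of its eigenvalues.
# `classify_definiteness` and `tol` are illustrative names, not base R.
classify_definiteness <- function(A, tol = 1e-8) {
  ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  # Treat eigenvalues within a relative tolerance of zero as zero.
  cutoff <- tol * max(abs(ev))
  if (all(ev > cutoff)) {
    "positive definite"
  } else if (all(ev >= -cutoff)) {
    "non-negative definite"
  } else {
    "indefinite"
  }
}

pd  <- matrix(c(1, 0.5, 0.5, 1), 2)  # eigenvalues 1.5 and 0.5
psd <- matrix(c(1, 1, 1, 1), 2)      # eigenvalues 2 and 0
ind <- matrix(c(1, 2, 2, 1), 2)      # eigenvalues 3 and -1

classify_definiteness(pd)
classify_definiteness(psd)
classify_definiteness(ind)
```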

The built-in R function for this is eigen().

If your covariance matrix is A, then:

eigen_res <- eigen(A)
# sorted list of eigenvalues (largest first)
eigen_res$values
# set slightly negative eigenvalues to a small positive value
eigen_res$values[eigen_res$values < 0] <- 1e-10
# and rebuild the regularized covariance matrix
Areg <- eigen_res$vectors %*% diag(eigen_res$values) %*% t(eigen_res$vectors)
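Once the eigenvalues are non-negative, the usual PCA quantities can be read off the decomposition directly. The sketch below assumes a small simulated correlation matrix as a stand-in for the real 7000 x 7000 one; the 90% threshold and the names `prop_var`, `cum_var`, `k` and `loadings_k` are arbitrary choices for illustration.

```r
# Small stand-in for the real matrix: correlations of 5 simulated variables.
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
colnames(X) <- paste0("V", 1:5)
X[, 2] <- X[, 1] + 0.3 * X[, 2]   # make two variables strongly correlated
A <- cor(X)

eig <- eigen(A, symmetric = TRUE)

# Clip tiny negative eigenvalues caused by rounding (none expected here).
vals <- pmax(eig$values, 0)

# Proportion of total variance carried by each component.
prop_var <- vals / sum(vals)
cum_var  <- cumsum(prop_var)

# Keep the smallest number of components that reaches 90% (arbitrary threshold).
k <- which(cum_var >= 0.9)[1]

# Loadings of the retained components (columns are eigenvectors).
loadings_k <- eig$vectors[, seq_len(k), drop = FALSE]

# Cross-check: princomp() on the same matrix reports sdev^2 equal to the eigenvalues.
pc <- princomp(covmat = A)
same_as_princomp <- all.equal(unname(pc$sdev^2), vals)
```

Without the raw data only eigenvalues and loadings are available; component scores cannot be computed from a correlation matrix alone.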
Georg Schnabel
  • Thank you for your answer! I think this approach is definitely worth a try. First, I should check the eigenvalues to see what they look like :) – Louisyan Feb 17 '14 at 15:54
  • Well, I found that many of the eigenvalues are complex numbers. Do I need to set them to zero as well? – Louisyan Feb 17 '14 at 16:00
  • If your covariance matrix is symmetric (and it should be symmetric!), then mathematically you should get only real eigenvalues. You can check the symmetry with e.g. max(A-t(A)). If the matrix is to have a chance of being a valid covariance matrix, however, the imaginary parts of the complex numbers should be small (I guess, not 100% sure). – Georg Schnabel Feb 17 '14 at 16:06
  • Yes, I checked that the matrix is symmetric, and the complex numbers are very small. You are right :) Maybe it is because of the large size of the matrix. – Louisyan Feb 17 '14 at 16:13
  • A small additional note: if you know that the matrix is symmetric, you can set `symmetric=TRUE` in the call to `eigen` to speed up the calculation and remove the complex numbers. – Georg Schnabel Feb 17 '14 at 16:16
  • Use the nearPD() function of the Matrix package! – Carlos AG Aug 19 '18 at 14:53

"Not non-negative definite" does not mean that the covariance matrix contains negative correlations. It is the linear-algebra equivalent of trying to take the square root of a negative number. You can't tell whether a matrix is positive definite by looking at a few of its values.
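Two toy matrices make the point concrete. Both are invented for illustration: the first has a strong negative correlation and is perfectly valid, while the second contains only positive entries and still triggers the error, because a being highly correlated with b, and b with c, is inconsistent with a and c being almost uncorrelated.

```r
# Negative correlation, yet positive definite: PCA works fine.
neg_ok <- matrix(c( 1.0, -0.8,
                   -0.8,  1.0), nrow = 2, byrow = TRUE)
ev_neg_ok <- eigen(neg_ok, symmetric = TRUE)$values   # both positive
pc_ok <- princomp(covmat = neg_ok)

# Only positive entries, yet one eigenvalue is clearly negative.
pos_bad <- matrix(c(1.0, 0.9, 0.1,
                    0.9, 1.0, 0.9,
                    0.1, 0.9, 1.0), nrow = 3, byrow = TRUE)
ev_pos_bad <- eigen(pos_bad, symmetric = TRUE)$values

# princomp() refuses this matrix; capture the error message instead of stopping.
msg <- tryCatch({
  princomp(covmat = pos_bad)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```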

Try adjusting some default values, like the tolerance, in the princomp call. Check this thread for example: How to use princomp () function in R when covariance matrix has zero's?

An alternative is to write some code of your own to perform what is called a NIPALS analysis. Take a look at this thread on the R mailing list: https://stat.ethz.ch/pipermail/r-help/2006-July/110035.html

I'd even go as far as asking where you obtained the correlation matrix. Did you construct it yourself? Does it have NAs? If you constructed xCor from your own data, do you think you can sample the data and construct a smaller xCor matrix (say 1000 x 1000)? All these alternatives try to drive your PCA algorithm down the "happy path", i.e. all matrix operations can be carried out internally without difficulties in diagonalization and so on, so that there are no more "non-negative definite" error messages.
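These checks can be sketched as follows. The xCor built here is a simulated stand-in, not the real matrix. Taking a random sub-block of the matrix is a variation on the sampling idea for the case where the raw data is not available: every such sub-block of a valid correlation matrix must itself be non-negative definite, so a clearly negative eigenvalue in a sub-block already shows that the full matrix is invalid (a clean sub-block proves nothing about the rest).

```r
# Simulated stand-in for xCor (the real one would be 7000 x 7000).
set.seed(42)
xCor <- cor(matrix(rnorm(300 * 20), ncol = 20))

# 1. Any missing values?
has_na <- anyNA(xCor)

# 2. Symmetric, with ones on the diagonal?
is_sym   <- isSymmetric(xCor)
unit_diag <- all(abs(diag(xCor) - 1) < 1e-12)

# 3. Smallest eigenvalue of a random 10 x 10 sub-block (same rows and columns).
idx  <- sample(ncol(xCor), 10)
xSub <- xCor[idx, idx]
min_ev_sub <- min(eigen(xSub, symmetric = TRUE, only.values = TRUE)$values)

c(has_na = has_na, is_sym = is_sym, unit_diag = unit_diag)
min_ev_sub
```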

Sudeep Juvekar
  • Thanks for answering, I think I see the mistake I made. The correlation matrix is the output of an internal model and I cannot get the raw data, and 7000 x 7000 is quite hard to analyse. I am trying hard to get something out of this matrix :( but nothing so far. I think some dimension reduction technique is needed, and PCA is just one approach. I also tried factor analysis in R, but the result is still not that good. – Louisyan Feb 17 '14 at 15:15