
I have this script, which does a simple PCA on a number of variables and, at the end, attaches the two coordinate columns and two other columns (presence, NZ_Field) to the output file. I have done this many times before, but now it's giving me this error:

`covariance matrix is not non-negative definite`

I understand that this means there are negative eigenvalues. I looked at similar posts that suggest using na.omit, but it didn't work. I have uploaded the "biodata.Rdata" file here:

https://www.dropbox.com/s/1ex2z72lilxe16l/biodata.rdata?dl=0

I am pretty sure it is not caused by missing values in the data, because I have used the same data with different "presence" and "NZ_Field" columns.
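Both suspects mentioned here, missing values and negative eigenvalues, can be checked directly before switching functions. The following is a minimal sketch that uses a hypothetical simulated data frame `toy` as a stand-in for `biovars`, since the real `biodata.rdata` is not reproduced here. One column is deliberately an exact linear combination of two others to mimic strongly collinear variables:

```r
set.seed(1)
# Hypothetical stand-in for biovars: 100 rows, 5 numeric columns
toy <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
toy$d <- toy$a - toy$b          # exact linear combination -> collinear column
toy$e <- rnorm(100)

# 1. Count missing values per column (princomp's default method needs complete data)
na_per_col <- colSums(is.na(toy))
print(na_per_col)

# 2. Eigenvalues of the correlation matrix that princomp(cor = TRUE) decomposes
ev <- eigen(cor(toy), symmetric = TRUE, only.values = TRUE)$values
print(ev)

# With a collinear column the smallest eigenvalue is essentially zero;
# rounding can leave it as a tiny negative number
print(min(ev))
```

If every entry of `na_per_col` is zero, missing values can be ruled out. A smallest eigenvalue that is essentially zero, possibly slightly negative after rounding, points instead to near-collinear variables. That would also be consistent with the same data working on one platform and failing on another.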

Any help is highly appreciated.

```r
load("biodata.rdata")

# save data separately
coords = biodata[, 1:2]
biovars = biodata[, 3:21]
presence = biodata[, 22]
NZ_Field = biodata[, 23]

# do PCA on the correlation matrix
bpc = princomp(biovars, cor = TRUE)

# re-attach data with auxiliary data: coordinates, presence and NZ location data
PCresults = cbind(coords, bpc$scores[, 1:3], presence, NZ_Field)
write.table(PCresults, file = "hlb_pca_all.txt", sep = ",", row.names = FALSE)
```
Hank
  • When do you get this error? –  Oct 16 '14 at 05:03
  • Hi, when I run this part: `bpc=princomp(biovars, cor=TRUE)` – Hank Oct 16 '14 at 05:13
  • It works for me (R version 3.1.1 Patched, Platform: x86_64-unknown-linux-gnu (64-bit)). –  Oct 16 '14 at 05:14
  • Hmm, I have R x64 3.1.1 on Windows. – Hank Oct 16 '14 at 05:18
  • You can also use the `prcomp` function instead of `princomp` and it should work in your case. – nicola Oct 16 '14 at 05:20
  • Thanks nicola, but why does `prcomp` work when `princomp` doesn't?! – Hank Oct 16 '14 at 05:35
  • From `?prcomp`: `"The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy."` –  Oct 16 '14 at 05:37
  • @user1436187 It is what `prcomp` does. –  Oct 16 '14 at 06:11
  • It generates eigenvalues and eigenvectors. http://math.stackexchange.com/questions/3869/what-is-the-intuitive-relationship-between-svd-and-pca – user1436187 Oct 16 '14 at 06:47
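The following is a minimal sketch of nicola's suggestion: the question's pipeline with `prcomp` in place of `princomp`. It runs on a hypothetical simulated stand-in for `biodata`, with the layout shrunk to 2 coordinates, 4 variables, presence and NZ_Field. Here `scale. = TRUE` plays the role of `cor = TRUE`, and the scores live in `$x` rather than `$scores`. The last lines check the point made in the comments, that the SVD still yields the eigenvalues of the correlation matrix:

```r
set.seed(42)
# Hypothetical stand-in for biodata: 2 coordinates, 4 variables, 2 auxiliary columns
n <- 50
toy <- data.frame(x = runif(n), y = runif(n),
                  v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n), v4 = rnorm(n),
                  presence = rbinom(n, 1, 0.5), NZ_Field = rbinom(n, 1, 0.5))
coords   <- toy[, 1:2]
biovars  <- toy[, 3:6]
presence <- toy[, 7]
NZ_Field <- toy[, 8]

# SVD of the centred and scaled data matrix (analogue of princomp(cor = TRUE))
bpc <- prcomp(biovars, center = TRUE, scale. = TRUE)

# Scores are in bpc$x (columns PC1, PC2, ...) instead of bpc$scores
PCresults <- cbind(coords, bpc$x[, 1:3], presence, NZ_Field)

# Squared standard deviations from the SVD equal the correlation-matrix eigenvalues
ev <- eigen(cor(biovars), symmetric = TRUE, only.values = TRUE)$values
all.equal(bpc$sdev^2, ev)   # TRUE
```

One difference to be aware of: `?princomp` notes that it uses divisor N for the covariance matrix, whereas `prcomp` standardizes with the usual n - 1. The two sets of scores should therefore differ by a constant factor of sqrt(n/(n - 1)), and individual components may come out with flipped signs.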

1 Answer


This does appear to be an issue with missing data, so there are a few ways to deal with it. One way is to do listwise deletion manually before running the PCA, which in your case would be:

```r
biovars <- biovars[complete.cases(biovars), ]
```
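One caveat follows from the question's script. `coords`, `presence` and `NZ_Field` are extracted before this deletion, so dropping rows from `biovars` alone would leave the final `cbind()` with pieces that no longer line up. The following is a minimal sketch that applies one row filter to every piece. It uses a hypothetical toy `biodata` with a few injected `NA`s:

```r
set.seed(7)
# Hypothetical stand-in for biodata: 2 coordinates, 3 variables, presence, NZ_Field
n <- 20
biodata <- data.frame(x = runif(n), y = runif(n),
                      v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n),
                      presence = rbinom(n, 1, 0.5), NZ_Field = rbinom(n, 1, 0.5))
biodata$v2[c(3, 11)] <- NA      # inject two missing values

coords   <- biodata[, 1:2]
biovars  <- biodata[, 3:5]
presence <- biodata[, 6]
NZ_Field <- biodata[, 7]

# Listwise deletion: compute the index once, apply it to every piece
keep     <- complete.cases(biovars)
biovars  <- biovars[keep, ]
coords   <- coords[keep, ]
presence <- presence[keep]
NZ_Field <- NZ_Field[keep]

bpc <- princomp(biovars, cor = TRUE)
PCresults <- cbind(coords, bpc$scores[, 1:3], presence, NZ_Field)
nrow(PCresults)   # 18
```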

The other option is to use another package. Specifically, psych seems to work well here: you can use `principal(biovars)`. The output is a bit different, but it does work because it uses pairwise deletion. So it basically comes down to whether you want to use pairwise or listwise deletion. Thanks!
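The sketch below illustrates the pairwise-versus-listwise choice using base R's `cor()` and its `use` argument on hypothetical toy data with scattered `NA`s. It deliberately does not call `psych::principal()`, so it only shows the two deletion strategies the answer contrasts, not psych's exact computation. One caveat worth checking: a pairwise correlation matrix is not guaranteed to be positive semi-definite.

```r
set.seed(3)
# Hypothetical toy data: 30 rows, 4 variables, NAs in different rows per column
toy <- as.data.frame(matrix(rnorm(120), ncol = 4))
toy[c(2, 5), 1] <- NA
toy[c(9, 14), 3] <- NA

# Listwise: only rows complete in every column (the rows complete.cases() keeps)
r_list <- cor(toy, use = "complete.obs")
# Pairwise: each correlation uses every row where that particular pair is observed
r_pair <- cor(toy, use = "pairwise.complete.obs")

n_listwise <- sum(complete.cases(toy))   # 26 rows survive listwise deletion

# Check the smallest eigenvalue of the pairwise matrix before using it for PCA
ev_pair <- eigen(r_pair, symmetric = TRUE, only.values = TRUE)$values
print(round(ev_pair, 4))
```

A clearly negative value in `ev_pair` would mean the pairwise matrix itself is not non-negative definite. Such a matrix would likely run into the same kind of check as the error in the question.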

costebk08