I have a data matrix with more columns than rows. I am using the principal function in the psych package for my PCAs. I should be able to perform a PCA on this matrix, but I cannot get it to work when there are more columns than rows. The error message is about singularity.

With the full matrix, the error message reads

Error in solve.default(r, result$Structure) : Lapack routine dgesv: system is exactly singular: U[80,80] = 0

If I reduce the number of columns but keep it larger than the number of rows, the error message reads:

Error in solve.default(r, result$Structure) : system is computationally singular: reciprocal condition number = 2.00483e-19

Does anyone know if there are any settings to tweak to get this to work? I only need the first two components. There are no missing values in the matrix.

If I use JMP, I can get the PCA to work fine.

J.Con
    Welcome to SO. Please read [how to ask a question](https://stackoverflow.com/help/how-to-ask) and [how to give a good reproducible example in R](https://stackoverflow.com/q/5963269/3250126) – loki Jun 14 '17 at 06:52

2 Answers

First of all, do you have missing values in your data? You can check for complete rows with complete.cases(), and is.na() flags individual NA values. That may help you reduce the number of variables.
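
A minimal sketch of that check in base R. The matrix `x` and its values are made up for illustration and are not the asker's data:

```r
# Hypothetical 4 x 3 matrix with two missing cells (illustrative data only).
x <- matrix(c(1, 2, NA, 4,
              5, 6, 7, 8,
              9, NA, 11, 12), nrow = 4)

any_missing   <- anyNA(x)                 # TRUE if at least one NA is present
na_per_column <- colSums(is.na(x))        # number of NAs in each column
complete_rows <- sum(complete.cases(x))   # rows that contain no NA at all

print(any_missing)
print(na_per_column)
print(complete_rows)
```

If `any_missing` is FALSE, as the asker reports for their matrix, missing data can be ruled out as the cause of the error.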

About PCA, I guess that you're using the principal function from the psych package. Please note in the Details section of ?principal that:

Both PC and FA attempt to approximate a given correlation or covariance matrix
of rank n with matrix of lower rank (p). ${}_nR_n = {}_nF_k \, {}_kF_n' + U^2$ where k is much
less than n.

Continue reading the Details section for an overview.

Anyway, be aware that PCA will never return more components than observations.
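
This limit can be seen in a small sketch that uses base R's prcomp() instead of psych's principal(). The data are random numbers with the same 25 x 80 shape as in the question, and the 1e-8 cutoff for "non-negligible" variance is an arbitrary choice made for illustration:

```r
set.seed(1)                       # reproducible illustrative data
n <- 25; p <- 80                  # same shape as in the question
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# prcomp() works from a singular value decomposition of the centred data,
# so it never has to invert the correlation matrix.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

n_returned <- ncol(pca$x)            # components returned: min(n, p)
n_nonzero  <- sum(pca$sdev > 1e-8)   # components with non-negligible variance
first_two  <- pca$x[, 1:2]           # scores on the first two components

print(c(returned = n_returned, nonzero = n_nonzero))
```

Centring removes one degree of freedom, so only n - 1 components carry variance here, and the first two are available without any matrix inversion.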

elcortegano
  • There are no missing values in the data matrix. I am using the principal function from the psych package. I do not need more components than observations, but I would like the PCA to run when there are more columns than observations. The call is pca <- principal(datatemp23[2:81], nfactors = 3, rotate = "none", scores = TRUE), with 80 columns of data for 25 rows. Is it simply not possible to run this with the principal function? – Joseph Craine Jun 14 '17 at 12:28
  • There cannot be more components than observations. In a PCA, the number of components cannot exceed the number of variables (columns) or the number of observations (rows) - 1, whichever is smaller (edit: so with your data you should never expect to obtain more than 24 components). As for running that command, **as far as I know**, no, it's not possible – elcortegano Jun 14 '17 at 17:32
  • It's interesting that it is possible to run a PCA with more columns than rows, but the psych package doesn't allow it. JMP uses a pairwise method when there are more columns than rows. "Pairwise estimation performs correlations for all rows for each pair of columns with nonmissing values." – Joseph Craine Jun 14 '17 at 19:13
  • At this point I can only recommend asking this on Cross Validated (similar to Stack Overflow but oriented toward statistics). I hope I've been of some help. – elcortegano Jun 14 '17 at 19:17

The problem is that principal is not just finding the principal components, but it is also trying to score the components for you. That requires inverting the correlation matrix, at which point principal throws the error.
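
To illustrate why the inversion fails, here is a sketch in base R on random data of the same shape as in the question (25 rows, 80 columns). It does not reproduce psych's internals. It only shows that a correlation matrix built from fewer rows than columns is rank deficient:

```r
set.seed(1)                        # reproducible illustrative data
n <- 25; p <- 80
x <- matrix(rnorm(n * p), nrow = n)
r <- cor(x)                        # 80 x 80 correlation matrix

r_rank <- qr(r)$rank               # numerical rank; at most n - 1

# Attempt the inversion and record whether it raises an error.
inversion <- tryCatch({ solve(r); "inverted" },
                      error = function(e) conditionMessage(e))

print(r_rank)
print(inversion)
```

With 25 rows the rank cannot exceed 24, so the 80 x 80 matrix has no inverse, and solve() is expected to stop with a singularity message like the ones quoted in the question.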

If, however, you just want the component loadings, try the following. For my example I am using the first 25 items of the bfi, but only using 24 subjects.

pc <- principal(bfi[1:24, 1:25], nfactors = 5, scores = FALSE)
William Revelle
  • And in fact, we don't need to find scores by inverting the correlation matrix. That was a carryover from finding factor scores. Just multiplying the standardized data by the component loadings will do the trick. This will be fixed in the next version of psych. – William Revelle Jul 08 '17 at 22:55
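
The multiplication described in that comment can be sketched in base R without psych. This is one reading of the idea on random illustrative data, not the package's actual implementation. In particular, the final rescaling to unit variance is an assumption, since the thread does not say how psych scales its scores:

```r
set.seed(1)                             # reproducible illustrative data
n <- 25; p <- 80; k <- 2                # k = number of components wanted
x <- matrix(rnorm(n * p), nrow = n)

z   <- scale(x)                         # standardize each column
eig <- eigen(cor(x), symmetric = TRUE)  # works even though cor(x) is singular
v      <- eig$vectors[, 1:k]            # unit-length component weights
lambda <- eig$values[1:k]               # variances of the first k components
loadings <- v %*% diag(sqrt(lambda))    # loadings in the usual PCA sense

scores     <- z %*% v                              # scores with variance lambda
std_scores <- z %*% loadings %*% diag(1 / lambda)  # rescaled to unit variance

print(apply(scores, 2, var))
print(apply(std_scores, 2, var))
```

No call to solve() appears anywhere, so the computation runs with more columns than rows. Up to the sign of each component, `scores` should agree with the first two columns of `prcomp(x, scale. = TRUE)$x`.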