I would like to be able to construct the scores of a principal component analysis using its loadings, but I cannot figure out what the princomp function is actually doing when it computes the scores of a dataset. A toy example:

cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)
PCAcc$loadings

Loadings:
     Comp.1 Comp.2 Comp.3 Comp.4
[1,]  0.500  0.866              
[2,]  0.500 -0.289  0.816       
[3,]  0.500 -0.289 -0.408 -0.707
[4,]  0.500 -0.289 -0.408  0.707

PCAcc$scores

       Comp.1        Comp.2        Comp.3 Comp.4
[1,] -2.92770 -6.661338e-16 -3.330669e-16      0
[2,] -1.75662 -4.440892e-16 -2.220446e-16      0
[3,] -0.58554 -1.110223e-16 -6.938894e-17      0
[4,]  0.58554  1.110223e-16  6.938894e-17      0
[5,]  1.75662  4.440892e-16  2.220446e-16      0
[6,]  2.92770  6.661338e-16  3.330669e-16      0

My understanding is that the scores are the rescaled original data multiplied by the loadings, so that each score is a linear combination of the rescaled variables. Trying it by hand:

rescaled <- sweep(cc, 2, colMeans(cc))  # subtract each column's mean (centering only)
rescaled %*% PCAcc$loadings

     Comp.1        Comp.2        Comp.3 Comp.4
[1,]     -5 -1.332268e-15 -4.440892e-16      0
[2,]     -3 -6.661338e-16 -3.330669e-16      0
[3,]     -1 -2.220446e-16 -1.110223e-16      0
[4,]      1  2.220446e-16  1.110223e-16      0
[5,]      3  6.661338e-16  3.330669e-16      0
[6,]      5  1.332268e-15  4.440892e-16      0

The columns are off by a factor of 1.707825, 2, and 1.333333, respectively. Why is this? Since the toy data matrix has the same variance in each column, normalization shouldn't be necessary here. Any help is greatly appreciated.

Thanks!

Escotch
    On a separate note, this may not be the best example data for a PCA, since your centered (`scale(cc)`) points all lie on the same line. So PC1 will capture all the variance and the other PCs will be useless (probably garbage computed from noise). This also shows up in your scores, which are non-zero for PC1 only. – flodel Jun 01 '13 at 08:05

1 Answer

You need

scale(cc,PCAcc$center,PCAcc$scale)%*%PCAcc$loadings

or, more simply,

predict(PCAcc,newdata=cc)
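
As a quick check, here is a minimal sketch that rebuilds the toy example from the question and compares both expressions against `PCAcc$scores`. The names `by_hand` and `by_predict` are just illustrative, and `unname()` is used so that only the numbers are compared, not the dimnames.

```r
# Rebuild the toy example from the question
cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)

# Center and scale with the values princomp stored, then project onto the loadings
by_hand <- scale(cc, center = PCAcc$center, scale = PCAcc$scale) %*% PCAcc$loadings

# predict() applied to the same data
by_predict <- predict(PCAcc, newdata = cc)

# Both comparisons should report TRUE (up to floating-point noise)
all.equal(unname(by_hand), unname(PCAcc$scores))
all.equal(unname(by_predict), unname(PCAcc$scores))
```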
Ian Fellows
  • Thanks, I didn't know about the scale function. I was hoping to get a better understanding of why `princomp` is scaling by a factor of 1.707825 in the first place. Where is that coming from? It would make sense to me if that were the standard deviation of a column, but it's not. – Escotch Jun 01 '13 at 08:49
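
On where 1.707825 might come from: the `princomp` documentation notes that its default calculation uses divisor N for the covariance matrix, whereas `sd()` and `var()` use N - 1. The sketch below is only a check on the toy data under that reading; `sd_sample` and `sd_pop` are illustrative names, not anything `princomp` exposes.

```r
# Toy data from the question
cc <- matrix(1:24, ncol = 4)
PCAcc <- princomp(cc, scores = TRUE, cor = TRUE)

x <- cc[, 1]

# Sample standard deviation, divisor n - 1 (what sd() returns)
sd_sample <- sd(x)

# Population standard deviation, divisor n
sd_pop <- sqrt(mean((x - mean(x))^2))

sd_sample  # about 1.870829
sd_pop     # about 1.707825, the factor seen in the question

# The scale stored by princomp should match the divisor-n version in every column
all.equal(unname(PCAcc$scale), rep(sd_pop, ncol(cc)))
```

If that reading is right, dividing the centered first-component values (-5, -3, ..., 5) by `sd_pop` gives the -2.92770, -1.75662, ... shown in `PCAcc$scores`.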