
I use MATLAB's princomp function to do PCA. As I understand it, I can check the latent output (the variance of each principal component) to decide how many dimensions I need.

[coeff, score, latent, t2] = princomp(fdata);
cumsum(latent)./sum(latent)   % cumulative fraction of variance explained

Then, with trainMatrix = coeff(:,1:10) (I chose the top 10 dimensions) and newData = data*trainMatrix, I can get the reduced data.

But how can I figure out which dimensions were dropped and which 10 dimensions remain?

I mean, if I have 30 features, can I figure out after princomp which 10 features (the column indices of the original data) were retained?

Thanks.

Amro
Freya Ren

1 Answer


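The new dimensions correspond to linear combinations of the original dimensions, i.e., each new feature is expressed in terms of all the old ones with varying weights. So PCA does not keep 10 of your 30 original columns and discard the other 20; each of the 10 retained components generally mixes all 30 features, and the weights are the entries of the corresponding column of coeff.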
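To make the "varying weights" concrete, here is a minimal sketch that inspects the weights directly. The data is synthetic and hypothetical (200 observations, 30 features, with feature 7 deliberately given a large variance), and the PCA is computed with svd on the centered data rather than princomp so that it runs in base MATLAB; the resulting coeff should match princomp's coeff up to the sign of each column.

```matlab
% Hypothetical stand-in for the asker's data: 200 observations, 30 features.
rng(0);
X = randn(200, 30);
X(:, 7) = 5 * X(:, 7);                    % feature 7 gets a much larger variance

% PCA via SVD of the centered data (assumption: used here instead of princomp).
Xc = bsxfun(@minus, X, mean(X, 1));
[~, S, coeff] = svd(Xc, 'econ');          % columns of coeff are the principal axes
latent = diag(S).^2 / (size(X, 1) - 1);   % variance along each axis, descending

% Each column of coeff holds one weight per ORIGINAL feature.
k = 10;
[~, topFeature] = max(abs(coeff(:, 1:k)), [], 1);   % heaviest feature in each component
weightsPC1 = coeff(:, 1);                           % all 30 weights of the first component
```

Here topFeature(j) is the column index of the original feature that contributes most to component j, which is probably the closest thing to "which feature is the primary one". It is still only the largest of 30 weights, not a sign that the other features were removed.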

Amro
  • I mean, is there any way I could know the sorted eigenvalue results? For example, could I figure out which feature is the primary one? Could I use the coeff matrix to figure out which feature has the largest weight? – Freya Ren Apr 25 '13 at 01:04
  • @FreyaRen: PCA simply expresses the same data in a new coordinate system, such that the first dimension contains the largest variance in the data, the next dimension is perpendicular to it and oriented along the largest remaining variance, and so on. Perhaps a visualization such as this one might help you make sense of it: http://www.mathworks.com/help/stats/biplot.html . Of course you could truncate the dimensions to the first `k` ones, and choose `k` to get a good enough approximation that retains, say, 95% of the original data variance – Amro Apr 25 '13 at 13:15
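The truncation to the first `k` components mentioned in the last comment can be sketched as follows. This is an illustration under stated assumptions, not the commenter's code: the data is synthetic, the 95% threshold is the example figure from the comment, and svd on the centered data stands in for princomp. Note that the data is centered before projecting, which is what princomp does internally when it computes score.

```matlab
% Hypothetical correlated data: 150 observations, 30 features.
rng(1);
X = randn(150, 30) * randn(30, 30);

mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);               % center each column
[~, S, coeff] = svd(Xc, 'econ');
latent = diag(S).^2 / (size(X, 1) - 1);   % component variances, descending

explained = cumsum(latent) / sum(latent); % cumulative fraction of variance
k = find(explained >= 0.95, 1);           % smallest k reaching 95%

reduced = Xc * coeff(:, 1:k);             % n-by-k reduced data (the first k scores)
approx = bsxfun(@plus, reduced * coeff(:, 1:k)', mu);   % mapped back to 30-D

% Fraction of the variance lost by the truncation.
relErr = norm(X - approx, 'fro')^2 / norm(Xc, 'fro')^2;
```

The lost fraction relErr equals 1 - explained(k), so it is at most 0.05 here. Multiplying the uncentered data by coeff(:,1:k) instead would give the same reduced data shifted by a constant row vector.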