
I'm trying to run PCA in R for dimension reduction. As a result of this procedure I choose 25 out of 2000 features, but I cannot figure out how to map these selected features to those of the original data. Any help, please?

Here is part of my code:

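rawdata <- read.csv("alon.csv", header = FALSE)
pmatrix <- scale(rawdata)   # center and scale every column
princ <- prcomp(pmatrix)
nComp <- 25   # number of principal components to keep
dfComponents <- predict(princ, newdata = pmatrix)[, 1:nComp]   # scores on the first nComp components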

And here are 3 rows and 6 columns of my 62 × 2000 data:

0.508777205 0.229010718 0.092779946 0.038210585 -0.692175368 0.240419603
0.694627686 0.800665661 0.433820868 -0.133540337 -0.679403925 0.36020227
-1.031396049 0.91525797 0.701421715 0.355537228 -1.30483618 -1.304934251
h.m.gh
  • Can you provide some sample data and code? This post will guide you on [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – markus Jul 30 '18 at 06:39
  • I edited the post. @markus – h.m.gh Jul 30 '18 at 07:04
  • Can you please actually read the link that markus posted on how to make a reproducible example? While you have edited the post, you have not followed the guidelines on how to make a reproducible example. – Conor Neilson Jul 30 '18 at 07:09
  • Can you post sample data? Please edit **the question** with the output of `dput(rawdata)` or, if that is too big, with the output of `dput(head(rawdata, 20))`. Where is the dataset `labels` used? – Rui Barradas Jul 30 '18 at 07:14
  • You're right, labels are not used in this part. The output of `dput(head(rawdata, 20))` is still too big to be copied. What should I do now? @RuiBarradas – h.m.gh Jul 30 '18 at 07:29
  • It's hard to understand what you want, but if I get it, then it's not possible. PCA constructs new features from the old ones; I doubt that they can be precisely matched. – pogibas Jul 30 '18 at 07:33
  • I mean, after `prcomp`, could I figure out which 25 features (the column indices of the original data) I kept? @PoGibas – h.m.gh Jul 30 '18 at 07:53
  • Those are totally new features - read theory on PCA – pogibas Jul 30 '18 at 07:58
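
The point made in the comments above, that each principal component is a new feature built from all of the original ones, can be checked numerically. The following is a minimal sketch on synthetic data, not the asker's `alon.csv`; the matrix `X`, its size, and the column names `V1` to `V10` are assumptions made purely for illustration.

```r
set.seed(1)

# Synthetic stand-in for the real data (hypothetical): 62 samples, 10 features
X <- matrix(rnorm(62 * 10), nrow = 62,
            dimnames = list(NULL, paste0("V", 1:10)))
pmatrix <- scale(X)
princ <- prcomp(pmatrix)

# Rebuild the scores by hand: centered data times the rotation matrix
centered <- scale(pmatrix, center = princ$center, scale = FALSE)
manual_scores <- centered %*% princ$rotation

# TRUE if the hand-built scores match what prcomp stored in princ$x
isTRUE(all.equal(unname(manual_scores), unname(princ$x)))

# Number of original features that carry a non-zero weight in PC1
sum(princ$rotation[, "PC1"] != 0)
```

Because every score column is a weighted sum over all input columns, there is in general no single original column that a component "is".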

1 Answer


`princ$rotation` returns the loadings (one row per original feature, one column per principal component), which you can then slice:

princ$rotation[ , 1:25]

# also check out princ$x and explore the nested object returned by prcomp
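
If the goal is to see which original columns weigh most heavily in the retained components, one option is to rank the rows of the sliced loading matrix by absolute value. This is only a rough sketch under assumptions: the data are synthetic, the names `gene1` to `gene20` and the choice of 5 components and 3 top features are illustrative, and ranking loadings is a heuristic for interpretation rather than a way of recovering "selected" features, since PCA does not select any.

```r
set.seed(42)

# Synthetic stand-in for the real data (hypothetical): 62 samples, 20 features
X <- matrix(rnorm(62 * 20), nrow = 62,
            dimnames = list(NULL, paste0("gene", 1:20)))
princ <- prcomp(scale(X))

nComp <- 5                               # illustrative; the question uses 25
loadings <- princ$rotation[, 1:nComp]    # rows = original features, cols = PCs

# For each component, the 3 original features with the largest |loading|
top_features <- apply(abs(loadings), 2, function(w) {
  rownames(loadings)[order(w, decreasing = TRUE)[1:3]]
})
top_features

# Overall weight of each original feature across the retained components
contribution <- sort(rowSums(loadings^2), decreasing = TRUE)
head(contribution)
```

Each column of `top_features` lists original column names, so these can be mapped back to column indices with `match(top_features, colnames(X))`.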

Here is a comprehensive article for reference:
https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/

OzanStats