
PCA is a dimensionality reduction algorithm. What I haven't understood is this: PCA outputs eigenvectors in decreasing order of explained variance (PC1, PC2, PC3, and so on), and these become the new axes for our data.

  • How do we apply these new axes when predicting on the test set?

  • Suppose we have reduced the dimensionality from n to some n-k.

  • How do we identify the most useful variables in our data and eliminate the unimportant columns?
  • Is there an alternative to PCA?
FunnyCoder

1 Answer


The idea of PCA is to project the data onto the subspace spanned by the n-k eigenvectors of the covariance matrix that have the largest eigenvalues, so that the data mapped to the new subspace retains as much of its variance as possible.
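To make the projection concrete, here is a minimal NumPy sketch. The data is synthetic and the helper names `fit_pca` and `transform` are hypothetical, chosen only for illustration. It fits the axes on the training set alone and then reuses the same mean and eigenvectors to project the test set, which is one common way to apply the new axes to unseen data.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Return the training mean, the top principal axes, and their variances."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    return mean, eigvecs[:, order], eigvals[order]

def transform(X, mean, components):
    """Center with the TRAINING mean, then project onto the principal axes."""
    return (X - mean) @ components

# Synthetic data (an assumption): 10 observed features driven by 3 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))
X_train, X_test = X[:150], X[150:]

mean, components, variances = fit_pca(X_train, n_components=3)
Z_train = transform(X_train, mean, components)
Z_test = transform(X_test, mean, components)  # same axes, no refitting on test data
print(Z_train.shape, Z_test.shape)
```

A downstream model would then be trained on `Z_train` and evaluated on `Z_test`.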

Furthermore, PCA can reduce your dimensionality without knowing the classes of your training data, meaning it is unsupervised.

Another option, if you know the classes of your training data, is LDA, which tries to find the feature space that maximizes the between-class variation relative to the within-class variation.
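The difference between the two can be illustrated with a rough sketch of the two-class Fisher discriminant, a simple special case of LDA. The data here is synthetic and the function name `fisher_direction` is hypothetical. Feature 0 has a large variance but carries no class information, while feature 1 has a small variance but separates the classes.

```python
import numpy as np

def fisher_direction(X, y):
    """Two-class Fisher discriminant direction, proportional to Sw^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: spread of each class around its own mean
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic data (an assumption): 200 samples per class, 2 features
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n)
X = np.column_stack([
    rng.normal(0.0, 10.0, size=2 * n),            # high variance, no class signal
    rng.normal(0.0, 0.5, size=2 * n) + 2.0 * y,   # low variance, separates classes
])

w_lda = fisher_direction(X, y)

# First principal axis for comparison (labels are not used)
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, -1]  # eigenvector of the largest eigenvalue

print("LDA direction:", w_lda)
print("PC1 direction:", pc1)
```

In this setup the first principal axis should line up mostly with feature 0, while the discriminant direction should line up mostly with feature 1.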

Hope this helps

Mathias
  • That's not the question, @Mathias. My actual doubt is this: suppose we take data with 10 features and reduce it to 3 dimensions. The new dimensions are completely different from our original features. Can we know from PCA which feature is most important? – FunnyCoder Nov 01 '17 at 09:29
  • PCA does not consider which features are good for classification, because it is an unsupervised method; the dimensionality reduction is based only on the largest variance in the feature space. Hence you will not know which features are good for classification, only which features have the most variance. – Mathias Nov 01 '17 at 10:40
  • We can find the feature with the most variance from basic formulas using the standard deviation. I was wondering what the exact purpose of PCA is. Is it to visualize 100-dimensional data in a 2-dimensional plane? – FunnyCoder Nov 01 '17 at 11:06
  • PCA does not find the features with the largest variance within the original data space; it finds combinations of all the features that create the largest variance within the data. Yes, visualizing data could be one purpose, or simply reducing the amount of data you have to process. – Mathias Nov 01 '17 at 11:44
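The distinction in the last comment can be illustrated with a small NumPy sketch on synthetic data (all values below are assumptions made for illustration). Features 0 and 1 are strongly correlated, and feature 2 is independent with the largest individual variance. Comparing the per-feature variances with the weights (loadings) of the first principal component shows whether the two notions agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
shared = rng.normal(size=n)

# Synthetic data (an assumption): two correlated features plus one independent feature
X = np.column_stack([
    shared + 0.1 * rng.normal(size=n),   # feature 0
    shared + 0.1 * rng.normal(size=n),   # feature 1, nearly a copy of feature 0
    1.2 * rng.normal(size=n),            # feature 2, largest variance on its own
])

# Variance of each original feature, the "basic formula" approach
feature_var = X.var(axis=0, ddof=1)

# PCA via eigendecomposition of the covariance matrix
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # loadings of the first principal component

print("per-feature variance:", feature_var)
print("PC1 loadings:", pc1)
print("variance along PC1:", eigvals[-1])
```

Inspecting the absolute loadings of each component is one way to see which original features contribute most to it. That is a statement about variance, not about usefulness for classification.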