
I'm trying to read through PCA and saw that the objective is to maximize the variance. I don't quite understand why. An explanation of related topics would also be helpful.


4 Answers


Variance is a measure of the "variability" of the data you have. Potentially, the number of directions you could project onto is infinite (actually, once the data is in matrix form, the number of principal components is at most the rank of the matrix, as @jazibjamil pointed out), so you want to "squeeze" the most information into each component of the finite set you build.

If, to exaggerate, you were to select a single principal component, you would want it to account for the most variability possible: hence the search for maximum variance, so that the one component collects the most "uniqueness" from the data set.
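To see this concretely, here is a minimal sketch using NumPy and made-up correlated 2-D data (not from the question itself): the eigenvalues of the covariance matrix are the variances along the principal axes, and on strongly correlated data the first component alone accounts for nearly all the spread.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Strongly correlated 2-D data: most of the variability lies along one direction
data = np.column_stack([x, 0.9 * x + 0.1 * rng.normal(size=500)])

# Eigenvalues of the covariance matrix = variances along the principal axes
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]  # descending

explained = eigvals / eigvals.sum()
print(explained)  # the first component captures nearly all of the variance
```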

LSerni
  • It seems like a good answer, just one correction: the number of principal components of a matrix is at most equal to the rank of that matrix, not "potentially infinite". – jazib jamil Feb 19 '20 at 10:43
  • In order to find the PCA of a data set, we first need to plot it on a graph; if the data set has 2 features we can plot it as a 2D graph and then compute the PCA, but how would we plot a 4D graph for 4 features of data in order to compute its PCA? – Ahtisham Sep 06 '20 at 07:27

Note that PCA does not actually increase the variance of your data. Rather, it rotates the data set in such a way as to align the directions in which it is spread out the most with the principal axes. This enables you to remove those dimensions along which the data is almost flat. This decreases the dimensionality of the data while keeping the variance (or spread) among the points as close to the original as possible.
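A small sketch of that rotation view, assuming NumPy and synthetic 2-D data: projecting onto the (orthonormal) eigenvectors of the covariance matrix redistributes the variance across axes but leaves the total unchanged, after which the near-flat axis can be dropped.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
data = np.column_stack([x, x + 0.05 * rng.normal(size=1000)])
data -= data.mean(axis=0)  # PCA assumes centered data

# The eigenvectors of the covariance matrix form an orthonormal basis;
# projecting onto them rotates/reflects the data set
_, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
rotated = data @ eigvecs

# An orthonormal transform preserves the total variance...
total_before = data.var(axis=0).sum()
total_after = rotated.var(axis=0).sum()

# ...but concentrates it in the top axis, so the flat axis can be dropped
reduced = rotated[:, -1]  # eigh sorts ascending: last column = top component
```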

Don Reba
  • Could you give a reference that explains PCA in terms of this rotation point of view? – Atilla Ozgur Sep 14 '12 at 09:20
  • @AtillaOzgur PCA produces an orthonormal transformation matrix. Orthonormal matrices are combinations of rotations and reflections. – Don Reba Sep 14 '12 at 09:44

Maximizing the component vector variances is the same as maximizing the 'uniqueness' of those vectors. Thus your vectors are as distant from each other as possible. That way, if you only use the first N component vectors, you capture more of the space with highly varying vectors than with similar vectors. Think about what "principal component" actually means.

Take, for example, a situation where you have 2 lines that are orthogonal in a 3D space. You can capture the environment much more completely with those orthogonal lines than with 2 lines that are parallel (or nearly parallel). When applied to very high-dimensional spaces using very few vectors, this relationship among the vectors becomes much more important to maintain. In a linear algebra sense, you want PCA to produce independent rows; otherwise some of those rows will be redundant.
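The orthogonality here isn't an accident: because the covariance matrix is symmetric, its eigenvectors (the principal components) come out mutually orthogonal. A small NumPy check on made-up 4-D data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up correlated 4-D data (random mixing of independent columns)
data = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

# Eigenvectors of a symmetric matrix (here, the covariance) are orthonormal,
# so no principal component is redundant with another
_, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
gram = eigvecs.T @ eigvecs
print(np.allclose(gram, np.eye(4)))  # True
```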

See this PDF from Princeton's CS Department for a basic explanation.

Pyrce

Maximizing the variance is basically finding the axes that occupy the maximum spread of the data points. Why? Because the direction of those axes is what really matters: it captures the correlations in the data, and later on we compress/project the points along those axes to get rid of some dimensions.
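A minimal sketch of that compression step (NumPy, synthetic 2-D data): project every point onto the single axis of maximum spread and check how much of the variance survives.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=300)
data = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=300)])
data -= data.mean(axis=0)

# The axis of maximum spread is the top eigenvector of the covariance matrix
_, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
axis = eigvecs[:, -1]  # eigh sorts ascending: last column = top axis

# Compress 2-D -> 1-D: project each point onto that axis
projected = data @ axis
print(projected.var(), data.var(axis=0).sum())
```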

Mr-Programs