In sklearn.decomposition.PCA, why are components_ negative?

Question

I'm trying to follow along with Abdi & Williams - Principal Component Analysis (2010) and build principal components through SVD, using numpy.linalg.svd.

When I display the components_ attribute from a fitted PCA with sklearn, they're of the exact same magnitude as the ones that I've manually computed, but some (not all) are of opposite sign. What's causing this?

Update: my (partial) answer below contains some additional info.

Take the following example data:

from pandas_datareader.data import DataReader as dr
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# sample data - shape (20, 3), each column standardized to N~(0,1)
rates = scale(dr(['DGS5', 'DGS10', 'DGS30'], 'fred', 
           start='2017-01-01', end='2017-02-01').pct_change().dropna())

# with sklearn PCA:
pca = PCA().fit(rates)
print(pca.components_)
[[-0.58365629 -0.58614003 -0.56194768]
 [-0.43328092 -0.36048659  0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# compare to the manual method via SVD:
u, s, Vh = np.linalg.svd(np.asmatrix(rates), full_matrices=False)
print(Vh)
[[ 0.58365629  0.58614003  0.56194768]
 [ 0.43328092  0.36048659 -0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# odd: some, but not all signs reversed
print(np.isclose(Vh, -1 * pca.components_))
[[ True  True  True]
 [ True  True  True]
 [False False False]]

Here's an explanation using the R packages for PCA. https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers — Alex F, Jun 26 '17 at 18:04

P. Camilleri · Accepted Answer · 2017-06-30T13:02:37.930

As you figured out in your answer, the results of a singular value decomposition (SVD) are not unique in terms of singular vectors. Indeed, if the SVD of X is \sum_1^r \s_i u_i v_i^\top :

with the s_i ordered in decreasing fashion, then you can see that you can change the sign (i.e., "flip") of say u_1 and v_1, the minus signs will cancel so the formula will still hold.

This shows that the SVD is unique up to a change in sign in pairs of left and right singular vectors.

Since the PCA is just a SVD of X (or an eigenvalue decomposition of X^\top X), there is no guarantee that it does not return different results on the same X every time it is performed. Understandably, scikit learn implementation wants to avoid this: they guarantee that the left and right singular vectors returned (stored in U and V) are always the same, by imposing (which is arbitrary) that the largest coefficient of u_i in absolute value is positive.

As you can see reading the source: first they compute U and V with linalg.svd(). Then, for each vector u_i (i.e, row of U), if its largest element in absolute value is positive, they don't do anything. Otherwise, they change u_i to - u_i and the corresponding left singular vector, v_i, to - v_i. As told earlier, this does not change the SVD formula since the minus sign cancel out. However, now it is guaranteed that the U and V returned after this processing are always the same, since the indetermination on the sign has been removed.

@BradSolomon If I may, in which case is it useful to have non deterministic results ? — P. Camilleri, Sep 07 '17 at 15:59
These are still deterministic--it is just a question of whether the sign flip is a "U-based decision" or a "V-based decision". See `svd_flip` for reference. My point is that I wanted to make v-based rather than u-based decision. See related issue [here](https://github.com/scikit-learn/scikit-learn/issues/9692). Let me know if I'm making sense — Brad Solomon, Sep 07 '17 at 16:03

Brad Solomon · Answer 2 · 2017-06-28T18:42:48.333

After some digging, I've cleared up some, but not all, of my confusion on this. This issue has been covered on stats.stackexchange here. The mathematical answer is that "PCA is a simple mathematical transformation. If you change the signs of the component(s), you do not change the variance that is contained in the first component." However, in this case (with sklearn.PCA), the source of ambiguity is much more specific: in the source (line 391) for PCA you have:

U, S, V = linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, V = svd_flip(U, V)

components_ = V

svd_flip, in turn, is defined here. But why the signs are being flipped to "ensure a deterministic output," I'm not sure. (U, S, V have already been found at this point...). So while sklearn's implementation is not incorrect, I don't think it's all that intuitive. Anyone in finance who is familiar with the concept of a beta (coefficient) will know that the first principal component is most likely something similar to a broad market index. Problem is, the sklearn implementation will get you strong negative loadings to that first principal component.

My solution is a dumbed-down version that does not implement svd_flip. It's pretty barebones in that it doesn't have sklearn parameters such as svd_solver, but does have a number of methods specifically geared towards this purpose.

By convention, the singular values are all positive and ordered by size. — Arya McCarthy, Jun 27 '17 at 03:03
@AryaMcCarthy I'm not sure if I follow you, can you please explain further? The singular values are `S`. If you look at the PCA [code](https://github.com/scikit-learn/scikit-learn/blob/4c65d8e/sklearn/decomposition/pca.py#L389), the `S` vector isn't being touched. (It's already positive after `U, S, V = linalg.svd(X, full_matrices=False)`, line 391. It's `U` and `V` that are being manipulated, to "enforce deterministic output" even though a solution has already been found. — Brad Solomon, Jun 28 '17 at 02:17

score 2 · Answer 3 · answered Jun 26 '17 at 18:21

With the PCA here in 3 dimensions, you basically find iteratively: 1) The 1D projection axis with the maximum variance preserved 2) The maximum variance preserving axis perpendicular to the one in 1). The third axis is automatically the one which is perpendicular to first two.

The components_ are listed according to the explained variance. So the first one explains the most variance, and so on. Note that by the definition of the PCA operation, while you are trying to find the vector for projection in the first step, which maximizes the variance preserved, the sign of the vector does not matter: Let M be your data matrix (in your case with the shape of (20,3)). Let v1 be the vector for preserving the maximum variance, when the data is projected on. When you select -v1 instead of v1, you obtain the same variance. (You can check this out). Then when selecting the second vector, let v2 be the one which is perpendicular to v1 and preserves the maximum variance. Again, selecting -v2 instead of v2 will preserve the same amount of variance. v3 then can be selected either as -v3 or v3. Here, the only thing which matters is that v1,v2,v3 constitute an orthonormal basis, for the data M. The signs mostly depend on how the algorithm solves the eigenvector problem underlying the PCA operation. Eigenvalue decomposition or SVD solutions may differ in signs.

score 0 · Answer 4 · answered Sep 12 '17 at 18:30

This is a short notice for those who care about the purpose and not the math part at all.

Although the sign is opposite for some of the components, that shouldn't be considered as a problem. In fact what we do care about (at least to my understanding) is the axes' directions. The components, ultimately, are vectors that identify these axes after transforming the input data using pca. Therefore no matter what direction each component is pointing to, the new axes that our data lie on will be the same.

In sklearn.decomposition.PCA, why are components_ negative?

4 Answers4

Linked