
I'm learning to use PCA for dimensionality reduction (Python 3.6), but I got very similar yet different results when using two different methods. Here's my code:

from numpy import *
from sklearn.decomposition import PCA

data_set = [[-1., -2.],
            [-1., 0.],
            [0., 0.],
            [2., 1.],
            [0., 1.]]
# Method 1: sklearn PCA
pca_sk = PCA(n_components=1)
newmat = pca_sk.fit_transform(data_set)
print(newmat)

# Method 2: manual PCA via eigendecomposition of the covariance matrix
meanVals = mean(data_set, axis=0)
meanRemoved = data_set - meanVals            # center the data
covMat = cov(meanRemoved, rowvar=0)          # covariance matrix (features in columns)
eigVals, eigVects = linalg.eig(mat(covMat))  # eigenvalues and eigenvectors
eigValInd = argsort(eigVals)                 # indices sorted by ascending eigenvalue
eigValInd = eigValInd[:-(1 + 1):-1]          # keep the index of the single largest eigenvalue
redEigVects = eigVects[:, eigValInd]         # the corresponding eigenvector
lowDDataMat = meanRemoved * redEigVects      # project the centered data onto it
print(lowDDataMat)

The first one outputs

[[ 2.12132034]
 [ 0.70710678]
 [-0.        ]
 [-2.12132034]
 [-0.70710678]]

but the other one outputs

[[-2.12132034]
 [-0.70710678]
 [ 0.        ]
 [ 2.12132034]
 [ 0.70710678]]
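
The two outputs look like exact negatives of each other; a quick check (my own addition, reusing newmat and lowDDataMat from the code above) confirms that they differ only in sign:

print(allclose(asarray(newmat), -asarray(lowDDataMat)))  # prints True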

Why does this happen?

  • The negative version of an eigenvector is still an eigenvector of the corresponding matrix. – not_speshal Dec 14 '21 at 19:39
  • https://stackoverflow.com/questions/44765682/in-sklearn-decomposition-pca-why-are-components-negative – Eric Marchand Dec 14 '21 at 19:39
  • With PCA you project your data into a subspace. That is the "dimension reduction". In your case you are projecting into an R^1 subspace (a line) which is contained in R^5. These two matrices (each with a single column) are different bases, but for the same subspace. That is, they are essentially the "same solution" when you think about which subspace your data will be projected onto. Of course, the representation of your data in that subspace will be different depending on which basis you choose to use. – darcamo Dec 14 '21 at 19:52
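
As the comments explain, a principal axis is only determined up to sign: if v is a unit eigenvector of the covariance matrix then so is -v, and the two methods simply picked opposite orientations of the same axis, so they project onto the same line. A minimal sketch of one way to make the results directly comparable (the fix_signs helper below is my own illustration, not part of sklearn or the manual method): flip each component so that its largest-magnitude entry is positive, and apply the same flip to the projected scores.

import numpy as np

def fix_signs(components, projected):
    # Flip each component (row) whose largest-magnitude entry is negative,
    # and flip the matching column of the projected scores to stay consistent.
    comps = np.asarray(components, dtype=float).copy()
    scores = np.asarray(projected, dtype=float).copy()
    for i in range(comps.shape[0]):
        if comps[i, np.argmax(np.abs(comps[i]))] < 0:
            comps[i] *= -1
            scores[:, i] *= -1
    return comps, scores

# Method 1: sklearn keeps the axes in components_ (shape n_components x n_features)
comps_sk, scores_sk = fix_signs(pca_sk.components_, newmat)
# Method 2: redEigVects stores the axes as columns, so transpose it first
comps_own, scores_own = fix_signs(np.asarray(redEigVects).T, lowDDataMat)

print(np.allclose(scores_sk, scores_own))  # True once both use the same sign convention

As far as I know, sklearn's PCA applies its own deterministic sign convention internally (svd_flip) after the decomposition, which is why its sign can disagree with a plain eigendecomposition; either way, the choice of sign does not change the subspace the data is projected onto.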
