I have a dataset with 400 features.
What I did:
# approach 1
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov)  # LA = np.linalg; I assume this is also sorted by eigenvalue
# approach 2
pca = PCA(n_components=300)
d_fit = pca.fit_transform(d_train)
pc = pca.components_
Now, these two should be the same, right? PCA is just the eigendecomposition of the covariance matrix.
But in my case they come out very different.
How can that be? Am I making a mistake somewhere above?
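To state the correspondence I am assuming: the eigenvalues of the (n-1)-normalized covariance matrix, sorted in descending order, should equal `pca.explained_variance_`. A minimal self-contained check of just that (using `eigvalsh`, which is for symmetric matrices and returns the eigenvalues in ascending order, so I reverse them):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))  # toy stand-in for my data

# approach 1: eigenvalues of the covariance matrix, sorted descending
# (np.cov divides by n - 1, same normalization sklearn uses)
eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]

# approach 2: sklearn PCA
pca = PCA(n_components=10).fit(X)

assert np.allclose(eigvals, pca.explained_variance_)
```

This assertion passes for me on the toy data, so the eigenvalues do agree once the ordering is made explicit.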
Comparing variances:
import numpy as np
LA = np.linalg
d_train = np.random.randn(100, 10)  # toy data: 100 samples, 10 features
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov)
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
pca = PCA(n_components=10)
d_fit = pca.fit_transform(d_train)
pc = pca.components_
ve = pca.explained_variance_
# variance captured by each eigenvector (the COLUMNS of mypca) vs. PCA's explained variances
plt.plot([x @ d_cov @ x for x in mypca.T])
plt.plot(ve)
plt.show()
print(mypca, '\n---\n' , pc)
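For completeness, here is a minimal sketch of the vector comparison I am attempting, under two assumptions: that the eigenvectors are the columns of the returned matrix (not the rows), and that each one may only match the corresponding `components_` row up to a sign flip (eigenvectors are only defined up to sign). I use `eigh` since the covariance matrix is symmetric, and re-sort its columns by descending eigenvalue to match sklearn's ordering:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))  # toy stand-in for my data

vals, vecs = np.linalg.eigh(np.cov(X.T))  # eigh: ascending eigenvalues, vectors as columns
order = np.argsort(vals)[::-1]
vecs = vecs[:, order]                     # re-sort columns by descending eigenvalue

pca = PCA(n_components=10).fit(X)

# both sets of vectors are unit-norm, so |dot| == 1 means equal up to sign
for i in range(10):
    assert np.isclose(abs(vecs[:, i] @ pca.components_[i]), 1.0)
```

With this ordering and sign convention the vectors line up on the toy data, which makes me suspect my mismatch comes from reading rows instead of columns and from `eig` not sorting.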