from tensorflow.examples.tutorials.mnist import input_data
import numpy as np
import scipy.linalg
import matplotlib.pyplot as plt

mnist = input_data.read_data_sets('data/MNIST/', one_hot=True)

numpy implementation

# Entire data set
Data = np.array(mnist.train.images)
# centre the data
mu_D = np.mean(Data, axis=0)
Data -= mu_D


# covariance matrix of the centred data (features in columns)
COV_MA = np.cov(Data, rowvar=False)
eigenvalues, eigenvec = scipy.linalg.eigh(COV_MA, eigvals_only=False)
# pair each eigenvalue with a row of eigenvec and re-order both, largest eigenvalue first
together = zip(eigenvalues, eigenvec)
together = sorted(together, key=lambda t: t[0], reverse=True)
eigenvalues[:], eigenvec[:] = zip(*together)


n = 3
pca_components = eigenvec[:, :n]   # keep the first n eigenvectors
print(pca_components.shape)
data_reduced = Data.dot(pca_components)                  # transform
print(data_reduced.shape)
data_original = np.dot(data_reduced, pca_components.T)   # inverse_transform
print(data_original.shape)


plt.imshow(data_original[10].reshape(28, 28), cmap='Greys', interpolation='nearest')
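For completeness, here is a minimal sketch (using the variables above) of displaying the reconstruction with the mean added back, since the data was centred earlier:

# data_original lives in the centred space; add the mean back before plotting
reconstructed_image = data_original[10] + mu_D
plt.imshow(reconstructed_image.reshape(28, 28), cmap='Greys', interpolation='nearest')
plt.show()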

sklearn implementation

from sklearn.decomposition import PCA

pca = PCA(n_components=3)
pca.fit(Data)

data_reduced = np.dot(Data, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform
plt.imshow(data_original[10].reshape(28, 28), cmap='Greys', interpolation='nearest')
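As an aside, a minimal sketch (assuming the fitted pca object above) of the built-in equivalents of those two dot products; transform subtracts pca.mean_ before projecting, and since Data is already centred the results should match the manual version up to numerical noise:

data_reduced_skl = pca.transform(Data)                       # (n_samples, 3)
data_original_skl = pca.inverse_transform(data_reduced_skl)  # back to (n_samples, 784)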

I'd like to implement the PCA algorithm using numpy. However, I don't know how to reconstruct the images from it, and I'm not even sure whether this code is correct.

Also, when I use sklearn.decomposition.PCA, the result is different from my numpy implementation.

Can you explain the differences?

N EMO
  • Where does `only_2` come from? – ababuji Sep 30 '18 at 05:14
  • Ah sorry, I used only the 2s from MNIST there (only_2), but you can think of it as Data. It doesn't matter. – N EMO Sep 30 '18 at 05:26
  • @ N EMO: The reason I want to know about `only_2` is to understand if you scaled it, and if so, how you did it? – ababuji Sep 30 '18 at 05:28
  • Another recent PCA question: https://stackoverflow.com/questions/52565675/bug-in-scikit-learn-pca-or-in-numpy-eigen-decomposition – hpaulj Sep 30 '18 at 05:50
  • I edited my question for clarity, maybe you guys can understand it better than before. – N EMO Sep 30 '18 at 08:15
  • I think the biggest difference is that `sklearn` uses an SVD approach by default rather than computing the full covariance structure. So it's unlikely that you'd see exactly the same results, though very similar (see the sketch after this comment). – piman314 Oct 01 '18 at 13:09
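Following up on that last comment, here is a minimal sketch (not from the thread; it assumes the centred Data and n from the question) of the SVD route that sklearn takes internally. Both routes give the same subspace, but individual components can differ in sign:

import numpy as np

# SVD of the centred data; the rows of Vt are the principal directions
U, S, Vt = np.linalg.svd(Data, full_matrices=False)
components = Vt[:n]                                    # shape (n, 784)
data_reduced_svd = Data.dot(components.T)              # shape (n_samples, n)
# the covariance eigenvalues relate to the singular values as S**2 / (N - 1)
explained_variance = S[:n] ** 2 / (Data.shape[0] - 1)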

1 Answer


I can spot a few differences already.

For one:

n=300
projections = only_2.dot(eigenvec[:,:n])
Xhat = np.dot(projections, eigenvec[:,:n].T)
Xhat += mu_D
plt.imshow(Xhat[5].reshape(28,28),cmap='Greys',interpolation='nearest')

The point I'm trying to make is that, if my understanding is correct, with n = 300 you are trying to fit 300 eigenvectors, whose eigenvalues go from high to low.
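As an aside, here is a minimal sketch of selecting those top-n eigenvectors (assuming eigenvalues and eigenvec straight from scipy.linalg.eigh, before any re-ordering, plus Data and n from the question). eigh returns eigenvalues in ascending order and stores the eigenvectors as columns, so the ordering has to be applied to the columns:

order = np.argsort(eigenvalues)[::-1]      # indices sorted by eigenvalue, largest first
top_n = eigenvec[:, order[:n]]             # top-n principal directions as columns
projections = Data.dot(top_n)              # shape (n_samples, n)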

But in sklearn

from sklearn.decomposition import PCA

pca = PCA(n_components=1)
pca.fit(only_2)

data_reduced = np.dot(only_2, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform

It seems to me you are fitting just the FIRST component (the component that maximizes variance) and you're not taking all 300.

Furthermore:

One thing I can clearly say is that you seem to understand what's happening in PCA, but you're having trouble implementing it. Correct me if I'm wrong, but:

data_reduced = np.dot(only_2, pca.components_.T) # transform
data_original = np.dot(data_reduced, pca.components_) # inverse_transform

In this part, you are trying to PROJECT your data onto your eigenvectors, which is what you should be doing in PCA, but in sklearn, what you should do is the following:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=300)
pca.fit_transform(only_2)
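And for the reconstruction the question asks about, a minimal sketch (an aside, reusing only_2) with the built-in inverse_transform:

pca = PCA(n_components=300)
data_reduced = pca.fit_transform(only_2)               # project onto 300 components
data_original = pca.inverse_transform(data_reduced)    # map back to the original pixel space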

If you could tell me how you created only_2, I can give you a much more specific answer tomorrow.

Here is what sklearn says about fit_transform for PCA: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.fit_transform:

fit_transform(X, y=None)
    Fit the model with X and apply the dimensionality reduction on X.

    Parameters:
        X : array-like, shape (n_samples, n_features)
            Training data, where n_samples is the number of samples and n_features is the number of features.
        y : Ignored

    Returns:
        X_new : array-like, shape (n_samples, n_components)
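As a quick illustration of those shapes, a hypothetical toy example (the array X below is made up for demonstration):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 784)                 # 100 samples, 784 features
X_new = PCA(n_components=3).fit_transform(X)
print(X_new.shape)                           # (100, 3): (n_samples, n_components)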
ababuji
  • Sorry for my mistake; you don't have to focus on only_2 and the number of n_components, that was just a slip. I actually wrote both versions with the same n_components and the same 'Data' set, so I am just wondering whether my numpy code is incorrect or not. Thank you for telling me about pca.fit_transform. By the way, what is the difference between data_reduced and pca.fit_transform? And how do I implement the reconstruction in numpy? – N EMO Sep 30 '18 at 07:30
  • I thought pca.components_ were the eigenvectors in sklearn. Are they different? – N EMO Sep 30 '18 at 07:38
  • @ N EMO: You are correct. And your calculation can only be understood by people who know PCA. Your reconstruction into `data_reduced` is RIGHT. But why not use the inbuilt method? – ababuji Sep 30 '18 at 07:43
  • Hahaha, it's just for practice. I edited my question to be clearer; I hope it now gives you more precise information about my implementation than before. – N EMO Sep 30 '18 at 08:14
  • @NEMO Please give me a few hours. I will get this done and show you. I have an assignment to submit – ababuji Oct 01 '18 at 04:03