I'm a bit puzzled: I computed a PCA of the same dataset with two different tools and got different results. Here's the workflow:
- Orange 3.26: Read a .csv, PCA on 4 PCs (normalized variables), scatterplot
- scikit-learn: Read the same .csv, standardize the numerical values with StandardScaler(with_mean=True, with_std=True), then PCA(copy=True, iterated_power='auto', n_components=4, random_state=None, svd_solver='auto', tol=0.0, whiten=False) (see the pipeline sketch after this list)
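For reference, here is the scikit-learn side condensed into a single pipeline, using exactly the parameters listed above (everything except n_components is the sklearn default):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize, then project onto 4 principal components.
pca_pipeline = make_pipeline(
    StandardScaler(with_mean=True, with_std=True),
    PCA(n_components=4, copy=True, iterated_power='auto',
        random_state=None, svd_solver='auto', tol=0.0, whiten=False),
)
# scores = pca_pipeline.fit_transform(X_numeric)   # X_numeric = the 13 numerical feature columns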
The results differ in the numerical values of the individual PCs.
[Orange 3.26 output omitted]
Here is my scikit-learn code:
I have a pd.DataFrame of shape (268, 16). In a first step I split it into two dataframes:
- A1: all rows and the 13 numerical features; shape (268, 13)
- B1: the targets and the ID of every row; shape (268, 3)
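For concreteness, the split looks roughly like this (the file name and the column names "ID", "target_1", "target_2" are placeholders, not the real ones):

import pandas as pd

df = pd.read_csv("data.csv")                      # shape (268, 16)

id_target_cols = ["ID", "target_1", "target_2"]   # placeholder names for the 3 non-feature columns
B1 = df[id_target_cols]                           # targets + ID, shape (268, 3)
A1 = df.drop(columns=id_target_cols)              # 13 numerical features, shape (268, 13)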
In the next step I standardize dataframe A1 with StandardScaler from sklearn.preprocessing:
a1 = StandardScaler(with_mean=True, with_std=True).fit_transform(A1)
The next step is the PCA:
pca1 = PCA(n_components=4)
principalComponents1 = pca1.fit_transform(a1)
The outputs are the scores and loadings - nothing special.
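For completeness, this is how I read them off the fitted object (treating the transposed components_ as the loadings; some definitions additionally scale them by the square root of the explained variance):

scores = principalComponents1                # PCA scores, shape (268, 4)
loadings = pca1.components_.T                # one column per PC, shape (13, 4)
explained = pca1.explained_variance_ratio_   # fraction of variance explained by each PC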
Could the difference come from how the initial dataset is normalized? Any suggestions?
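In case it helps to pin this down, here is a minimal sketch of the checks I have in mind, assuming A1 and principalComponents1 from above; the ddof convention and the sign-flip behaviour are possible causes I haven't verified, not a diagnosis:

import numpy as np
from sklearn.decomposition import PCA

# Possible cause 1 (assumption): different std conventions. StandardScaler divides by the
# population std (ddof=0); pandas' .std() and some other tools use the sample std (ddof=1),
# which rescales every score by a constant factor.
a1_ddof1 = (A1 - A1.mean()) / A1.std(ddof=1)
scores_ddof1 = PCA(n_components=4).fit_transform(a1_ddof1)

# Possible cause 2 (assumption): PCA components are only defined up to sign, so whole PC
# columns may be mirrored between tools without being "wrong". Comparing absolute values
# of the first few scores (here against the ddof=1 variant) makes both effects visible.
print(np.abs(principalComponents1[:5]) - np.abs(scores_ddof1[:5]))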