I have a set of 2-dimensional data that I need to study using a PCA decomposition. As a first step, I tried the PCA class from matplotlib.mlab:
import numpy as np
from matplotlib.mlab import PCA

data = np.loadtxt("Data.txt")   # data is an (N, 2) array
result = PCA(data)              # principal components are stored in result.Wt
#....
I then compared the scatter plot of "Data.txt" with the principal components found by mlab (stored in result.Wt). The result is the following: mlab attempt
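For reference, this is roughly how the mlab plot was produced; the plotting code and the arrow scale factor are my own sketch for illustration, not part of the original analysis script:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import PCA

data = np.loadtxt("Data.txt")            # assumed to be an (N, 2) array
result = PCA(data)

plt.scatter(data[:, 0], data[:, 1], s=5, alpha=0.5)
mean = data.mean(axis=0)
for component in result.Wt:              # each row of Wt is one principal axis
    # the factor 2 is an arbitrary choice to make the arrows visible
    plt.annotate("", xy=mean + 2 * component, xytext=mean,
                 arrowprops=dict(arrowstyle="->", color="red"))
plt.axis("equal")
plt.show()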
As you can see, the result is not optimal. I therefore tried to do the same thing with the PCA class from sklearn.decomposition:
import numpy as np
from sklearn.decomposition import PCA

data = np.loadtxt("Data.txt")
pca = PCA(n_components=2, whiten=True)  # keep both components, whiten the output
pca.fit(data)                           # principal axes are stored in pca.components_
The results this time are much better: sklearn attempt
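The corresponding sklearn plot was drawn in roughly the same way; again, the plotting details and the arrow scaling are an assumption of mine, not part of the original question:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

data = np.loadtxt("Data.txt")
pca = PCA(n_components=2, whiten=True)
pca.fit(data)

plt.scatter(data[:, 0], data[:, 1], s=5, alpha=0.5)
for component, variance in zip(pca.components_, pca.explained_variance_):
    # scale each axis by the standard deviation it explains (my choice)
    plt.annotate("", xy=pca.mean_ + np.sqrt(variance) * component,
                 xytext=pca.mean_,
                 arrowprops=dict(arrowstyle="->", color="red"))
plt.axis("equal")
plt.show()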
I did not expect such a large difference between the results of these two libraries. My question is: what are the possible reasons for such a big difference in my results?
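For completeness, this is roughly how I look at the raw directions the two libraries report side by side (this snippet is only added here for illustration, it is not part of my original script):

import numpy as np
from matplotlib.mlab import PCA as mlabPCA
from sklearn.decomposition import PCA as sklearnPCA

data = np.loadtxt("Data.txt")

mlab_result = mlabPCA(data)
sk_pca = sklearnPCA(n_components=2, whiten=True).fit(data)

print("mlab Wt:\n", mlab_result.Wt)                  # rows: principal axes found by mlab
print("sklearn components_:\n", sk_pca.components_)  # rows: principal axes found by sklearn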