
I'm using PCA from the sklearn Python package, and I was wondering how to return how much variation each dimension of the data explains, in the order the dimensions appear in the data. For example, the following code

from sklearn.decomposition import PCA

data = [[3,1,5],[3,3,5],[4,6,4],[3,10,5],[3,8,4]]
pca_all = PCA()
l = pca_all.fit_transform(data)
print(pca_all.explained_variance_ratio_)

prints the following:

[0.96655579 0.02743557 0.00600865]

I assume this means that the dimension of the data that explains the most variation explains ~96.7% of it, the dimension that explains the second most explains ~2.7%, etc. However, I want to return the percent variation explained in the same order as the dimensions in the data, like so:

[0.00600865 0.96655579 0.02743557] 

since in the data, the second entry of each row varies the most, the first varies the least, etc. How can I return the percent variance explained in this order?

kjakeb
  • Does this answer your question? [Finding the dimension with highest variance using scikit-learn PCA](https://stackoverflow.com/questions/15369006/finding-the-dimension-with-highest-variance-using-scikit-learn-pca) – Matt L. Feb 09 '20 at 23:16
  • It's not in the order of variation of data. It's in the order of how *transformed* data explains variation. – Sergey Bushmanov Feb 10 '20 at 08:21
  • @MattL. Yes that does, I had the same confusion. Thank you – kjakeb Feb 10 '20 at 16:05
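
To illustrate the point made in the comments, here is a minimal sketch, not an accepted answer. It assumes that what is wanted is each original column's share of the total variance, which is a different quantity from `explained_variance_ratio_`. The names `feature_var` and `feature_share` are made up for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[3, 1, 5], [3, 3, 5], [4, 6, 4], [3, 10, 5], [3, 8, 4]])

# Sample variance of each original column (ddof=1 matches the n-1 normalization PCA uses)
feature_var = data.var(axis=0, ddof=1)

# Share of the total variance per column, in the original column order
feature_share = feature_var / feature_var.sum()
print(feature_share)

pca = PCA().fit(data)

# Each row of components_ is one principal component, expressed as
# weights on the original columns; row i pairs with explained_variance_ratio_[i]
print(pca.components_)
```

For this data the per-column shares come out to roughly 1.4%, 96.4%, and 2.2%. They are close to, but not the same as, the PCA ratios, because each principal component is a linear combination of all three columns rather than a single column. The rows of `components_` show how strongly each column contributes to each component.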

0 Answers