PCA Analysis and Plotting with Python

Question

Possible Duplicate:
PCA Analysis with Python

I have a data set of dimension 1940 x 4. The rows are readings (samples) and the columns are variables (temp, humid, windspeed and pressure). I want to run a PCA on it and plot the results using Python. I have come across a few techniques and examples, but I am not sure how to use them or what to do with the PCA results once I have them. So I am looking for a code example that implements PCA in Python on this sort of dataset, and for help understanding how to interpret the PCA results, how to plot them, and finally how to interpret the plots. Many thanks.

http://stackoverflow.com/questions/1730600/principal-component-analysis-in-python — YXD, Dec 02 '12 at 12:52
Mr E's comment should help. Also it seems your [previous question](http://stackoverflow.com/questions/13224362/pca-analysis-with-python) got some pretty solid answers (and I'm not sure how this question is very much different.) — gary, Dec 02 '12 at 12:53
I have seen this example. All it says is that this is how you can plot your PCA. I need to know how I should plot, what to plot from the PCA results, and how to interpret them properly. :( — khan, Dec 02 '12 at 12:54
what do you want the PCA for, since your data is only 4 dimensional, do you still want to reduce the dimension? — Min Lin, Dec 02 '12 at 12:54
Make sure to open the data file in a plaintext editor, because in Excel it looks scary. — khan, Dec 02 '12 at 12:55
Yes, I want it to be reduced to two dimensions..or max three so that I can plot and interpret all these. — khan, Dec 02 '12 at 12:56

score 0 · Answer 1 · answered Dec 02 '12 at 14:29

Principal component analysis is useful for reducing the dimensionality of a data set. Since your data contains only four variables and (as far as I know) they are not related, I would not expect PCA to be valuable for any practical analysis.

If I'm wrong, and you do expect some of the variables to be related, you can use PCA to identify the 4-vectors (directions in the four-variable space) that capture most of the covariance. These are the eigenvectors of the covariance matrix, ranked by their eigenvalues. It takes four such vectors to completely span the same space as the input variables.
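As a rough illustration of the eigenvector view, here is a minimal NumPy sketch. The function name `pca_via_covariance` and the random stand-in data are assumptions made for this example and are not the asker's real readings. It centres the columns, takes the eigenvectors of the 4 x 4 covariance matrix, and projects the samples onto the two leading ones.

```python
import numpy as np

def pca_via_covariance(data, n_components=2):
    """Return (eigenvalues, eigenvectors, scores), components sorted by variance."""
    # Centre each column (variable) on its mean.
    centered = data - data.mean(axis=0)
    # Covariance matrix of the variables; rowvar=False means columns are variables.
    cov = np.cov(centered, rowvar=False)
    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Reorder so the largest-variance direction comes first.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Project the centred samples onto the leading components.
    scores = centered @ eigenvectors[:, :n_components]
    return eigenvalues, eigenvectors, scores

# Synthetic stand-in with the same shape as the data in the question (1940 x 4).
rng = np.random.default_rng(0)
data = rng.normal(size=(1940, 4))

eigenvalues, eigenvectors, scores = pca_via_covariance(data)
print(eigenvectors.shape, scores.shape)
```

The two columns of `scores` are what one would typically scatter-plot, for example with matplotlib's `plt.scatter(scores[:, 0], scores[:, 1])`. Because temperature, humidity, wind speed and pressure are in different units, it is probably worth dividing each column by its standard deviation before computing the covariance; the sketch above does not do that.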

In systems where large numbers of variables are being measured and there is a large degree of mutual information, PCA identifies the important bits of independent information. I don't think this is the case for your system.
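One way to check whether the variables share enough information for PCA to pay off is to look at how the total variance splits across the components. The sketch below is an illustration on made-up data, and the helper name `explained_variance_ratio` is an assumption of this example. It compares four unrelated variables with four variables driven by one shared hidden factor.

```python
import numpy as np

def explained_variance_ratio(data):
    """Fraction of total variance carried by each principal component, largest first."""
    # Standardise the columns so variables in different units are comparable.
    standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(np.cov(standardized, rowvar=False))
    return eigenvalues[::-1] / eigenvalues.sum()

rng = np.random.default_rng(0)
n = 1940

# Case 1: four unrelated variables.
independent = rng.normal(size=(n, 4))

# Case 2: four variables that mostly follow one shared hidden factor plus small noise.
factor = rng.normal(size=(n, 1))
related = factor + 0.1 * rng.normal(size=(n, 4))

ratio_independent = explained_variance_ratio(independent)
ratio_related = explained_variance_ratio(related)
print(ratio_independent)
print(ratio_related)
```

If the ratios come out roughly equal (near 0.25 each for four variables), no component stands out and dropping dimensions loses information about evenly. If the first one or two ratios dominate, a 2-D plot of the leading components keeps most of the structure.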
