Questions tagged [pca]

Principal component analysis (PCA) is a statistical technique for dimension reduction, often used in clustering or factor analysis. Given any number of explanatory or causal variables, PCA ranks the variables by their ability to explain the greatest variation in the data. It is this property that allows PCA to be used for dimension reduction, i.e. to identify the most important variables from amongst a large set of possible influences.

Overview

Mathematically, principal component analysis (PCA) amounts to an orthogonal transformation of possibly correlated variables (vectors) into uncorrelated variables called principal component vectors.
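
A minimal NumPy sketch of that statement, using toy data chosen purely for illustration: after centring, rotating the data onto the right singular vectors gives component scores whose covariance matrix is diagonal, i.e. the new variables are uncorrelated.

    import numpy as np

    rng = np.random.default_rng(0)
    # three correlated variables (arbitrary mixing matrix, for illustration only)
    X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 0.5]])

    Xc = X - X.mean(axis=0)                     # centre each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                          # orthogonal transformation -> principal components

    # off-diagonal covariances of the scores are ~0: the components are uncorrelated
    print(np.round(np.cov(scores, rowvar=False), 6))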

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the functions princomp and prcomp compute PCA.

2728 questions
115
votes
11 answers

Principal component analysis in Python

I'd like to use principal component analysis (PCA) for dimensionality reduction. Does numpy or scipy already have it, or do I have to roll my own using numpy.linalg.eigh? I don't just want to use singular value decomposition (SVD) because my input…
Vebjorn Ljosa
  • 17,438
  • 13
  • 70
  • 88
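
Neither numpy nor scipy ships a dedicated PCA routine, so the choice is between a few lines around numpy.linalg.eigh (as the question suggests) or a library such as scikit-learn's sklearn.decomposition.PCA. A rough sketch of the hand-rolled route (the function name and toy data are for illustration only):

    import numpy as np

    def pca_eigh(X, n_components):
        """PCA of the rows of X via an eigendecomposition of the covariance matrix."""
        Xc = X - X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
        return Xc @ eigvecs[:, order], eigvecs[:, order], eigvals[order]

    X = np.random.rand(100, 5)                  # toy data
    scores, components, variances = pca_eigh(X, 2)
    print(scores.shape)                         # (100, 2)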
100
votes
5 answers

Recovering features names of explained_variance_ratio_ in PCA with sklearn

I'm trying to recover from a PCA done with scikit-learn, which features are selected as relevant. A classic example with IRIS dataset. import pandas as pd import pylab as pl from sklearn import datasets from sklearn.decomposition import PCA # load…
sereizam
  • 2,048
  • 3
  • 20
  • 29
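
Assuming a PCA fitted on the iris data with scikit-learn, one way (a sketch, not the only one) to see which original features drive each component is to label pca.components_ with the feature names:

    import pandas as pd
    from sklearn import datasets
    from sklearn.decomposition import PCA

    iris = datasets.load_iris()
    pca = PCA(n_components=2).fit(iris.data)

    # rows = components, columns = original feature names
    weights = pd.DataFrame(pca.components_,
                           columns=iris.feature_names,
                           index=['PC1', 'PC2'])
    print(pca.explained_variance_ratio_)        # variance explained by each component
    print(weights)                              # weight of each feature in each component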
80
votes
11 answers

Principal Component Analysis (PCA) in Python

I have a (26424 x 144) array and I want to perform PCA over it using Python. However, there is no particular place on the web that explains how to achieve this task (there are some sites which just do PCA according to their own - there is no…
khan
  • 7,005
  • 15
  • 48
  • 70
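
A minimal scikit-learn sketch for an array of that shape (the data and the choice of 10 components below are placeholders):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(26424, 144)              # stand-in for the real (26424 x 144) array

    pca = PCA(n_components=10)                  # arbitrary number of components to keep
    X_reduced = pca.fit_transform(X)            # project each row onto those components

    print(X_reduced.shape)                      # (26424, 10)
    print(pca.explained_variance_ratio_.sum())  # fraction of the variance retained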
75
votes
3 answers

Feature/Variable importance after a PCA analysis

I have performed a PCA analysis over my original dataset and from the compressed dataset transformed by the PCA I have also selected the number of PCs I want to keep (they explain almost 94% of the variance). Now I am struggling with the…
fbm
  • 753
  • 1
  • 6
  • 5
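
There is no single agreed-upon definition of feature importance after PCA; one common heuristic (shown here only as a sketch, with placeholder data) weights the absolute loadings of each feature by the variance explained by each kept component and sums them:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(500, 8)                  # placeholder data
    pca = PCA(n_components=4).fit(X)            # e.g. enough components for ~94% of the variance

    # per-feature score: |weight| in each component, weighted by that component's variance share
    importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    ranking = np.argsort(importance)[::-1]      # original feature indices, highest score first
    print(ranking)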
68
votes
2 answers

Principal components analysis using pandas dataframe

How can I calculate Principal Components Analysis from data in a pandas dataframe?
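
A short sketch, assuming the DataFrame holds numeric columns only: run scikit-learn's PCA on the values and wrap the scores back into a DataFrame:

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA

    df = pd.DataFrame(np.random.rand(100, 4), columns=['a', 'b', 'c', 'd'])  # placeholder data

    pca = PCA(n_components=2)
    scores = pca.fit_transform(df.values)       # PCA centres the data internally

    df_pca = pd.DataFrame(scores, columns=['PC1', 'PC2'], index=df.index)
    print(df_pca.head())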
59
votes
3 answers

Obtain eigen values and vectors from sklearn PCA

How can I get the eigenvalues and eigenvectors of the PCA application? from sklearn.decomposition import PCA clf=PCA(0.98,whiten=True) #converse 98% variance X_train=clf.fit_transform(X_train) X_test=clf.transform(X_test) I can't find…
Abhishek Bhatia
  • 9,404
  • 26
  • 87
  • 142
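
After fitting, scikit-learn exposes the eigenvectors of the covariance matrix as pca.components_ (one per row) and the corresponding eigenvalues as pca.explained_variance_. A brief sketch with placeholder data:

    import numpy as np
    from sklearn.decomposition import PCA

    X_train = np.random.rand(200, 10)           # placeholder for the real training data

    pca = PCA(0.98, whiten=True)                # keep enough components for 98% of the variance
    X_train_pca = pca.fit_transform(X_train)

    eigenvectors = pca.components_              # shape (n_components, n_features)
    eigenvalues = pca.explained_variance_       # eigenvalues of the covariance matrix
    print(eigenvalues)
    print(eigenvectors.shape)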
48
votes
11 answers

raise LinAlgError("SVD did not converge") LinAlgError: SVD did not converge in matplotlib pca determination

Code: import numpy from matplotlib.mlab import PCA file_name = "store1_pca_matrix.txt" ori_data = numpy.loadtxt(file_name,dtype='float', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False,…
user 3317704
  • 925
  • 2
  • 10
  • 21
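
"SVD did not converge" is very often caused by NaN or infinite entries in the input rather than by the PCA step itself. A quick sketch of checking and cleaning the data first; since matplotlib.mlab.PCA has been removed from recent matplotlib releases, scikit-learn is used here instead:

    import numpy as np
    from sklearn.decomposition import PCA

    ori_data = np.loadtxt("store1_pca_matrix.txt", dtype=float, comments='#')

    print(np.isfinite(ori_data).all())                    # False if any NaN/Inf is present
    clean = ori_data[np.isfinite(ori_data).all(axis=1)]   # drop rows containing NaN/Inf

    pca = PCA(n_components=2)
    scores = pca.fit_transform(clean)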
46
votes
3 answers

Python scikit learn pca.explained_variance_ratio_ cutoff

When choosing the number of principal components (k), we choose k to be the smallest value so that, for example, 99% of the variance is retained. However, in the Python scikit-learn, I am not 100% sure pca.explained_variance_ratio_ = 0.99 is equal to…
Chubaka
  • 2,933
  • 7
  • 43
  • 58
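
A sketch of both readings, on placeholder data: inspect the cumulative ratio explicitly, or pass the target fraction as n_components and let scikit-learn keep the smallest k that reaches it:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(300, 20)                 # placeholder data

    # explicit: smallest k whose cumulative explained variance ratio reaches 99%
    full = PCA().fit(X)
    cumulative = np.cumsum(full.explained_variance_ratio_)
    k = int(np.argmax(cumulative >= 0.99)) + 1
    print(k)

    # shortcut: a float n_components selects components until that fraction is explained
    pca = PCA(n_components=0.99)
    print(pca.fit_transform(X).shape[1])        # typically the same k as above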
46
votes
5 answers

Selecting multiple odd or even columns/rows for dataframe

Is there a way in R to select many non-consecutive i.e. odd or even rows/columns? I'm plotting the loadings for my Principal Components Analysis. I have 84 rows of data ordered like this: x_1 y_1 x_2..... x_42 y_42 And at the moment I am creating…
dmt
  • 2,113
  • 3
  • 24
  • 23
42
votes
8 answers

Plotting pca biplot with ggplot2

I wonder if it is possible to plot pca biplot results with ggplot2. Suppose if I want to display the following biplot results with ggplot2 fit <- princomp(USArrests, cor=TRUE) summary(fit) biplot(fit) Any help will be highly appreciated. Thanks
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
34
votes
2 answers

PCA on sklearn - how to interpret pca.components_

I ran PCA on a data frame with 10 features using this simple code: pca = PCA() fit = pca.fit(dfPca) The result of pca.explained_variance_ratio_ shows: array([ 5.01173322e-01, 2.98421951e-01, 1.00968655e-01, 4.28813755e-02, …
Diego
  • 34,802
  • 21
  • 91
  • 134
32
votes
2 answers

R function prcomp fails with NA's values even though NA's are allowed

I am using the function prcomp to calculate the first two principal components. However, my data has some NA values and therefore the function throws an error. The na.action defined seems not to work even though it is mentioned in the help file…
user969113
  • 2,349
  • 10
  • 44
  • 51
31
votes
2 answers

PCA projection and reconstruction in scikit-learn

I can perform PCA in scikit by code below: X_train has 279180 rows and 104 columns. from sklearn.decomposition import PCA pca = PCA(n_components=30) X_train_pca = pca.fit_transform(X_train) Now, when I want to project the eigenvectors onto feature…
HonzaB
  • 7,065
  • 6
  • 31
  • 42
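
For the reconstruction half, a sketch with placeholder data: pca.inverse_transform maps the 30-dimensional scores back into the original 104-dimensional feature space, and the difference from the input is the reconstruction error:

    import numpy as np
    from sklearn.decomposition import PCA

    X_train = np.random.rand(1000, 104)                    # placeholder for the 279180 x 104 data

    pca = PCA(n_components=30)
    X_train_pca = pca.fit_transform(X_train)               # project onto the first 30 components

    X_reconstructed = pca.inverse_transform(X_train_pca)   # back to the 104 original features
    error = np.mean((X_train - X_reconstructed) ** 2)      # mean squared reconstruction error
    print(X_reconstructed.shape, error)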
31
votes
3 answers

Factor Loadings using sklearn

I want the correlations between individual variables and principal components in python. I am using PCA in sklearn. I don't understand how I can obtain the loading matrix after I have decomposed my data? My code is here. iris = load_iris() data, y…
Riyaz
  • 1,430
  • 2
  • 17
  • 27
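
One common convention (a sketch, assuming loadings are meant as correlations between standardized variables and components) scales the eigenvectors by the square roots of the eigenvalues:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    iris = load_iris()
    data = StandardScaler().fit_transform(iris.data)   # standardize so loadings read as correlations

    pca = PCA(n_components=2).fit(data)

    # loadings: eigenvectors scaled by sqrt(eigenvalues), shape (n_features, n_components)
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    print(loadings)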
28
votes
1 answer

R internal handling of sparse matrices

I have been comparing the performance of several PCA implementations from both Python and R, and noticed an interesting behavior: While it seems impossible to compute the PCA of a sparse matrix in Python (the only approach would be scikit-learn's…
dennlinger
  • 9,890
  • 1
  • 42
  • 63
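
On the Python side, the closest readily available option for sparse input is sklearn.decomposition.TruncatedSVD, which accepts scipy.sparse matrices directly; note that it does not centre the data, so it is a truncated SVD rather than exact PCA. A sketch on a toy sparse matrix:

    import scipy.sparse as sp
    from sklearn.decomposition import TruncatedSVD

    X = sp.random(10000, 300, density=0.01, format='csr', random_state=0)  # toy sparse matrix

    svd = TruncatedSVD(n_components=10, random_state=0)
    X_reduced = svd.fit_transform(X)             # works without densifying X

    print(X_reduced.shape)                       # (10000, 10)
    print(svd.explained_variance_ratio_.sum())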