
I'm trying to understand how Principal Component Analysis works, and I'm testing it on the sklearn.datasets.load_iris dataset. I understand how each step works (e.g. standardize the data, compute the covariance matrix, eigendecompose it, sort by descending eigenvalue, and project the original data onto the K selected eigenvectors).
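For reference, here is a minimal numpy sketch of those steps as I understand them (the variable names are my own):

import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data

# 1. Standardize: zero mean, unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features
cov = np.cov(X_std.T)

# 3. Eigendecomposition (eigh, since the covariance matrix is symmetric)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue (eigh returns ascending order)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 5. Project the data onto the top K eigenvectors (columns)
K = 2
X_projected = X_std @ eigenvectors[:, :K]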

The next step is to visualize where these eigenvectors are projected on the dataset (on the PC1 vs. PC2 plot, right?).

Can someone explain how to plot [PC1, PC2, PC3] eigenvectors on a 3D plot of the reduced dimension dataset?

Also, am I plotting this 2D version correctly? I'm not sure why my first eigenvector has a shorter length. Should I multiply by the eigenvalue?


Here is some of the research I have done to accomplish this:

The PCA method that I'm following is from: https://plot.ly/ipython-notebooks/principal-component-analysis/#Shortcut---PCA-in-scikit-learn (although I don't want to use plotly; I want to stick with pandas, numpy, sklearn, matplotlib, scipy, and seaborn).

I've been following this tutorial for plotting eigenvectors, which seems pretty simple: Basic example for PCA with matplotlib. However, I can't seem to replicate the results with my data.

I also found this, but it seems overly complicated for what I'm trying to do, and I don't want to have to create a FancyArrowPatch: plotting the eigenvector of covariance matrix using matplotlib and np.linalg


I've tried to make my code as straightforward as possible to follow the other tutorials:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})

%matplotlib inline
np.random.seed(0)

# Iris dataset
DF_data = pd.DataFrame(load_iris().data, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       columns = load_iris().feature_names)

Se_targets = pd.Series(load_iris().target, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])], 
                       name = "Species")

# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data), 
                           index = DF_data.index,
                           columns = DF_data.columns)

# Sklearn for Principal Component Analysis

# Dims
m = DF_standard.shape[1]
K = 2

# PCA (How I tend to set it up)
M_PCA = decomposition.PCA(n_components=m)
DF_PCA = pd.DataFrame(M_PCA.fit_transform(DF_standard), 
                columns=["PC%d" % k for k in range(1,m + 1)]).iloc[:,:K]


# Plot the eigenvectors
#https://stackoverflow.com/questions/18299523/basic-example-for-pca-with-matplotlib

# This is where stuff gets weird...
data = DF_standard

mu = data.mean(axis=0)
# Alternatively: eigenvectors, eigenvalues, V = np.linalg.svd(data.T, full_matrices=False)
eigenvectors, eigenvalues = M_PCA.components_, M_PCA.explained_variance_
# np.dot(data, eigenvectors)
projected_data = DF_PCA

sigma = projected_data.std(axis=0).mean()

fig, ax = plt.subplots(figsize=(10,10))
ax.scatter(projected_data["PC1"], projected_data["PC2"])
for axis, color in zip(eigenvectors[:K], ["red","green"]):
#     start, end = mu, mu + sigma * axis ### leads to "ValueError: too many values to unpack (expected 2)"

    # So I tried this but I don't think it's correct
    start, end = (mu)[:K], (mu + sigma * axis)[:K] 
    ax.annotate('', xy=end,xytext=start, arrowprops=dict(facecolor=color, width=1.0))
    
ax.set_aspect('equal')
plt.show()

[image: resulting PC1 vs. PC2 scatter plot with the two eigenvector arrows]


1 Answer


I think you are asking the wrong question. The eigenvectors ARE the principal components (PC1, PC2, etc.), so plotting the eigenvectors in the [PC1, PC2, PC3] 3D plot is simply drawing the three orthogonal axes of that plot.
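You can check this directly. A quick sanity check (reusing the M_PCA and DF_standard objects from your question): projecting each eigenvector (offset by the data mean, which transform() subtracts) gives back a standard basis vector, because the rows of components_ are orthonormal.

import numpy as np

mu = DF_standard.mean(axis=0).values

# Each row of components_ lands on the corresponding PC axis:
# PC1's eigenvector maps to ~(1, 0, 0, 0), PC2's to ~(0, 1, 0, 0), etc.
print(np.round(M_PCA.transform(mu + M_PCA.components_), 3))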

You probably want to visualize how the eigenvectors look in your original coordinate system. This is what is discussed in your second link: Basic example for PCA with matplotlib.
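As a minimal sketch of that idea (reusing the imports and the M_PCA / DF_standard objects from your question, and picking the first two original features as the plot plane), scale each eigenvector by the square root of its eigenvalue, i.e. the standard deviation along that direction, so the arrow lengths are meaningful:

fig, ax = plt.subplots(figsize=(10, 10))

# Scatter in the ORIGINAL (standardized) coordinates, first two features
ax.scatter(DF_standard.iloc[:, 0], DF_standard.iloc[:, 1], alpha=0.5)

mu = DF_standard.mean(axis=0).values
for vector, value, color in zip(M_PCA.components_,
                                M_PCA.explained_variance_,
                                ["red", "green"]):
    # sqrt(eigenvalue) = standard deviation along this eigenvector
    end = mu + np.sqrt(value) * vector
    ax.annotate('', xy=end[:2], xytext=mu[:2],
                arrowprops=dict(facecolor=color, width=1.0))

ax.set_aspect('equal')
plt.show()

Note that each arrow here is only the first two coordinates of a 4D eigenvector (its shadow on the sepal length/width plane), so the drawn lengths won't generally look equal even after scaling.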

  • Yes, that's what I meant: to plot them on the original axes after the dimensions have been reduced. I think the example you posted is the one I was using, but I couldn't get it to work correctly for my data. – O.rka Jun 22 '16 at 22:59