
I've been doing some Geometrical Data Analysis (GDA) such as Principal Component Analysis (PCA). I'm looking to plot a Correlation Circle... these look a bit like this:

[example image: a PCA correlation circle]

Basically, it lets you measure the extent to which each original variable is correlated with the principal components (dimensions) of a dataset.

Does anyone know of a Python package that produces this kind of visualization?

testing
  • Possible duplicate of [PCA Scaling with ggbiplot](http://stackoverflow.com/questions/18039313/pca-scaling-with-ggbiplot) – BadZen Jun 14 '16 at 15:22
  • Actually it's not the same, here I'm trying to use Python not R – testing Jun 15 '16 at 09:25
  • Yes, the PCA circle is possible using the mlxtend package: http://rasbt.github.io/mlxtend/user_guide/plotting/plot_pca_correlation_graph/ – parth prasoon Jul 12 '20 at 08:42
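For reference, the quantity a correlation circle displays — the correlation between each original variable and each principal component — can be computed directly with numpy and scikit-learn, without any dedicated plotting package. A minimal sketch using the iris dataset (variable names and structure are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples x 4 features

pca = PCA(n_components=2)
scores = pca.fit_transform(X)  # PC scores, shape (150, 2)

# Correlation of each original variable with each PC:
# corr[j, k] = Pearson correlation between feature j and PC scores for component k.
corr = np.array([[np.corrcoef(X[:, j], scores[:, k])[0, 1]
                  for k in range(scores.shape[1])]
                 for j in range(X.shape[1])])

print(np.round(corr, 3))  # every entry is a correlation, so it lies in [-1, 1]
```

These per-variable correlations are exactly the arrow coordinates a correlation circle plots for the first two components.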

2 Answers


Here is a simple example using sklearn and the iris dataset. It includes both the factor map for the first two dimensions and a scree plot:

from sklearn.decomposition import PCA
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
 
df = sns.load_dataset('iris')
 
n_components = 4
 
# Do the PCA.
pca = PCA(n_components=n_components)
reduced = pca.fit_transform(df[['sepal_length', 'sepal_width',
                                'petal_length', 'petal_width']])

# Append the principal components for each entry to the dataframe
for i in range(0, n_components):
    df['PC' + str(i + 1)] = reduced[:, i]

print(df.head())  # use display(df.head()) instead if running in a notebook

# Do a scree plot
ind = np.arange(0, n_components)
(fig, ax) = plt.subplots(figsize=(8, 6))
sns.pointplot(x=ind, y=pca.explained_variance_ratio_)
ax.set_title('Scree plot')
ax.set_xticks(ind)
ax.set_xticklabels(ind)
ax.set_xlabel('Component Number')
ax.set_ylabel('Explained Variance')
plt.show()

# Show the points in terms of the first two PCs
g = sns.lmplot(x='PC1',
               y='PC2',
               hue='species',
               data=df,
               fit_reg=False,
               height=7)  # `size=` was renamed `height=` in seaborn 0.9

plt.show()

# Plot a variable factor map for the first two dimensions.
(fig, ax) = plt.subplots(figsize=(8, 8))
for i in range(0, pca.components_.shape[1]):
    ax.arrow(0,
             0,  # Start the arrow at the origin
             pca.components_[0, i],  #0 for PC1
             pca.components_[1, i],  #1 for PC2
             head_width=0.1,
             head_length=0.1)

    plt.text(pca.components_[0, i] + 0.05,
             pca.components_[1, i] + 0.05,
             df.columns.values[i])


an = np.linspace(0, 2 * np.pi, 100)
plt.plot(np.cos(an), np.sin(an))  # Add a unit circle for scale
plt.axis('equal')
ax.set_title('Variable factor map')
plt.show()

It'd be a good exercise to extend this to further PCs, to deal with scaling if all components are small, and to avoid plotting factors with minimal contributions.
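The scaling part of that exercise can be handled by converting raw loadings into variable–PC correlations (multiply each loading by the square root of its component's eigenvalue and divide by the variable's standard deviation), so every arrow fits inside the unit circle regardless of the data's scale. A sketch of that, plus filtering out poorly represented variables; the 0.5 threshold is purely illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris()
X = data.data
names = data.feature_names

pca = PCA().fit(X)
# Scale each loading by sqrt of its component's eigenvalue and divide by the
# variable's standard deviation: this turns loadings into variable-PC
# correlations, so all arrows lie inside the unit circle.
sd = X.std(axis=0, ddof=1)
corr = (pca.components_.T * np.sqrt(pca.explained_variance_)) / sd[:, None]

# Keep only variables well represented in the PC1/PC2 plane:
# cos2 is the squared correlation summed over the first two PCs.
cos2 = (corr[:, :2] ** 2).sum(axis=1)
keep = cos2 >= 0.5  # illustrative threshold
for name, flag, c in zip(names, keep, cos2):
    print(f"{name}: cos2={c:.2f} {'plot' if flag else 'skip'}")
```

Arrows for the kept variables can then be drawn exactly as in the factor-map loop above, using `corr` in place of `pca.components_`.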

Jean Paul
JJ Harrison
    Thanks for this - one change, the loop for plotting the variable factor map should be over the number of features, not the number of components. Instead of range(0, len(pca.components_)), it should be range(pca.components_.shape[1]). – limi44 May 07 '19 at 15:17

I agree it's a pity not to have it in some mainstream package such as sklearn.

Here is a home-made implementation: https://github.com/mazieres/analysis/blob/master/analysis.py#L19-34
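Along the same lines, here is one way such a home-made helper might look (the function name `correlation_circle` and its interface are my own, not from the linked code): it fits a 2-component PCA, converts loadings to variable–PC correlations, and draws the arrows inside a unit circle.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

def correlation_circle(X, var_names):
    """Plot a correlation circle for the first two PCs and return the
    (n_features, 2) array of variable-PC correlations."""
    pca = PCA(n_components=2).fit(X)
    # Loadings scaled to variable-PC correlations.
    corr = (pca.components_.T * np.sqrt(pca.explained_variance_)) \
        / X.std(axis=0, ddof=1)[:, None]

    fig, ax = plt.subplots(figsize=(6, 6))
    for (x, y), name in zip(corr, var_names):
        ax.arrow(0, 0, x, y, head_width=0.03, length_includes_head=True)
        ax.text(x * 1.05, y * 1.05, name)
    t = np.linspace(0, 2 * np.pi, 200)
    ax.plot(np.cos(t), np.sin(t), color='grey')  # unit circle for scale
    ax.set_xlabel('PC1')
    ax.set_ylabel('PC2')
    ax.set_aspect('equal')
    return corr

data = load_iris()
corr = correlation_circle(data.data, data.feature_names)
plt.show()
```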