How can I plot a correlation matrix as a set of ellipses, similar to the R open-air package?

Question

The figure below is plotted using the open-air R package:

I know matplotlib has the plt.matshow function,
but it can't clearly show the relation between variables at the same time.

Here is my work so far:

df is a pandas DataFrame with 7 variables, shown below:

I don't know how to attach a .csv file to StackOverflow.

Using plt.matshow(df.corr(), cmap=plt.cm.Greens), the figure looks like this:

The second figure can't represent the correlation relations of the variables as clearly as the first one.

Edit:

I uploaded the CSV file to Google Docs here.

Please don't post screenshots of your dataset - I can't copy/paste from an image. Paste the actual values into your question as text. — ali_m, Jan 01 '16 at 15:33
What do you mean by representing the correlation relations? Do you mean the correlation coefficient values? If so, please take a look at seaborn's annotated heatmap https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.heatmap.html — ayhan, Jan 01 '16 at 16:46
[Here's a related answer that uses the R `corrplot` package](http://stackoverflow.com/a/5453471/1461210) — ali_m, Jan 02 '16 at 01:43

ali_m · Accepted Answer · 2016-01-01T18:24:29.110

I'm not aware of any existing Python library that does these "ellipse plots", but it's not particularly hard to implement using a matplotlib.collections.EllipseCollection:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.collections import EllipseCollection

def plot_corr_ellipses(data, ax=None, **kwargs):
    """Draw a 2D correlation matrix as a grid of ellipses; return the EllipseCollection."""

    M = np.array(data)
    if not M.ndim == 2:
        raise ValueError('data must be a 2D array')
    if ax is None:
        fig, ax = plt.subplots(1, 1, subplot_kw={'aspect':'equal'})
        ax.set_xlim(-0.5, M.shape[1] - 0.5)
        ax.set_ylim(-0.5, M.shape[0] - 0.5)

    # xy locations of each ellipse center
    xy = np.indices(M.shape)[::-1].reshape(2, -1).T

    # set the relative sizes of the major/minor axes according to the strength of
    # the positive/negative correlation
    w = np.ones_like(M).ravel()
    h = 1 - np.abs(M).ravel()
    a = 45 * np.sign(M).ravel()

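    # NOTE: recent Matplotlib releases renamed `transOffset` to `offset_transform`;
    # on older versions, pass transOffset=ax.transData instead.
    ec = EllipseCollection(widths=w, heights=h, angles=a, units='x', offsets=xy,
                           offset_transform=ax.transData, array=M.ravel(), **kwargs)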
    ax.add_collection(ec)

    # if data is a DataFrame, use the row/column names as tick labels
    if isinstance(data, pd.DataFrame):
        ax.set_xticks(np.arange(M.shape[1]))
        ax.set_xticklabels(data.columns, rotation=90)
        ax.set_yticks(np.arange(M.shape[0]))
        ax.set_yticklabels(data.index)

    return ec

For example, using your data:

data = df.corr()
fig, ax = plt.subplots(1, 1)
m = plot_corr_ellipses(data, ax=ax, cmap='Greens')
cb = fig.colorbar(m)
cb.set_label('Correlation coefficient')
ax.margins(0.1)

Negative correlations can be plotted as ellipses with the opposite orientation:

fig2, ax2 = plt.subplots(1, 1)
data2 = np.linspace(-1, 1, 9).reshape(3, 3)
m2 = plot_corr_ellipses(data2, ax=ax2, cmap='seismic', clim=[-1, 1])
cb2 = fig2.colorbar(m2)
ax2.margins(0.3)
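
The geometry used by plot_corr_ellipses above can be looked at in isolation. The sketch below is a plain-Python restatement of the three lines that compute w, h and a; the helper name ellipse_params is hypothetical and is not part of matplotlib or of the function above. It assumes a single coefficient in [-1, 1].

```python
def ellipse_params(r):
    """Return (width, height, angle_deg) for one correlation coefficient r.

    Mirrors the mapping in plot_corr_ellipses:
      width  = 1
      height = 1 - |r|          (stronger correlation -> thinner ellipse)
      angle  = 45 * sign(r)     (negative correlation -> tilted the other way)
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    width = 1.0
    height = 1.0 - abs(r)
    sign = (r > 0) - (r < 0)  # -1, 0 or 1, like np.sign
    angle = 45.0 * sign
    return width, height, angle


if __name__ == "__main__":
    for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(r, ellipse_params(r))
```

With this mapping, r = 0 gives a circle (width equal to height, no rotation), and |r| = 1 collapses the ellipse to a line segment at +45 or -45 degrees.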

Stefan · Answer 2 · 2016-01-01T17:28:10.493

Assuming you are interested in showing cluster relations, the seaborn package mentioned in the comments also has a clustermap. Using your correlation matrix (it looks like you want to show the correlation coefficients as integers in the [-100, 100] range), you could do the following:

corr = df.corr().mul(100).astype(int)

     GX   HG   RM   SJ   XB   XN   ZG
GX  100   77   62   71   48   66   57
HG   77  100   69   74   61   61   58
RM   62   69  100   75   48   64   68
SJ   71   74   75  100   50   70   65
XB   48   61   48   50  100   46   51
XN   66   61   64   70   46  100   75
ZG   57   58   68   65   51   75  100

and then use seaborn.clustermap() as follows:

import seaborn as sns
sns.clustermap(data=corr, annot=True, fmt='d', cmap='Greens').savefig('cluster.png')
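
One detail worth knowing about the integer conversion above: astype(int) truncates toward zero rather than rounding, so a coefficient such as 0.769 would be annotated as 76, not 77. The scalar sketch below illustrates the difference; the two helper names are hypothetical. On a DataFrame, the equivalent would be calling .round() before .astype(int).

```python
def to_percent_truncated(r):
    """Scale to [-100, 100] and truncate toward zero, as an integer cast does."""
    return int(r * 100)


def to_percent_rounded(r):
    """Scale to [-100, 100] and round to the nearest integer before casting."""
    return int(round(r * 100))


if __name__ == "__main__":
    for r in (0.769, -0.769):
        print(r, to_percent_truncated(r), to_percent_rounded(r))
```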

Mengshan · Answer 3 · 2017-07-28T15:47:52.910

I just discovered this Python package biokit today. It provides a very handy function to create various kinds of correlation charts. For example:

In [1]: import pandas as pd

In [2]: import matplotlib.pyplot as plt
   ...: from biokit.viz import corrplot

In [6]: corr
Out[6]: 
      GX    HG    RM    SJ    XB    XN    ZG
GX  1.00 -0.77  0.62  0.71  0.48  0.66  0.57
HG -0.77  1.00  0.69  0.74  0.61  0.61  0.58
RM  0.62  0.69  1.00  0.75  0.48  0.64  0.68
SJ  0.71  0.74  0.75  1.00  0.50  0.70  0.65
XB  0.48  0.61  0.48  0.50  1.00 -0.46  0.51
XN  0.66  0.61  0.64  0.70 -0.46  1.00  0.75
ZG  0.57  0.58  0.68  0.65  0.51  0.75  1.00

I took Stefan's data and modified it a little bit. Let's assume this is a correlation matrix. Now to create a correlation chart, you can simply do this:

In [7]: c = corrplot.Corrplot(corr)
   ...: c.plot()

Correlation chart with ellipses

You can read more examples here.
