25

How do I create a confidence ellipse in a scatterplot using matplotlib?

The following code works until creating scatter plot. Then, is anyone familiar with putting confidence ellipses over the scatter plot?

import numpy as np
import matplotlib.pyplot as plt
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]

plt.scatter(x,y)
plt.show()

Following is the reference for Confidence Ellipses from SAS.

http://support.sas.com/documentation/cdl/en/grstatproc/62603/HTML/default/viewer.htm#a003160800.htm

The code in sas is like this:

proc sgscatter data=sashelp.iris(where=(species="Versicolor"));
  title "Versicolor Length and Width";
  compare y=(sepalwidth petalwidth)
          x=(sepallength petallength)
          / reg ellipse=(type=mean) spacing=4;
run;
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
2964502
  • 4,301
  • 12
  • 35
  • 55
  • 1
    possible duplicate of [multidimensional confidence intervals](http://stackoverflow.com/questions/12301071/multidimensional-confidence-intervals) – Saullo G. P. Castro Nov 21 '13 at 16:35
  • 2
    @ Saullo Castro have you seen the code in sas and do you think that the method implemented in sas and in the link you provided the same? – 2964502 Nov 21 '13 at 16:42
  • 2
    @tester3 - In the example you linked to, the confidence ellipse shown is for the mean, as opposed to for another sample drawn from the same population. (This is what `type=mean` is specifying.) My answer that @SaulloCastro linked to shows a confidence ellipse for the entire population (in other words, the area that another sample from the population should fall inside, identical to `type=predicted` in SAS). Jamie's answer uses this method as well. – Joe Kington Nov 21 '13 at 23:35

4 Answers4

31

The following code draws a one, two, and three standard deviation sized ellipses:

x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
cov = np.cov(x, y)
lambda_, v = np.linalg.eig(cov)
lambda_ = np.sqrt(lambda_)
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
ax = plt.subplot(111, aspect='equal')
for j in xrange(1, 4):
    ell = Ellipse(xy=(np.mean(x), np.mean(y)),
                  width=lambda_[0]*j*2, height=lambda_[1]*j*2,
                  angle=np.rad2deg(np.arccos(v[0, 0])))
    ell.set_facecolor('none')
    ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

enter image description here

Jaime
  • 65,696
  • 17
  • 124
  • 159
  • 1
    @Jamie - +1 Shouldn't the ellipses be twice as wide and high, though? Currently, they're N-sigma wide and high, as opposed to showing the region within N-sigma of the mean. – Joe Kington Nov 21 '13 at 19:18
  • @JoeKington Yes, I do think you are absolutely right, matplotlib makes it kind of clear they are the width and height, not the semi-width and semi-height... Have edited the code and image. Thanks! – Jaime Nov 21 '13 at 19:55
  • 1
    Better check @Ben 's answer below, because this code doesn't compute the angle properly. It appeared about 90 degrees flipped in my case. – grasshopper May 11 '16 at 15:20
  • 4
    Can confirm that this example computes the angle incorrectly; the correct angle is computed as `angle= np.rad2deg(np.arctan2(*v[:,0][::-1]))` – nathan lachenmyer Jan 28 '20 at 19:24
  • Yes the proper angle definition should be ```np.degrees(np.arctan2(*v[:,0][::-1]))``` – lhoupert Jul 10 '20 at 12:39
  • Shouldn't the width and height be mean_axis + 2 * std_axis? I can assume you subtracted the mean to center data in 0? – 3nomis Nov 30 '21 at 16:29
29

After giving the accepted answer a go, I found that it doesn't choose the quadrant correctly when calculating theta, as it relies on np.arccos:

oops

Taking a look at the 'possible duplicate' and Joe Kington's solution on github, I watered his code down to this:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]

x = [5,7,11,15,16,17,18]
y = [25, 18, 17, 9, 8, 5, 8]

nstd = 2
ax = plt.subplot(111)

cov = np.cov(x, y)
vals, vecs = eigsorted(cov)
theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
w, h = 2 * nstd * np.sqrt(vals)
ell = Ellipse(xy=(np.mean(x), np.mean(y)),
              width=w, height=h,
              angle=theta, color='black')
ell.set_facecolor('none')
ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

neg slope

Community
  • 1
  • 1
Ben
  • 433
  • 4
  • 6
  • 2
    Does anyone know how to generalise this to 3D (or possibly n dimensions)? – HansSnah Aug 31 '18 at 19:50
  • @Ben Why `w, h = 2 * nstd * np.sqrt(vals)`? That's `4 * np.sqrt(vals)`. – tauran Feb 20 '19 at 17:12
  • why it doesn't work when I change plt.scatter(x, y) to plt.scatter(np.mean(x), np.mean(y)). I want the ellipsoid around the mean – Farshid Rayhan May 28 '20 at 11:58
  • I tried the example and got a blank figure. Does anyone know why? – vicemagui Oct 28 '21 at 22:31
  • @tauran: the radii of the ellipse are sqrt(eigenvalue1) and sqrt(eigenvalue2). vals contains both eigenvalues: [eigenvalue1, eigenvalue2]. When drawing an ellipse you need to specify the width and not the radii of the ellipse: for this 2*sqrt(vals). And now there are many ellipses, but if you want to display the confidence area, such that 95% of the data points are inside that ellipse area, we need to display 2 standard deviations. For this, the multiplication with nstd (=2). https://en.wikipedia.org/wiki/Normal_distribution#Standard_deviation_and_coverage & https://cookierobotics.com/007/ – Jürgen Brauer May 18 '22 at 09:24
2

There is no need to compute angles explicitly once you have the eigendecomposition of your covariance matrix: the rotation portion already encodes that information for you for free:

cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)
center = np.mean([x, y], axis=1)[:, None]

t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)

plt.scatter(x, y)
plt.plot(*(rot @ (val * xy).T + center))

enter image description here

You can expand your ellipse by applying a scale before translation:

plt.plot(*(2 * rot @ (val * xy).T + center))

enter image description here

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
1

In addition to the accepted answer: I think the correct angle should be:

angle=np.rad2deg(np.arctan2(*v[:,np.argmax(abs(lambda_))][::-1]))) 

and the corresponding width (larger eigenvalue) and height should be:

width=lambda_[np.argmax(abs(lambda_))]*j*2, height=lambda_[1-np.argmax(abs(lambda_))]*j*2

As we need to find the corresponding eigenvector for the largest eigenvalue. Since "the eigenvalues are not necessarily ordered" according to the specs https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html and v[:,i] is the eigenvector corresponding to the eigenvalue lambda_[i]; we should find the correct column of the eigenvector by np.argmax(abs(lambda_)).

Kerem
  • 11
  • 3