3

How to plot bivariate normal distribution with expanding ellipses and add 5%, 25%, 50%, 75% and 95% label in the plot? Thank you!

enter image description here

stata
  • 545
  • 1
  • 8
  • 20
  • 3
    Take a look at http://www.stat.cmu.edu/~kass/KEB/RHTML/R/bivariateNormalPerspectives.r.html (code at the top, plots at the bottom). – NPE Sep 08 '14 at 06:23
  • Good,Thanks for NPE. – stata Sep 08 '14 at 07:24
  • Possible duplicate of http://stackoverflow.com/questions/7055848/plot-ellipse-bounding-a-percentage-of-points or http://stackoverflow.com/q/6655268/602276 – Andrie Sep 08 '14 at 09:01

2 Answers2

3

You can create a contour plot using an R package called mvtnorm.

Let's say you're trying to plot a bivariate normal distribution where mu_x = 1 and mu_y = 1 and variance matrix is c(2,1,1,1). Generate 100 observations for x,y,z. You can create a contour plot for this scenario as such:

library(mvtnorm)
x.points <- seq(-3,3,length.out=100)
y.points <- x.points
z <- matrix(0,nrow=100,ncol=100)
mu <- c(1,1)
sigma <- matrix(c(2,1,1,1),nrow=2)
for (i in 1:100) {
   for (j in 1:100) {
    z[i,j] <- dmvnorm(c(x.points[i],y.points[j]),
    mean=mu,sigma=sigma)
   }
}
contour(x.points,y.points,z)
topspinj
  • 66
  • 1
  • 3
3

Here is a solution that computes the contours at the levels that you want

#####
# Compute points and rotation matrix

# input
theta <- c(1, 2)
sigma <- diag(c(3^2, 2^2))
sigma[2, 1] <- sigma[1, 2] <- sqrt(sigma[1, 1] * sigma[2, 2]) * .5 

# we start from points on the unit circle
n_points <- 100
xy <- cbind(sin(seq(0, 2 * pi, length.out = n_points)), 
            cos(seq(0, 2 * pi, length.out = n_points)))

# then we scale the dimensions
ev <- eigen(sigma)
xy[, 1] <- xy[, 1] * 1 
xy[, 2] <- xy[, 2] * sqrt(min(ev$values) / max(ev$values))

# then rotate
phi <- atan(ev$vectors[2, 1] / ev$vectors[1, 1])
R <- matrix(c(cos(phi), sin(phi), -sin(phi), cos(phi)), 2) 
xy <- tcrossprod(R, xy)

# the quantiles you ask for
chi_vals <- qchisq(c(.05, .25, .50, .75, .95), df = 2) * max(ev$values)

#####
# Plot contours
par(mar = c(4.5, 4, .5, .5))
plot(c(-8, 10), c(-4, 8), type = "n", xlab = "x", ylab = "y")
for(r in sqrt(chi_vals))
  lines(r * xy[1, ] + theta[1], r * xy[2, ] + theta[2], lty = 1)

enter image description here

Detailed Explanation

theta <- c(1, 2)
sigma <- diag(c(3^2, 2^2))
sigma[2, 1] <- sigma[1, 2] <- sqrt(sigma[1, 1] * sigma[2, 2]) * .5 

# we start from points on the unit circle
n_points <- 100
xy <- cbind(sin(seq(0, 2 * pi, length.out = n_points)), 
            cos(seq(0, 2 * pi, length.out = n_points)))
par(mar = c(5, 5, 3, 1), mfcol = c(2, 2))
plot(xy[, 1], xy[, 2], xlab = "x", ylab = "y", xlim = c(-8, 10), bty = "l",
     ylim = c(-4, 8), main = "Unit circle", type = "l")
arrows(c(0, 0), c(0, 0), c(0, 1), c(1, 0), length = .05)

# this is very much like PCA. We scale the dimensions such that the first 
# dimension has length one and the others are scaled proportional to the square
# root of their eigenvalue relative to the largest eigenvalue
ev <- eigen(sigma)
scal <- sqrt(min(ev$values) / max(ev$values))
xy[, 2] <- xy[, 2] * scal
plot(xy[, 1], xy[, 2], xlab = "x", ylab = "y", xlim = c(-8, 10), bty = "l",
     ylim = c(-4, 8), main = "Scaled", type = "l")
arrows(c(0, 0), c(0, 0), c(0, 1), c(scal, 0), 
       length = .05)

# then we rotate phi degrees to account for the correlation coefficient. See 
#   https://en.wikipedia.org/wiki/Rotation_matrix
# and notice that we compute the angle of the first eigenvector  
phi <- atan(ev$vectors[2, 1] / ev$vectors[1, 1])
R <- matrix(c(cos(phi), sin(phi), -sin(phi), cos(phi)), 2) 
xy <- tcrossprod(R, xy) # R %*% t(xy)
plot(xy[1, ], xy[2, ], xlab = "x", ylab = "y", xlim = c(-8, 10), bty = "l",
     ylim = c(-4, 8), main = "Rotated", type = "l")
arrs <- tcrossprod(R, matrix(c(0, 1, scal, 0), 2L))
arrows(c(0, 0), c(0, 0), arrs[1, ], arrs[2, ], length = .05)

# the right size of each circle can now be found by taking the wanted 
# quantile from the chi-square distribution with two degrees of freedom 
# multiplied by the largest eigenvalue
plot(c(-8, 10), c(-4, 8), type = "n", xlab = "x", ylab = "y", main = "Final", 
     bty = "l")
chi_vals <- qchisq(c(.05, .25, .50, .75, .95), df = 2) * max(ev$values)
for(r in sqrt(chi_vals))
  lines(r * xy[1, ] + theta[1], r * xy[2, ] + theta[2], lty = 1)

enter image description here

  • @Benjamim, this looks like a great answer. I am trying to understand your code step by step. I see that you performed a series of matrix operations on the variance matrix - why did you have to do that? To rotate the plot so that it looked similar to the original? – Emy Mar 04 '21 at 12:02
  • Thanks. The matrix operations are needed to produce the contours for the distribution the OP shows. Thus, the matrix operation are needed. I take the contours from the bivariate standard normal distribution. Then I scale the axes to account for the unequal variance and finally rotate the plot to account for the correlation. – Benjamin Christoffersen Mar 06 '21 at 23:59
  • There is a more detailed answer now. I hope that it helps. – Benjamin Christoffersen Mar 17 '21 at 05:27