0

I have trouble understanding how to set the levels in the plot of a bivariate distribution in r. The documentation states that I can choose the levels by setting a

numeric vector of levels at which to draw contour lines

Now I would like the contour to show the limit containing 95% of the density or mass. But if, in the example below (adapted from here) I set the vector as a <- c(.95,.90) the code runs without error but the plot is not displayed. If instead, I set the vector as a <- c(.01,.05) the plot is displayed. But I am not sure I understand what the labels "0.01" and "0.05" mean with respect to the density.

library(mnormt)

x     <- seq(-5, 5, 0.25) 
y     <- seq(-5, 5, 0.25)
mu1    <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)
f     <- function(x, y) dmnorm(cbind(x, y), mu1, sigma1)
z     <- outer(x, y, f)
a <- c(.01,.05)
contour(x, y, z,  levels = a)

Emy
  • 817
  • 1
  • 8
  • 25

3 Answers3

2

But I am not sure I understand what the labels "0.01" and "0.05" mean with respect to the density.

It means the points where the density is equal 0.01 and 0.05. From help("contour"):

numeric vector of levels at which to draw contour lines.

So it is the function values at which to draw the lines (contours) where the function is equal to those levels (in this case the density). Take a simple example which may help is x + y:

y <- x <- seq(0, 1, length.out = 50)
z <- outer(x, y, `+`)
par(mar = c(5, 5, 1, 1))
contour(x, y, z, levels = c(0.5, 1, 1.5))

enter image description here

Now I would like the contour to show the limit containing 95% of the density or mass.

In your example, you can follow my answer here and draw the exact points:

# input
mu1    <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)

# we start from points on the unit circle
n_points <- 100
xy <- cbind(sin(seq(0, 2 * pi, length.out = n_points)), 
            cos(seq(0, 2 * pi, length.out = n_points)))

# then we scale the dimensions
ev <- eigen(sigma1)
xy[, 1] <- xy[, 1] * 1 
xy[, 2] <- xy[, 2] * sqrt(min(ev$values) / max(ev$values))

# then rotate
phi <- atan(ev$vectors[2, 1] / ev$vectors[1, 1])
R <- matrix(c(cos(phi), sin(phi), -sin(phi), cos(phi)), 2) 
xy <- tcrossprod(R, xy)

# find the right length. You can change .95 to which ever 
# quantile you want
chi_vals <- qchisq(.95, df = 2) * max(ev$values)
s <- sqrt(chi_vals)

par(mar = c(5, 5, 1, 1))
plot(s * xy[1, ] + mu1[1], s * xy[2, ] + mu1[2], lty = 1, 
     type = "l", xlab = "x", ylab = "y")

enter image description here

  • And we can, of course, draw other areas which contains 95% of the density but I presume that this is the area you want. – Benjamin Christoffersen Mar 15 '21 at 14:49
  • thanks a lot for your answer. Could you please refer me to some mathematical resources to understand your steps better? For example, why do you need the eigenvlaues to scale the ellipse? And how do you know and calculate how much to rotate? Why do you need to compute the tangent inverse with `atan`, then build `R` and finally calculate the cross product of the transpose with `tcrossprod`? – Emy Mar 15 '21 at 20:48
  • It is very much related to PCA. I have updated the answer in the other question. You need the eigenvectors as they give you the scale to use for each orthogonal direction and to compute the angle (this the `atan`) to rotate the plot from the first eigenvector. – Benjamin Christoffersen Mar 17 '21 at 05:30
0

The levels indicates where the lines are drawn, with respect to the specific 'z' value of the bivariate normal density. Since max(z) is 0.09188815, levels of a <- c(.95,.90) can't be drawn.

To draw the line delimiting 95% of the mass I used the ellipse() function as suggested in this post (second answer from the top).

library(mixtools)
library(mnormt)

x     <- seq(-5, 5, 0.25) 
y     <- seq(-5, 5, 0.25)
mu1    <- c(0, 0)
sigma1 <- matrix(c(2, -1, -1, 2), nrow = 2)
f     <- function(x, y) dmnorm(cbind(x, y), mu1, sigma1)
z     <- outer(x, y, f)
a <- c(.01,.05)
contour(x, y, z,  levels = a)
ellipse(mu=mu1, sigma=sigma1, alpha = .05, npoints = 250, col="red") 

Emy
  • 817
  • 1
  • 8
  • 25
0

I also found another solution in the book "Applied Multivariate Statistics with R" by Daniel Zelterman.

# Figure 6.5:  Bivariate confidence ellipse

library(datasets)
library(MASS)
library(MVA)
#> Loading required package: HSAUR2
#> Loading required package: tools

biv <- swiss[, 2 : 3]          #   Extract bivariate data

bivCI <- function(s, xbar, n, alpha, m)
  #  returns m (x,y) coordinates of 1-alpha joint confidence ellipse of mean
  
{
  x <- sin( 2* pi * (0 : (m - 1) )/ (m - 1))    # m points on a unit circle
  y <- cos( 2* pi * (0 : (m - 1)) / (m - 1))
  cv <-  qchisq(1 - alpha, 2)                   # chisquared critical value
  cv <- cv / n                                  # value of quadratic form
  for (i in 1 : m)
  {
    pair <- c(x[i], y[i])        # ith (x,y) pair
    q <- pair %*% solve(s, pair)  # quadratic form
    x[i] <- x[i] * sqrt(cv / q) + xbar[1]
    y[i] <- y[i] * sqrt(cv / q) + xbar[2]
  }
  return(cbind(x, y))
}

### pdf(file = "bivSwiss.pdf")

plot(biv, col = "red", pch = 16, cex.lab = 1.5)
lines(bivCI(var(biv), colMeans(biv), dim(biv)[1], .01, 1000), type = "l",
      col = "blue")
lines(bivCI(var(biv), colMeans(biv), dim(biv)[1], .05, 1000), 
      type = "l", col = "green", lwd = 1)
lines(colMeans(biv)[1], colMeans(biv)[2], pch = 3, cex = .8, type = "p", 
     lwd = 1)

Created on 2021-03-15 by the reprex package (v0.3.0)

Emy
  • 817
  • 1
  • 8
  • 25