7

In ggplot2, after I drawing the ellipse plot using stat_ellipse, is there any way to calculate the area of this ellipse? Here is the code and the plot:

library(ggplot2)
set.seed(1234)
x <- rnorm (1:1000)
y <- rnorm (1:1000)
data <- cbind(x, y)
data <- as.data.frame(data)
ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse()

enter image description here

RHertel
  • 23,412
  • 5
  • 38
  • 64
Ping Tang
  • 415
  • 1
  • 9
  • 20

2 Answers2

11

You can calculate the area of the ellipse by finding its semi-major and semi-minor axes (as shown in this SO answer):

# Plot object
p = ggplot (data, aes (x = x, y = y))+
  geom_point()+
  stat_ellipse(segments=201) # Default is 51. We use a finer grid for more accurate area.

# Get ellipse coordinates from plot
pb = ggplot_build(p)
el = pb$data[[2]][c("x","y")]

# Center of ellipse
ctr = MASS::cov.trob(el)$center  # Per @Roland's comment

# Calculate distance to center from each point on the ellipse
dist2center <- sqrt(rowSums((t(t(el)-ctr))^2))

# Calculate area of ellipse from semi-major and semi-minor axes. 
# These are, respectively, the largest and smallest values of dist2center. 
pi*min(dist2center)*max(dist2center)

[1] 13.82067
Community
  • 1
  • 1
eipi10
  • 91,525
  • 24
  • 209
  • 285
  • 5
    You can calculate the center more precisely with `MASS::cov.trob(cbind(x, y))$center`, which is what `stat_ellipse` [uses by default](https://github.com/hadley/ggplot2/blob/master/R/stat-ellipse.R#L96). – Roland Aug 05 '16 at 06:56
  • after pondering on [this very similar question](https://stackoverflow.com/q/60423973/7941188), I think the answer may not be quite correct - currently you're calculating the center based on the calculated ellipse data points - not based on the data, which @Roland was pointing to. I guess it should be rather either `MASS::cov.trob(pb$data[[1]][c("x","y")])$center` or indeed `MASS::cov.trob(data)$center` – tjebo Feb 27 '20 at 10:54
0

The area can be directly calculated from the covariance matrix by calculating the eigenvalues first.

You need to scale the variances / eigenvalues by the factor of confidence that you want to get.

This thread is very helpful

set.seed(1234)
dat <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

cov_dat <- cov(dat) # covariance matrix

eig_dat <- eigen(cov(dat))$values #eigenvalues of covariance matrix

vec <- sqrt(5.991* eig_dat) # half the length of major and minor axis for the 95% confidence ellipse

pi * vec[1] * vec[2]  
#> [1] 18.38858

Created on 2020-02-27 by the reprex package (v0.3.0)

In this particular case, the covariances are zero, and the eigenvalues will be more or less the variance of the variables. So you can use just the variance for your calculation. - given that both are normally distributed.

set.seed(1234)
data <- data.frame(x = rnorm(1:1000), y = rnorm(1:1000))

pi * 5.991 * sd(data$x) * sd(data$y) # factor for 95% confidence = 5.991
#> [1] 18.41814

Created on 2020-02-27 by the reprex package (v0.3.0)

The calculated value is different from user eipi10's answer. This is likely due to the different calculation under the hood, with different assumptions on the underlying distribution. see this thread.

tjebo
  • 21,977
  • 7
  • 58
  • 94