2

The question seems simple enough. I can plot 95% credible/confidence ellipses in ggplot using stat_ellipse. I want to find out how to test whether a given point is within that ellipse.

For example, here is a dataset of points and their 95% ellipses, plus a set of randomvalues. I'd like a function that calculates TRUE or FALSE whether the randompalues falls within the ellipse.

library(tidyverse)

set.seed(420)

randomvalues <- data.frame(x = sample(min(mtcars$hp):max(mtcars$hp), 10),
                           y = sample(min(mtcars$disp):max(mtcars$disp), 10))

ggplot(data = mtcars)+
  geom_point(aes(x = hp, y = disp, color = as.factor(cyl)))+
  stat_ellipse(aes(x = hp, y = disp, color = as.factor(cyl)))+
  geom_point(data = randomvalues, aes(x = x, y = y))

This (very contrived) solution comes close, but when points are close to the border of an ellipse, it fails. The test point here is 310,300, which is just outside of ellipse 8. This function says it falls within ellipse 8.

point_in_ellipse <- function(x, y, points, level = 0.95) {
  
  # Calculate mean of points
  mean_x <- mean(points[, 1])
  mean_y <- mean(points[, 2])
  
  # Calculate covariance matrix of points
  cov_mat <- cov(points)
  
  # Calculate eigenvalues and eigenvectors of covariance matrix
  eigen <- eigen(cov_mat)
  
  # Calculate semi-major and semi-minor axes of ellipse
  lambda_1 <- sqrt(eigen$values[1] * qchisq(level, df = 2))
  lambda_2 <- sqrt(eigen$values[2] * qchisq(level, df = 2))
  
  # Calculate orientation of ellipse
  theta <- atan2(eigen$vectors[2, 1], eigen$vectors[1, 1])
  
  # Convert point to ellipse coordinates
  dx <- x - mean_x
  dy <- y - mean_y
  xp <- dx * cos(theta) + dy * sin(theta)
  yp <- -dx * sin(theta) + dy * cos(theta)
  
  # Calculate distance from point to ellipse center
  d <- sqrt((xp / lambda_1)^2 + (yp / lambda_2)^2)
  
  # Check if point is within ellipse
  if (d <= 1) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Split mtcars by cylinder
ellipse.points <- mtcars %>%
  select(hp, disp, cyl) %>%
  split(f = .$cyl)

# Add a randomvalue that is on the border of an ellipse
randomvalues <- rbind(randomvalues, c(310, 300))

# Make a list to receive the answers
answer <- list()

# For each cylinder, calculate the ellipse, and calcualte whether the randomvalues fall inside it
for(i in names(ellipse.points)){
  answer[[i]] <- apply(randomvalues, 1, function(x) point_in_ellipse(x = x[1], y = x[2], ellipse.points[[i]]) )
}

# Convert back to df
final.df <- cbind(randomvalues, do.call(cbind, answer))

I think stat_ellipse must not use the chi squared distribution or the covariance matrix, or must use another way of calculating its ellipses.

Any ideas?

EDIT: 8 May 2022 Based on suggestions from @tjebo, I tried to calculate the ellipses with car::dataEllipse(), then convert the ellipses to sf objects, then use st_intersects to find which points intersect which ellipses.

# Calculate ellipses for each cyl using car::dataEllipse
list.of.cyls <- split(mtcars[,c(4,3)], f = as.factor(mtcars$cyl))

cyl.ellipses <- lapply(list.of.cyls, function(x) car::dataEllipse(x = x$hp, y = x$disp, draw = FALSE) %>%
                           .$`0.95` %>%
                           as_tibble(.) %>%
                           st_as_sf(coords = c("x", "y")) %>%
                           summarise(geometry = st_combine(geometry)) %>%
                           st_cast("POLYGON")
)

# Convert randomvalues to sf
randomvalues.sf <- randomvalues %>%
  st_as_sf(coords = c("x", "y"))

# Already the ellipses are off
ggplot()+
  geom_sf(data = do.call(bind_rows, cyl.ellipses), fill = NA)+
  geom_point(data = randomvalues, aes(x = x, y = y))

ggplot()+
  stat_ellipse(data = mtcars, aes(x = hp, y = disp, color = as.factor(cyl)))+
  geom_point(data = randomvalues, aes(x = x, y = y))

Before we do any intersection calculations, the ellipses are already off. Not sure how my calculation of ellipses using dataEllipses varies from stat_ellipse, but somehow it does.

Anyone have an idea?

Alex Krohn
  • 71
  • 5
  • The documentation for `stat_ellipse` says it is based on `car::dataEllipse`, so perhaps you could use that package to work out what is going on – Andrew Gustar May 04 '23 at 17:29
  • 1
    I guess the way to go is to convert the ellipse to a spatial object and then use something like `sp::point.in.polygon()`. I'm pretty sure I've seen some similar threads but on a quick search I struggle to find those. They are here somewhere :) – tjebo May 04 '23 at 18:11
  • e.g. https://stackoverflow.com/questions/21971447/check-if-point-is-in-spatial-object-which-consists-of-multiple-polygons-holes – tjebo May 04 '23 at 18:12
  • @tjebo This is a good solution, but without knowing precisely how dataEllipse create the ellipses from points, I cannot convert the ellipse to a polygon. Do you have a suggestion about how to generate the ellipse polygons from a set of x, y coordinates? – Alex Krohn May 08 '23 at 17:17
  • 1
    Hey Alex, I'm afraid I won't have time to dig into this problem myself too much at this point in time... sorry! – tjebo May 08 '23 at 18:44

1 Answers1

1

short answer: car::dataEllipse is using an F-statistic cutoff (analogous to a Student t-distribution cutoff, taking the uncertainty of the variance estimate into account). In contrast, you're using a cutoff based on a chi-squared distribution, analogous to Normal cutoffs, assuming the variance is known exactly (i.e., the sampling variance is equal to the true variance).

From that function:

radius <- sqrt(dfn * qf(level, dfn, dfd))
result[[i]] <- ellipse(center, shape, radius, ...)

where shape is the covariance matrix.

car:::ellipse uses

angles <- (0:segments) * 2 * pi/segments
unit.circle <- cbind(cos(angles), sin(angles))
Q <- chol(shape, pivot = TRUE)
order <- order(attr(Q, "pivot"))
ellipse <- t(center + radius * t(unit.circle %*% Q[, order]))
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Right. It seems like it's more complicated than just switching in `qf` for `qchisq`. I'm not really sure what the two degrees of freedom would be for the `qf` function anyway. Thanks for your help. – Alex Krohn May 04 '23 at 20:15