plot Ellipse bounding a percentage of points

Question

I am using ellipsoidhull() function to derive an ellipse that bounds all the points in x,y coordinates. I then use point.in.polygon() function to predict if a new set of X,Y co-ordinates fall inside/outside the ellipse.

Instead of plotting an ellipse that bounds all the points in (x,y), is it possible to use say 80% of the points? The 80% of the points may be chosen to form the most compact or smallest elliptical area.

> xy

x       y 
3.076   5.208
3.046   5.123
2.993   5.108
3.062   5.134
3.168   5.223
3.138   5.284
3.166   5.319
3.226   5.411
3.262   5.417
3.215   5.234
3.086   5.019
3.199   5.167
3.274   5.596
3.293   5.608
3.195   5.396
3.294   5.374
2.974   5.539
3.268   5.377
3.192   5.298
3.08    4.916
3.117   4.985
3.128   5.118
3.21    5.373
3.184   5.282
3.27    5.291
3.074   5.175

> Query
X       Y
3.03    5.008
2.99    5.018
2.987   4.944
2.994   4.899
2.911   4.963
2.913   4.942
2.966   4.969
3.079   5.011
3.096   5.268
2.992   5.169
3.205   5.466
3.257   5.776
3.154   5.563
3.16    5.192
3.12    5.446
3.271   5.719
3.154   5.478
3.143   5.454
3.123   5.439
3.075   5.224
3.264   5.56
3.288   5.404
3.237   5.499
3.207   5.47
3.207   5.459
3.11    5.23
3.301   5.605
3.139   4.823


library(cluster)
exy <- ellipsoidhull(as.matrix(xy))
ellipse <- predict(exy)
library("sp")
point.in.polygon(Query$X, Query$Y, ellipse.FAM[,1], ellipse.FAM[,2])

Yes it is, as explained in this exact duplicate: http://stackoverflow.com/q/6655268/602276 — Andrie, Aug 14 '11 at 08:54
@Andrie It is almost the same except that I want to get the x & y coordinates of the ellipse when using dataEllipse instead of plotting them? Is there a function to do this? — user645600, Aug 14 '11 at 09:07
Does this answer your question? [Ellipse containing percentage of given points in R](https://stackoverflow.com/questions/6655268/ellipse-containing-percentage-of-given-points-in-r) — Richard, Dec 04 '21 at 06:52

IRTFM · Accepted Answer · 2014-09-08T16:13:09.610

Presumably you were using cluster::ellipsoidhull. In a different package the car::dataEllipse function calculates a center, shape and radius value and passes to ellipse. For the "presumed Normal" situation, which it seems you might be assuming, the relevant code is:

library(car)
dataEllipse
function(x,y, ....
...
else {
        shape <- var(cbind(x, y))
        center <- c(mean(x), mean(y))
    }
    for (level in levels) {
        radius <- sqrt(dfn * qf(level, dfn, dfd)

Then 'ellipse' calculates its individual points which get passed to lines. The code to do that final calculation is

ellipse <-
function (center, shape, radius, ....)
....
 angles <- (0:segments) * 2 * pi/segments
    unit.circle <- cbind(cos(angles), sin(angles))
    ellipse <- t(center + radius * t(unit.circle %*% chol(shape)))
    colnames(ellipse) <- c("x", "y")

So the combination of these two functions works with your data:

getEparams <-function(x,y, level) { dfn <- 2
        dfd <- length(x) - 1
        shape <- var(cbind(x, y))
        center <- c(mean(x), mean(y))
        radius <- sqrt(dfn * qf(level, dfn, dfd))
        return(list(center=center, shape=shape, radius=radius) ) }

ellcalc <- function (center, shape, radius, segments=51){segments=segments
    angles <- (0:segments) * 2 * pi/segments
    unit.circle <- cbind(cos(angles), sin(angles))
    ellipse <- t(center + radius * t(unit.circle %*% chol(shape)))
    colnames(ellipse) <- c("x", "y")
    return(ellipse)}

evals <- getEparams(Query$X, Query$Y, 0.80)
plot(ellcalc(evals[["center"]], evals[["shape"]], evals[["radius"]]))
title(main='Output of plot(ellcalc(evals[["center"]], evals[["shape"]], 
                           evals[["radius"]]))\nStackOverflow Demonstration')
 points(Query$X, Query$Y, cex=0.3, col="red")

You could obviously save or pass the results of the ellcalc call to any object you wanted

enter image description here

...that is really useful. Thank you. Can you tell me how the levels=80% chooses the points? ie., does it minimizes the area of the ellipse, or chooses points that are close to each other? I don't understand what is going on mathematically. Also how does segments=51 affect the resulting ellipse points? I am assuming the closer and more points there are, the more accurately one can bound a set of points. — user645600, Aug 14 '11 at 17:52
It is calculating the theoretic region that would be enclosed if the data had a mv-normal distirbution with the means and variance covariance matrix that the data exhibit. You can look at the code for dataEllipse if you want to see the "robust" method. You are right about segments. It just determines how many points are calculated. If you had used type="l" in the plot call you would get an almost smooth ellipse. — IRTFM, Aug 14 '11 at 18:44

plot Ellipse bounding a percentage of points

1 Answers1

Linked