
I'm trying to draw ellipses around points of a group on a graph, with matplotlib. I would like to obtain something like this:

[image: scatter plot with a tilted ellipse drawn around each coloured group of points]

A dataset for a group (the red one for example) could look like this:

[[-23.88315146  -3.26328266]  # first point
 [-25.94906669  -1.47440904]  # second point
 [-26.52423229  -4.84947907]]  # third point

I can easily draw the points on a graph, but I'm having trouble drawing the ellipses.

Each ellipse is centred at (x_mean, y_mean) and has diameters of 2 * standard deviation: its width is twice the standard deviation of the x-coordinates, and its height is twice the standard deviation of the y-coordinates.

However, I don't know how to calculate the angle of the ellipses (as you can see in the picture, the ellipses are not perfectly vertical).

Do you have an idea of how to do that?

Note: This question is a simplification of the LDA (Linear Discriminant Analysis) problem. I'm trying to reduce the problem to its most basic expression.
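For reference, the axis-aligned part is straightforward with matplotlib's Ellipse patch. A minimal sketch on the sample points above, with the angle left at 0 (the angle is the open question):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

points = np.array([[-23.88315146, -3.26328266],
                   [-25.94906669, -1.47440904],
                   [-26.52423229, -4.84947907]])

x_mean, y_mean = points.mean(axis=0)
width = 2 * points[:, 0].std()   # diameter = 2 * standard deviation of x
height = 2 * points[:, 1].std()  # diameter = 2 * standard deviation of y

fig, ax = plt.subplots()
ax.scatter(points[:, 0], points[:, 1], color="red")
ax.add_patch(Ellipse((x_mean, y_mean), width, height, angle=0,
                     fill=False, edgecolor="red"))
```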

Mathieu David
JPFrancoia

3 Answers


This is a well-studied problem. First take the convex hull of the set of points you wish to enclose. Then perform computations as described in the literature. I provide two sources below.

"Smallest Enclosing Ellipses--An Exact and Generic Implementation in C++" (abstract link).


Charles F. Van Loan. "Using the Ellipse to Fit and Enclose Data Points." (PDF download).
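As a sketch of just the first step described above (the convex hull reduction; SciPy's `ConvexHull` is an assumption here, and the exact smallest-enclosing-ellipse computation is left to the cited references):

```python
import numpy as np
from scipy.spatial import ConvexHull

# Five points: four corners of a square plus one interior point.
points = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0],
                   [1.0, 1.0]])

hull = ConvexHull(points)
hull_points = points[hull.vertices]  # the interior point drops out; only
                                     # these vertices can touch the
                                     # smallest enclosing ellipse
```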

Joseph O'Rourke

I wrote a simple function to implement Mathieu David's solution. I'm sure there are many ways to do this, but this worked for my application.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def get_ellipse_params(points):
        ''' Calculate the parameters needed to graph an ellipse around a cluster of points in 2D.

            Calculate the height, width and angle of an ellipse to enclose the points in a cluster.
            Calculate the width by finding the maximum distance between the x-coordinates of points
            in the cluster, and the height by finding the maximum distance between the y-coordinates
            in the cluster. Multiply both by a scale factor to give padding around the points when
            constructing the ellipse. Calculate the angle by taking the inverse tangent of the
            gradient of the regression line. Note that tangent solutions repeat every 180 degrees,
            and so to ensure the correct solution has been found for plotting, add a correction
            factor of +/- 90 degrees if the magnitude of the angle exceeds 45 degrees.

            Args:
                points (ndarray): The points in a cluster to enclose with an ellipse, containing n
                                  ndarray elements representing each point, each with d elements
                                  representing the coordinates for the point.

            Returns:
                width (float):  The width of the ellipse.
                height (float): The height of the ellipse.
                angle (float):  The angle of the ellipse in degrees.
        '''
        if points.ndim == 1:
            # A single point: fall back to a small, axis-aligned ellipse.
            width, height, angle = 0.1, 0.1, 0
            return width, height, angle

        SCALE = 2.5
        width = np.amax(points[:, 0]) - np.amin(points[:, 0])
        height = np.amax(points[:, 1]) - np.amin(points[:, 1])

        # Calculate the angle from the slope of the regression line.
        x_reg, y_reg = [[p[0]] for p in points], [[p[1]] for p in points]
        grad = LinearRegression().fit(x_reg, y_reg).coef_[0][0]
        angle = np.degrees(np.arctan(grad))

        # Account for the multiple solutions of arctan.
        if angle < -45:
            angle += 90
        elif angle > 45:
            angle -= 90

        return width * SCALE, height * SCALE, angle
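As a quick sanity check, the same width/height/angle logic can be reproduced inline on the question's sample points (using `np.polyfit` instead of scikit-learn's `LinearRegression` is an assumption to keep the sketch dependency-light; a degree-1 `polyfit` yields the same least-squares slope):

```python
import numpy as np

points = np.array([[-23.88315146, -3.26328266],
                   [-25.94906669, -1.47440904],
                   [-26.52423229, -4.84947907]])

SCALE = 2.5
width = (points[:, 0].max() - points[:, 0].min()) * SCALE
height = (points[:, 1].max() - points[:, 1].min()) * SCALE

# Degree-1 polyfit returns [slope, intercept] of the least-squares line.
grad = np.polyfit(points[:, 0], points[:, 1], 1)[0]
angle = np.degrees(np.arctan(grad))

# Same +/- 90 degree correction as in the function above.
if angle < -45:
    angle += 90
elif angle > 45:
    angle -= 90
```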

This has a lot more to do with mathematics than programming ;)

Since you already have the dimensions and only want to find the angle, here is what I would do (based on my instinct):

Try to find the line that best fits the given set of points (the trendline); this is also called linear regression. There are several methods to do this, but the least squares method is a relatively easy one (see below).

Once you have found the best-fitting line, you can get the angle from its slope.

Least Squares Linear Regression

The least squares linear regression method is used to find the slope of the trendline, exactly what we want.

Here is a video explaining how it works

Let's assume you have a data set: data = [(x1, y1), (x2, y2), ...]

Using the least squares method, your slope would be:

    # I see in your example that you already have x_mean and y_mean
    # No need to calculate them again, skip the two following lines
    # and use your values in the rest of the example
    avg_x = sum(element[0] for element in data)/len(data)
    avg_y = sum(element[1] for element in data)/len(data)

    x_diff = [element[0] - avg_x for element in data]
    y_diff = [element[1] - avg_y for element in data]

    x_diff_squared = [element**2 for element in x_diff]

    slope = sum(x * y for x,y in zip(x_diff, y_diff)) / sum(x_diff_squared)

Once you have that, you are almost done. The slope is equal to the tangent of the angle: slope = tan(angle).

Use Python's math module: angle = math.atan(slope) returns the angle in radians. If you want it in degrees, convert it with math.degrees(angle).

Combine this with the dimensions and position you already have and you got yourself an ellipse ;)
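Putting the pieces together on the three sample points from the question (a sketch; the numbers are the question's, the glue code is mine):

```python
import math

data = [(-23.88315146, -3.26328266),
        (-25.94906669, -1.47440904),
        (-26.52423229, -4.84947907)]

avg_x = sum(x for x, _ in data) / len(data)
avg_y = sum(y for _, y in data) / len(data)

x_diff = [x - avg_x for x, _ in data]
y_diff = [y - avg_y for _, y in data]

slope = (sum(xd * yd for xd, yd in zip(x_diff, y_diff))
         / sum(xd ** 2 for xd in x_diff))
angle = math.degrees(math.atan(slope))  # tilt of the ellipse, in degrees
```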


This is how I would solve this particular problem, but there are probably a thousand different methods that would have worked too and may eventually be better (and more complex) than what I propose.

Mathieu David