13

I use matplotlib 1.15.1 and I try to generate scattergrams like this:

example

The ellipses have fixes size and are drawn with center coordinates, width, height and angle (provided from outside): I have no idea what their equotions are.

g_ell_center = (0.8882, 0.8882)
g_ell_width = 0.36401857095483
g_ell_height = 0.16928136341606
g_ellipse = patches.Ellipse(g_ell_center, g_ell_width, g_ell_height, angle=angle, fill=False, edgecolor='green', linewidth=2)

This ellipses should mark normal and semi-normal data on my plot. Then, I have an array of ~500 points which must be colored according to ellipse they belong to. So I tried to check each point with contains_point method:

colors_array = []
colors_scheme = ['green', 'yellow', 'black']
for point in points_array:
    if g_ellipse.contains_point(point, radius=0):
        colors_array.append(0)
    elif y_ellipse.contains_point(point, radius=0):
        colors_array.append(1)
    else:
        colors_array.append(2)

Finally, points are drawn:

plt.scatter(x_array, y_array, s=10, c=[colors_scheme[x] for x in colors_array], edgecolor="k", linewidths=0.3)

But contains_point is extremely slow! It worked for 5 minutes for 300-points scattergram, and I have to generate thousands of them in parallel. Maybe there's faster approach? P.S. Whole project is bound to matplotlib, I can't use other libraries.

tmdavison
  • 64,360
  • 12
  • 187
  • 165
Maroth
  • 195
  • 2
  • 7
  • determine two focus of ellipse, calculate sum of distance from the point to two focus. if that's less than major axis, the point is within the ellipse. https://en.wikipedia.org/wiki/Ellipse – yosukesabai May 04 '16 at 16:09

2 Answers2

17

This approach should test if a point is within an ellipse, given the ellipse's centre, width, height and angle. You find the point's x and y coordinates relative to the ellipse centre, then transform those using the angle to be the coordinates along the major and minor axes. Finally, you find the normalised distance of the point from the cell centre, where a distance of 1 would be on the ellipse, less than 1 is inside, and more than 1 is outside.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

fig,ax = plt.subplots(1)
ax.set_aspect('equal')

# Some test points
x = np.random.rand(500)*0.5+0.7
y = np.random.rand(500)*0.5+0.7

# The ellipse
g_ell_center = (0.8882, 0.8882)
g_ell_width = 0.36401857095483
g_ell_height = 0.16928136341606
angle = 30.

g_ellipse = patches.Ellipse(g_ell_center, g_ell_width, g_ell_height, angle=angle, fill=False, edgecolor='green', linewidth=2)
ax.add_patch(g_ellipse)

cos_angle = np.cos(np.radians(180.-angle))
sin_angle = np.sin(np.radians(180.-angle))

xc = x - g_ell_center[0]
yc = y - g_ell_center[1]

xct = xc * cos_angle - yc * sin_angle
yct = xc * sin_angle + yc * cos_angle 

rad_cc = (xct**2/(g_ell_width/2.)**2) + (yct**2/(g_ell_height/2.)**2)

# Set the colors. Black if outside the ellipse, green if inside
colors_array = np.array(['black'] * len(rad_cc))
colors_array[np.where(rad_cc <= 1.)[0]] = 'green'

ax.scatter(x,y,c=colors_array,linewidths=0.3)

plt.show()

enter image description here

Note, this whole script takes 0.6 seconds to run and process 500 points. That includes creating and saving the figure, etc.

The process of setting the colors_array using the np.where method above takes 0.00007s for 500 points.

Note, in an older implementation shown below, setting the colors_array in a loop took 0.00016 s:

colors_array = []

for r in rad_cc:
    if r <= 1.:
        # point in ellipse
        colors_array.append('green')
    else:
        # point not in ellipse
        colors_array.append('black')
tmdavison
  • 64,360
  • 12
  • 187
  • 165
  • 1
    It would be interesting if you said something about the performance of this solution. – unwind May 04 '16 at 16:18
  • If you factored out `np.radians(180.-angle)` and only calculated `sin` and `cos` once for a given angle, your code would become more compact and probably noticeably faster. – 9000 May 04 '16 at 16:18
  • thanks, I will try. It only takes seconds to test these 500 points though – tmdavison May 04 '16 at 16:19
  • 1
    @unwind: It's linear wrt the number of points, does not depend on the ellipse's size and orientation, and uses constant memory for the calculation (or again linear if you accumulate the results). Trigonometric functions are pretty fast on modern CPUs. – 9000 May 04 '16 at 16:22
  • @tom: Actually, you can calculate the `sin` and `cos` outside the loop, drastically improving performance. (And waiting whole seconds for a visualization is not best; I suspect this solution can get several frames per second after some polishing.) – 9000 May 04 '16 at 16:24
  • Maybe seconds was an over-estimation. I meant that as a comparison to the OP's >5 minutes. Anyway the for-loop went from taking 0.0060 seconds to 0.0012 seconds in my first edit (for 500 points). – tmdavison May 04 '16 at 16:27
  • 1
    @9000, OK, I see, I've vectorised a lot more of the calculation now, and it has sped up to 0.00017 seconds now. Thanks. I was converting some old fortran code that only operated on single particles, so there was no need to vectorise and I didn't think it though properly before posting! – tmdavison May 04 '16 at 16:33
  • This works perfectly! Thank you very much. Now I feel really ashamed of my Math skills. – Maroth May 05 '16 at 06:48
1

Your current implementation should only be calling contains_point 25,000 to 50,000 times, which isn't a lot. So, I'm guessing that the implementation of contains_point is targeted toward precision rather than speed.

Since you have a distribution of points where only a small percentage will be in any given ellipse, and therefore most will rarely be anywhere near any given ellipse, you can easily use rectangular coordinates as a short-cut to figure out whether the point is close enough to the ellipse to be worth calling contains_point.

Compute the left and right x coordinates and top and bottom y coordinates of the ellipse, possibly with a bit of padding to account for rendering differences, then check if the point is within those, such as the following pseudo-code:

if point.x >= ellipse_left and point.x <= ellipse_right and _
   point.y >= ellipse_top and point.y <= ellipse_bottom:
    if ellipse.contains_point(point, radius=0):
        ... use the contained point here

This approach eliminates expensive calculations for most of the points, allowing simple comparisons instead to rule out the obvious mismatches, while preserving the accuracy of the computations where the point is close enough that it might be in the ellipse. If e.g. only 1% of your points are anywhere near a given ellipse, this approach will eliminate 99% of your calls to contains_point and instead replace them with much faster comparisons.

Matt Jordan
  • 2,133
  • 9
  • 10
  • Good solution. But unfortunately, in my case 60-100 % of points are inside both ellipses or close to their borders. – Maroth May 05 '16 at 06:44