0

I am trying to calculate the area of a shape enclosed by a large set of unordered points in python. I have a 2D array of points which I can plot as a scatterplot like this. enter image description here

There are several ways to calculate the area enclosed by points, but these all assume ordered points, such as here and here. This method calculates the area unordered points, but it doesn't appear to work for complex shapes, as seen here. How would I calculate this area from unordered points in python?

Sample data looks like this:

[[225.93459  -27.25677 ]
 [226.98128  -32.001945]
 [223.3623   -34.119724]
 [225.84741  -34.416553]]

From pen and paper one can see that this shape contains an area of ~12 (unitless) but putting these coordinates into one of the algorithms linked to previously returns an area of ~0.78.

Christian Seitz
  • 728
  • 6
  • 15
  • 2
    Just to point out a flaw in your assumptions, those algorithms compute the area of a polygon given the coordinates of its EDGES. A scatter plot doesn't have edges, and it does not "enclose" a region. You see a region because each of your scatter plot points has a diameter, but a "point" mathematically is infinitely small. The points don't touch. If you can compute a "bounding polygon" from your points, then you would stand a chance, but your interior regions all depend on the size of your scatter plot dots. – Tim Roberts Aug 25 '22 at 19:54
  • 3
    Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts Aug 25 '22 at 19:58
  • Please provide the full list of the data points and the code for generating the scatter plot out of them, so that I can refer to it working on a solution. – Claudio Aug 25 '22 at 22:22
  • Sure. The real data is too big but this will generate a similar plot: ```plt.figure() density_scatter(np.random.randn(10000), np.random.randn(10000)) plt.xlim([-5, 5]) plt.ylim([-5, 5])``` where the code for ```density_scatter``` comes from here: https://stackoverflow.com/questions/20105364/how-can-i-make-a-scatter-plot-colored-by-density-in-matplotlib/53865762#53865762 – Christian Seitz Aug 25 '22 at 23:29
  • Perhaps [alpha shapes](https://en.wikipedia.org/wiki/Alpha_shape) are suitable for outer points ordering in your case – MBo Aug 26 '22 at 05:27
  • The solution to the area calculation is at the fingertips if you realize that the scatterplot creating function returns an array of values used for the pixels in the image you see in the plot. See my second answer showing the actual ease with which the area can be calculated. – Claudio Aug 27 '22 at 06:57
  • @TimRoberts : see my second answer for a proof that it is not necessary to deal with the bitmap as such, as the pixel values are calculated from the numpy array returned by the scatterplot creating function. – Claudio Aug 27 '22 at 07:01

2 Answers2

2

Let's first mention that in the question How would I calculate this area from unordered points in python? used phrase 'unordered points' in the context of calculation of an area usually means that given are points of a contour enclosing an area which area is to calculate.

But in the question provided data sample are not points of a contour but just a cloud of points, which if visualized using a scatterplot results in a visually perceivable area.

The above is the reason why in the question provided links to algorithms calculating areas from 'unordered points' don't apply at all to what the question is about.

In other words, the actual title of the question I will answer below will be:

Calculate the visually perceivable area a cloud of (x,y) points is forming when visualized as a scatterplot

One of the possible options is mentioned in a comment to the question:

Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts

Given the image perfectly covering (without any margin) all the non-white pixels you can calculate the area the image rectangle is covering in units used in the underlying (x,y) data by calculating the area TA of the rectangle visible in the image from the underlying list of points P with (x,y) point coordinates ( P = [(x1,y1), (x2,y2), ...] ) as follows:

X = [x for x,y in P]
Y = [y for x,y in P]
TA = (max(X)-min(X))*(max(Y)-min(Y)) 

Assuming N_white is the number of all white pixels in the image with N pixels the actual area A covered by non-white pixels expressed in units used in the list of points P will be:

A = TA*(N-N_white)/N

Another approach using a list of points P with (x,y) point coordinates only ( without creation of an image ) consists of following steps:

  • decide which area Ap a point is covering and calculate half of the size h2 of a rectangle with this area around that point ( h2 = 0.5*sqrt(Ap) )
  • create a list R with rectangles around all points in the list P: R = [(x-h2, y+h2, x+h2, y-h2) for x,y in P]
  • use the code provided through a link listed in the stackoverflow question Area of Union Of Rectangles using Segment Trees to calculate the total area covered by the rectangles in the list R.

The above approach has the advantage over the graphical one obtained from the scatterplot that with the choice of the area covered by a point you directly influence the used precision/resolution/granularity for the area calculation.

Claudio
  • 7,474
  • 3
  • 18
  • 48
1

Given a 2D array of points the area covered by the points can be calculated with help of the return value of the same hist2d() function provided in the matplotlib module (as matplotlib.pyplot.hist2d()) which is used to show the scatterplot.

The 'trick' is to set the cmin parameter value of the function to 1 ( cmin=1 ) and then calculate the number of numpy.nan values in the by the function returned array setting them in relation to entire amount of array values.

In other words all what is necessary to calculate the area when creating the scatterplot is already there for easy use in a simple area calculation formulas if you know that the histogram creating function provide as return value all what is therefore necessary.

Below code of a ready to use function for the area calculation along with demonstration of function usage:

def area_of_points(points, grid_size = [1000, 1000]):
   """ 
   Returns the area covered by N 2D-points provided in a 'points' array
       points = [ (x1,y1), (x2,y2), ... , (xN, yN) ]
   'grid_size' gives the number of grid cells in x and y direction the
   'points' bounding box is divided into for calculation of the area. 
   Larger 'grid_size' values mean smaller grid cells, higher precision 
   of the area calculation and longer runtime.  
         area_of_points() requires installed matplotlib module. """
   import matplotlib.pyplot as plt
   import numpy as np
   pts_x = [x for x,y in points]
   pts_y = [y for x,y in points]
   pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
   h2D,_,_,_ = plt.hist2d( pts_x, pts_y, bins = grid_size, cmin=1)
   numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
   numberOfAll2Dbins = h2D.shape[0]*h2D.shape[1] 
   areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
   pts_pts_area = areaFactor * pts_bb_area
   print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {pts_pts_area:8.4f}')
   plt.show()
   return pts_pts_area
#:def area_of_points(points, grid_size = [1000, 1000])

import numpy as np
np.random.seed(12345)
x =     np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
print(area_of_points(pts))
# ^-- prints: Areas: b-box = 114.5797, points =   7.8001
# ^-- prints: 7.800126875291629

The above code creates following scatterplot:

scatterplot

Notice that the printed output Areas: b-box = 114.5797, points = 7.8001 and the by the function returned area value 7.800126875291629 give the area in units in which the x,y coordinates in the array of points are specified.

Instead of usage of a function when utilizing the know how you can play around with the parameter of the scatterplot calculating the area of what can be seen in the scatterplot.

Below code which changes the displayed scatterplot using the same underlying point data:

import numpy as np
np.random.seed(12345)
x =     np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
pts_values_example = \
[[0.53005, 2.79209],
 [0.73751, 0.18978],
        ...        ,
 [-0.6633, -2.0404],
 [1.51470, 0.86644]]
# ---
pts_x = [x for x,y in pts]
pts_y = [y for x,y in pts]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
# ---
import matplotlib.pyplot as plt
bins = [320, 300] # resolution of the grid (for the scatter plot)
# ^-- resolution of precision for the calculation of area
pltRetVal = plt.hist2d( pts_x, pts_y, bins = bins, cmin=1, cmax=15 )
plt.colorbar()  # display the colorbar (for a 2d density histogram)
plt.show()
# ---
h2D, xedges1D, yedges1D, h2DhistogramObject = pltRetVal
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = (len(xedges1D)-1)*(len(yedges1D)-1)
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {area:8.4f}')
# prints "Areas: b-box = 114.5797, points =  20.7174"

creating following scatterplot:

scatterplot

Notice that the calculated area is now larger due to smaller values used for grid resolution resulting in more of the area colored.

Claudio
  • 7,474
  • 3
  • 18
  • 48