2

Problem description:

I have a large array of 2D points (>150k) that represent an edge of a figure. The point coordinates are all integer (minimum distance between points = 1). Plotting the edge points looks like this:

Scatter plot of edge points

In some places, it looks like the lines are touching, but they are not. I want to import these points into a CAD program and draw the edge line. For this, however, I must ensure that each point has exactly 2 neighbors with a distance of 1. If a point has more than 2 neighbors, then points need to be removed until there are only 2. It does not matter which points are removed exactly because small geometric deviations are not relevant.

Is there a way to do this in Python efficiently? Unfortunately, I lack experience.

Approach:

Here is my first (very inefficient) approach:

edges_points is the input array. It is a list of tuples and each tuple contains the x and y coordinates of a point:

In [20]: len(edges_points)
Out[20]: 156350

In [21]: edges_points[0]
Out[21]: (138, 140)

Here is the Python script (edges_points is already defined):

def remove_if_more_than_two_neighbors(edges_points,x,y):
    check = 0
    start_over = 0
    for i in [-1,0,1]: #check for all 8 possible neighbors
        for j in [-1,0,1]: 
            if p+i >= len(edges_points) or p+j>= len(edges_points) or p+i <0 or p+j <0: #prevent out-of-range indexing
                continue 
            if (x+i,y+j) in edges_points: #check if there is a neighboring point in the input list
                check = check+1
                if check > 3: #if there are more than 2 neighbors, remove the unwanted point and start over
                    edges_points.remove((x+i,y+j))
                    start_over = 1
                    return [edges_points, start_over]

    return [edges_points, start_over]

for p in range(len(edges_points)):
    stop = 0
    while(stop == 0):
        check = 0
        x = edges_points[p][0]
        y = edges_points[p][1]
        func_return = remove_if_more_than_two_neighbors(edges_points,x,y)
        edges_points=func_return[0]
        start_over=func_return[1]
        if start_over == 0:           
            stop = 1

The script seems to be removing points, but it is so slow that it hardly makes sense. I realize that there are some other issues regarding this task (like removing disconnected points or point ordering in general) but I will adress those later.

Thank you for your help!

Fel
  • 31
  • 3
  • 1
    I would use a [quadtree](https://en.wikipedia.org/wiki/Quadtree) to be able to efficiently query for the closest points to a point. Or perhaps [scikit-learn's KD-tree](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html) or [scipy's KD-tree](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html) if I'm too lazy to use a quadtree library. – Stef Jan 31 '23 at 16:09
  • See also [Is any of these quad-tree libraries any good?](https://stackoverflow.com/questions/2298517/are-any-of-these-quad-tree-libraries-any-good) – Stef Jan 31 '23 at 16:12

1 Answers1

0

If the coordinates range in [0, 10000] x [0, 10000] or less, it can make sense to use an image of that size (1 bit per pixel or 1 byte per pixel).

First clear the image and draw every point. Then during a second scan of the points, you can quickly test the immediate neighbors.


If this is too big to your taste, you can resort to binning: see the image as a tiling with tiles 16 x 16 (say), and only represent the tiles that contain points. You will need a hash map to make the connection between the coordinates of a tile and the image of that tile.

When you set a pixel, you first check if the tile exists and create it if necessary; then write to the tile at the correct offset. To later test a pixel, if the tile does not exist, the point is not there; otherwise, you read at the right offset.

  • Hi, thank you for your answer. It is funny that you mention the image approach because the data originally comes from a binary image. So i already have the point data in a matrix that contains Zeros and Ones. How can I use this to my advantage for this specific problem? I guess it might be easier to set the value of an element to zero rather than remove it from a list? Or are there any specific functions I can use? – Fel Jan 31 '23 at 16:53
  • @Fel: testing /clearing the 4 or 8 neighbors of a pixel in an image is trivial. –  Jan 31 '23 at 18:04
  • @Fel: this seems to be a perfect example of an XY question :-) –  Feb 01 '23 at 10:24
  • Hi, as I said I don't have much experience and I certainly don't know what you mean by XY question. Sure, I can check the 8 values of each pixel in the binary picture. If there are more than two with the value 1, then I can set the rest to zero. But how do I make sure that I don't create any loose ends (break the line)? – Fel Feb 01 '23 at 13:27
  • 1
    @Fel: what you are after is a *morphological thinning* algorithm. –  Feb 01 '23 at 13:58
  • This actually helped. I now use the "skeletonize" method from skimage and it does exactly what I want. Thank you. – Fel Feb 05 '23 at 10:24