
I have a 2D numpy array representing a point cloud of x,y coordinates:

[[ 247.68512  182.67136]
 [ 248.71936  182.67136]
 [ 249.74336  182.67136]
 ..., 
 [ 253.85984  269.1072 ]
 [ 254.89408  269.1072 ]
 [ 255.91808  269.1072 ]]

(image: plot of the point cloud)

The shape is: (4974, 2)

I ultimately apply a concave hull algorithm to determine the edge of the shape as x,y coordinates (similar in format to the input).

Example of a concave hull implementation (JS): http://dailyjs.com/post/hulljs

My issue is that the concave hull calculation for this large dataset can take upwards of 10 seconds. Since we are only concerned with edge detection, I think I can remove the inner points in the dataset and speed up the concave hull triangulation (by reducing the number of operations).

Some basic ideas to start with:

  • Removing every nth data point (this reduces the edge resolution heavily beyond n=2, so no good)
  • Calculating the angle of each point from the centre of the shape, determining which quadrant it lies in, and scaling the x,y data accordingly (effectively shrinking the original 'polygon'); then subtracting the shrunk dataset from the original and running the concave hull on what remains (effectively removing all inner points) (a rough sketch of this follows the list)
  • I'm not opposed to using image edge detection (OpenCV, skimage) if the proper x,y locations of the points are retained (which seems hard to do in practice)
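To make the second idea concrete, here is a rough, untested sketch of one way it could be done, using SciPy's cKDTree; strip_interior is a made-up name, and the shrink factor and tol threshold are guesses that would need tuning to the point spacing:

    import numpy as np
    from scipy.spatial import cKDTree

    def strip_interior(cloud, shrink=0.95, tol=1.0):
        # Pull every point toward the centroid to build a slightly
        # smaller copy of the cloud (the "shrunk polygon").
        centre = cloud.mean(axis=0)
        shrunk = centre + shrink * (cloud - centre)

        # Original points that still have a shrunk point within tol are
        # treated as interior and dropped; points near the boundary are
        # left uncovered by the shrunk cloud and survive.
        dist, _ = cKDTree(shrunk).query(cloud, k=1)
        return cloud[dist > tol]

Whether this keeps enough boundary points depends on how shrink and tol compare with the point spacing, so it would need to be checked against the real data.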

Are there any obvious solutions to this problem?

0000101010
  • Why do you need it to go faster? What's wrong with it taking 10 seconds? Is this critical performance or premature optimization? – Soviut May 12 '17 at 03:27
  • Will have many shapes/datasets that need to be processed, would say it's critical performance. – 0000101010 May 12 '17 at 03:29
  • 1
    Your y values (assuming x, y) seem to be replicated suggesting that they are the same by row. Ideally you need the min and max per y (row) which would give you more than enough points for a convex hull calculation. – NaN May 12 '17 at 03:41
  • 1
    Considering concave hull algorithms are already trying to remove inner points as efficiently as possible, it's unlikely anything we come up with for preprocesing is going to improve performance. All of your "basic ideas" (except possibly undersampling) would only work reliably on *convex* hulls – Daniel F May 12 '17 at 06:09
  • I think NaN's idea could work quite well: round the array to int (will only affect accuracy slightly) > sort the dataset by the x column > for each x column remove y values that aren't the y-min/y-max (a rough sketch of this follows these comments). This probably isn't a valid method for any other concave hull problem, but since my y-values do repeat often this could work. – 0000101010 May 12 '17 at 14:10
  • I also just realized that the JavaScript concave hull is quite different from the Delaunay triangulation method I'm using. The JS method looks way faster; I'll try to implement that in Python as well. – 0000101010 May 12 '17 at 14:16
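
A minimal sketch of the per-column reduction described in the comments above, assuming rounding x to whole numbers is acceptable for this data (keep_column_extremes is just an illustrative name):

    import numpy as np

    def keep_column_extremes(cloud):
        # Group points by their rounded x value and keep only the points
        # with the lowest and highest y in each column.
        x = np.round(cloud[:, 0]).astype(int)
        keep = []
        for xv in np.unique(x):
            col = cloud[x == xv]
            keep.append(col[col[:, 1].argmin()])  # bottom of the column
            keep.append(col[col[:, 1].argmax()])  # top of the column
        return np.array(keep)

This only keeps the top and bottom of every column, so concavities that open to the left or right would be lost; the same pass per y row (keeping the min and max x) could be added to cover those.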

1 Answer


The simplest way to deal with this is to run the operation in parallel, either across multiple cores or multiple machines. It's simplest because, rather than trying to make each operation run more efficiently, you can take a large queue of operations and simply do them all at once, at roughly the time cost of a single operation.

NumPy itself has some parallel capabilities that you might be able to take advantage of. I'm not actually sure whether a concave hull algorithm can be parallelized, but it feels like the cloud could be split into quadrants to allow multiple cores to tackle different sections at once.
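For example, if the hull routine itself stays single-threaded, the many shapes/datasets you mentioned can still be farmed out to a process pool; concave_hull below stands in for whatever routine you are already calling and is not a real library function:

    from multiprocessing import Pool

    def hulls_in_parallel(clouds, concave_hull, workers=4):
        # clouds is a list of (N, 2) arrays; each one is an independent
        # job, so a pool of worker processes can run several at once.
        with Pool(workers) as pool:
            return pool.map(concave_hull, clouds)

(On Windows the call needs to sit under an if __name__ == '__main__': guard, and concave_hull must be a module-level function so it can be pickled.)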

There are already plenty of answers on Stack Overflow about making concave hull routines more efficient.

Soviut
  • Valid answer, but I'm not looking to directly optimize the concave hull - I'm already using one of the most optimized methods. Just looking to reduce the dataset I'm feeding it. Parallelizing is an interesting idea – 0000101010 May 12 '17 at 04:05