1

I have two lists of coordinates of different lengths representing points on a graph and I need them to be the same length:

x = [[10, 20], [15, 22]...] #len == 1200
y = [[12, 12], [16, 21]...] #len == 1300

Whilst researching I came across the Ramer–Douglas–Peucker algorithm (https://pypi.org/project/rdp/) and this seemed promising in that it smoothed the curves and reduced the number of points, the issue is I need the algorithm to return a fixed number of points. Is this at all possible?

Edit: Data comes from opencv contour operation on two separate images, plotted below as the green and red line, note I dont just need this to work for this image but for multiple images I have enter image description here

James Hall
  • 53
  • 6
  • Could you solve this by removing the points which are closest together until you reach the desired length? – Cresht Jan 16 '22 at 21:05
  • This could work but majority of points are equally close to each other, that is why I was leaning towards a smoothing algorithm as it would remove the "outliers" so to speak. Does that make sense? – James Hall Jan 16 '22 at 21:11
  • 1
    Can you give some more detail about the data you want to graph here? Is it smooth? Do you know a/the "model" behind it? (For example, could you just use splines/some model fit and then sample that according to the shorter one?) Do they only need to be of the same length or evaluated at the same points? (And maybe: Why do they need to be of the same length, just to make sure that this is not an [xy problem](https://en.wikipedia.org/wiki/XY_problem).) – Michael Kopp Jan 16 '22 at 21:22
  • Data actually comes from operations on an image using opencv, pillow and skimage. This particular dataset comes from opencv contours. This means it is not necessarily smooth nor is it linear – James Hall Jan 16 '22 at 21:30
  • Would it be sufficient for your case to find a linear stretch in the longer outline and remove most points there (or subsample)? To find a "straight" stretch you could so something like `(d.diff()["y"] / d.diff()["x"]).diff().abs() < 0.01` (which finds places where the "slope" between adjacent points does not differ much) and then https://stackoverflow.com/a/4495197/2165903 to get the stretch, then fit a line and then sub-sample in that line. By using subsampling you can easily control the number of points to use. – Michael Kopp Jan 16 '22 at 22:29

1 Answers1

0

The algorithm that you mentions basically checks for colinearity of points and then removes the points inbetween.

You could modify the idea of that algorithm to obtain your goal;

First, identify all the stretches of colinear points. Sort them from large to small. (The characteristic to sort by could be the distance between the ends or the number of points in between; don't know which one would work best.) Then, start replacing the longest section by its end points until you reach the desired number of points.

Roland Smith
  • 42,427
  • 3
  • 64
  • 94