3

I have a huge array with coordinates describing a 3D curves, ~20000 points. I am trying to use less points, by ignoring some, say take 1 every 2 points. When I do this and I plot the reduced number of points the shape looks the same. However I would like to compare the two curves properly, similar to the chi squared test to see how much the reduced plot differs from the original.

Is there an easy, built-in way of doing this or does anyone have any ideas on how to approach the problem.

xdze2
  • 3,986
  • 2
  • 12
  • 29
Agustin
  • 1,458
  • 1
  • 13
  • 30
  • What is the dimension of your data? is it a 3D shape, a 2D curve? – xdze2 Aug 07 '18 at 15:46
  • 2
    It's a 3D curve – Agustin Aug 07 '18 at 16:22
  • I can think of two possibilities: In the first, you would pass the reduced data set itself through both fitted models and compare statistics such as max/min error, RMSE, and R-squared for the two sets of results. In the second, you would create a rectangular grid and pass that through both models to make similar comparisons. The first method would compare fitting data, and the second method would compare any given data range. – James Phillips Aug 07 '18 at 17:59

2 Answers2

3

The general question of "line simplification" seems to be an entire field of research. I recommend you to have a look, for instance, at the Ramer–Douglas–Peucker algorithm. There are several python modules, I could find: rdp and simplification (which also implement the Py-Visvalingam-Whyatt algorithm).

Anyway, I am trying something for evaluating the difference between two polylines, using interpolation. Any curves can be compared, even without common points.

The first idea is to compute the distance along the path for both polylines. They are used as landmarks to go from one given point on the first curve to a relatively close point on the other curve.

Then, the points of the first curve can be interpolated on the other curve. These two datasets can now be compared, point by point.

On the graph, the black curve is the interpolation of xy2 on the curve xy1. So the distances between the black squares and the orange circles can be computed, and averaged.

This gives an average distance measure, but nothing to compare against and decide if the applied reduction is good enough...

example graph

def normed_distance_along_path( polyline ):
    polyline = np.asarray(polyline)
    distance = np.cumsum( np.sqrt(np.sum( np.diff(polyline, axis=1)**2, axis=0 )) )
    return np.insert(distance, 0, 0)/distance[-1]

def average_distance_between_polylines(xy1, xy2):   
    s1 = normed_distance_along_path(xy1)
    s2 = normed_distance_along_path(xy2)

    interpol_xy1 = interp1d( s1, xy1 )
    xy1_on_2 = interpol_xy1(s2)

    node_to_node_distance = np.sqrt(np.sum( (xy1_on_2 - xy2)**2, axis=0 ))

    return node_to_node_distance.mean() # or use the max

# Two example polyline:
xy1 = [0, 1, 8, 2, 1.7],  [1, 0, 6, 7, 1.9]   # it should work in 3D too
xy2 = [.1, .6, 4, 8.3, 2.1, 2.2, 2],  [.8, .1, 2, 6.4, 6.7, 4.4, 2.3]

average_distance_between_polylines(xy1, xy2)  # 0.45004578069119189
xdze2
  • 3,986
  • 2
  • 12
  • 29
  • also https://stackoverflow.com/questions/51486120/similarity-between-two-curves-in-python – xdze2 Aug 07 '18 at 22:33
  • Thank you for this, it sounds just like what I need. – Agustin Aug 08 '18 at 07:51
  • Are you aware of a method that finds a curve with fewer points but the points don't have to be part of the original data? Example: I have 20000 points that represent a circle, using rdp my final curve would be undershooting, all of it would be inside the circle, is there an algorithm that will shift some points outside of the original circle to minimize over and undershooting? Thanks again for your help – Agustin Aug 08 '18 at 09:06
  • then it is more a fit than an interpolation, I would look at something like ['spline smoothing'](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html). Maybe there is other dedicated methods depending on your field and data type, try a new question with a short data sample – xdze2 Aug 08 '18 at 09:20
  • If you know this is a circle, could you not directly fit a circle using an optimization method over the radius, position of the center and orientation (7 parameters)? see https://stackoverflow.com/a/42936444/8069403 - ok, it is for the example – xdze2 Aug 08 '18 at 09:25
  • Thank you, I will write another question, your answer works for what I want, I just want to estimate how much detail I'm losing further on. The circle is a simplified example. – Agustin Aug 08 '18 at 09:26
1

If you subsample the original curve, a simple way to assess the approximation error is by computing the maximum distance between the original curve and the line segments between the resampled vertices. The maximum distance occurs at the original vertices and it suffices to evaluate at these points only.

By the way, this provides a simple way to perform the subsampling by setting a maximum tolerance and decimating until the tolerance is exceeded.

You can also think of computing the average distance, but this probably involves nasty integrals and might give less visually pleasing results.