I have multiple stored paths. Each path consists of data points: x1, y1 | x2, y2 | x3, y3 ... etc.

I would like to compare these paths with one another to work out if any similarities are present.

I could run through each point of a second path and check whether it matches any point in the first path, then check whether the following point matches the following point there.

I think this would work if there were no anomalies, but it could miss a match whenever the next point did not match.

I would like to build in some level of tolerance, e.g. 10, 10 may match 12, 12 or 8, 8.
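A minimal sketch of the tolerance idea, assuming a per-axis tolerance (so that 10, 10 matches both 12, 12 and 8, 8 with a tolerance of 2). The names `points_match` and `matched_fraction` are hypothetical, and the score ignores point order, so it is only a first filter rather than a full comparison:

```python
def points_match(p, q, tol=2.0):
    """True when p and q differ by at most `tol` on each axis."""
    return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol


def matched_fraction(path_a, path_b, tol=2.0):
    """Fraction of points in path_a with at least one match in path_b.

    Order is ignored here; this only says how much of path_a lies
    close to some point of path_b.
    """
    if not path_a:
        return 0.0
    hits = sum(any(points_match(p, q, tol) for q in path_b) for p in path_a)
    return hits / len(path_a)


path_a = [(10, 10), (20, 20), (30, 30)]
path_b = [(12, 12), (19, 21), (90, 90)]
score = matched_fraction(path_a, path_b)  # two of the three points match
```

A threshold on the score (say, above 0.8) would then decide whether two paths count as similar; that threshold is a tuning choice, not something the data dictates.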

Is this a good way to compare the data, or is there a better approach?

As a second step I may want to consider time as a value too, so each point would have a time value associated with it.

Spektre
Charles Bryant
  • You might want to look into dynamic time-warping, a method that aligns two series that might go with different speed and finds the best match. – Andreas Mueller Nov 06 '14 at 15:48
  • Could you add some example input paths so we can actually see what you are comparing with what? Include similar and different examples, and state the conditions for the comparison, e.g. whether it should be invariant to scale, rotation, translation, ... Add any other criteria you need to match. – Spektre Mar 17 '15 at 12:09
  • I re-tagged your question, so check whether it is OK and repair it if not. This has nothing to do with machine learning; that tag just confuses others (it does not matter whether the result is used for it). Always be careful with tags: wrongly selected tags lead to no answers or wrong answers, because most people sort questions by tags ... The title could also be improved to better match your question ... – Spektre Mar 17 '15 at 12:17
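
A minimal sketch of the dynamic time warping suggested in the first comment above, assuming each path is a list of (x, y) tuples and that Euclidean distance is an acceptable point-to-point cost. The function name `dtw_distance` is hypothetical, and this is the plain O(n·m) formulation without any window constraint:

```python
from math import hypot, inf


def dtw_distance(path_a, path_b):
    """Accumulated cost of the best monotonic alignment of two paths."""
    n, m = len(path_a), len(path_b)
    # cost[i][j] = best cost aligning the first i points of path_a
    # with the first j points of path_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        ax, ay = path_a[i - 1]
        for j in range(1, m + 1):
            bx, by = path_b[j - 1]
            d = hypot(ax - bx, ay - by)
            # extend the cheapest of: skip in a, skip in b, advance both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [(0, 0), (1, 1), (2, 2)]
b = [(0, 0), (1, 1), (1, 1), (2, 2)]  # same route, one point repeated
same_route = dtw_distance(a, b)
```

Because one point may align with several points of the other path, a repeated or missing sample does not break the match, which is the anomaly case described in the question. A smaller result means more similar paths.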

2 Answers

1

Some possible approaches you can use:

  1. handle both paths as polygons and compare them as such

    see: How to compare two shapes?

  2. use OCR algorithms/approaches

    see: OCR and character similarity

  3. transform both paths into synchronized datasets and correlate them

    Either extract significant points only and/or resample the paths to the same point count. Then synchronize both datasets (as in bullet 1) and use the correlation coefficient.
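
A minimal sketch of the resample-and-correlate idea in bullet 3, assuming paths are non-empty lists of (x, y) tuples, that resampling is done evenly by arc length, and that x and y are correlated separately with the Pearson coefficient. The helper names (`resample`, `pearson`, `path_correlation`) are hypothetical, and no rotation or start-point synchronization is attempted here:

```python
from math import hypot, sqrt


def resample(path, n):
    """Return n points spaced evenly by arc length along the polyline."""
    cum = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        cum.append(cum[-1] + hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0 or n < 2:
        return [path[0]] * n
    out, seg = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while seg < len(path) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def pearson(u, v):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def path_correlation(path_a, path_b, n=32):
    """Correlation of x and of y after resampling both paths to n points."""
    ra, rb = resample(path_a, n), resample(path_b, n)
    cx = pearson([p[0] for p in ra], [p[0] for p in rb])
    cy = pearson([p[1] for p in ra], [p[1] for p in rb])
    return cx, cy


a = [(0, 0), (4, 4), (8, 0)]
b = [(3, 5), (5, 7), (7, 9), (11, 5)]  # same shape, shifted, extra vertex
cx, cy = path_correlation(a, b)
```

Both coefficients come out close to 1 for paths of the same shape, regardless of translation or of how many vertices each path was stored with. A path that is constant in one axis gives 0.0 for that axis in this sketch, so that case needs separate handling.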

[notes]

Depending on the input data, you can also exploit DCT/DFT transforms to remove unimportant data (as in JPEG compression) and/or compare in the frequency domain instead of the spatial/time domain.

You can also compare obvious properties (invariant to rotation and translation), such as:

  1. area
  2. perimeter length
  3. number of self-intersections
  4. number of inflection points
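
The first two properties in the list above can be sketched as follows, assuming a path is a list of (x, y) tuples and that it is treated as a closed polygon by joining the last point back to the first. The function names are hypothetical, and the shoelace area is only meaningful for paths that do not self-intersect:

```python
from math import hypot


def polygon_area(path):
    """Unsigned area of the closed polygon through `path` (shoelace formula)."""
    n = len(path)
    twice = sum(
        path[i][0] * path[(i + 1) % n][1] - path[(i + 1) % n][0] * path[i][1]
        for i in range(n)
    )
    return abs(twice) / 2


def perimeter(path, closed=True):
    """Total edge length; with closed=True the last point joins the first."""
    pts = path + path[:1] if closed else path
    return sum(hypot(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
moved = [(x + 10, y - 3) for x, y in square]  # translated copy
```

Both values are unchanged for the translated copy, so they can serve as a cheap first check before a more expensive point-by-point comparison.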
Spektre
  • Thanks for the reply, some interesting options. I have actually started working on this and have a working example at the moment. I have a few checks: does point b1 reside within the radius of point a1 or a2; does point b1 reside between the two vectors a1 + r, a1 - r and a1 + r, a2 + r (these points would be two sides of a rectangle); or does vector b1, b2 intersect the radius of a2. Whew, this seems to work. I am sure there are some paths that will return false positives, but it has given me my first working model. – Charles Bryant Mar 17 '15 at 12:56
0

You could compare the means and variances of the two sets of points. If they lie on straight lines, as you hypothesize, you could fit straight lines through the two datasets and then compare the parameters of the two lines to infer the distance between them. It would be more helpful if you could describe the behaviour of the two datasets.
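
A minimal sketch of this suggestion, assuming paths are lists of (x, y) tuples and that an ordinary least-squares fit of y against x is acceptable (it fails for vertical lines). The names `summary` and `fit_line` are hypothetical:

```python
from statistics import mean, pvariance


def summary(path):
    """Per-axis mean and population variance of the points."""
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    return {"mean": (mean(xs), mean(ys)), "var": (pvariance(xs), pvariance(ys))}


def fit_line(path):
    """Least-squares (slope, intercept) of y = slope * x + intercept."""
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the line is vertical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


line_a = [(0, 1), (1, 3), (2, 5)]
line_b = [(0, 2), (1, 4), (2, 6)]
slope_a, icept_a = fit_line(line_a)
slope_b, icept_b = fit_line(line_b)
```

Equal slopes with different intercepts indicate parallel paths offset from one another, and the intercept difference gives the vertical offset between them.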

  • The paths can go anywhere on a grid, so they could essentially cross over or go anywhere, but I could translate the points, so instead of point two being plotted left of point one I could translate it the same distance to the right. – Charles Bryant Nov 05 '14 at 17:51
  • Why do you want to do this? – Mujtaba Hasan Nov 06 '14 at 05:37