6

I have path set A and path set B. I am trying to find an algorithm to compare both path sets for similarity.

Path characteristics:

  1. Path sets are one or more lines with two or more points per line. Lines do not have to be connected.
  2. Path sets may overlap themselves (i.e. an X).
  3. Path sets may contain different number of vertices (i.e. one path may look similar to the other but have a lot more points in it).
  4. Points are not guaranteed to be in order for both path sets.

Scale should be taken into account, i.e. a small X should match a large X. Translation does not need to be taken into account for any paths because the bottom most point of any path will have y of 0 and left most point of any path will have x of 0.

Is there a best practice or well known algorithm (I have found little in my Google searches) to compare these kinds of path sets for similarity?

jjxtra
  • 20,415
  • 16
  • 100
  • 140
  • 1
    When are two path similar? You already stated that the should not be rotated, but might be scaled. What about transposition? Number of vertices? What if the path -- after rotation -- is the same as before, but the points have a different order (i.e. the X rotated 90 or 180 degrees)? – tobias_k Mar 19 '16 at 20:12
  • @tobias_k updated question – jjxtra Mar 20 '16 at 02:41
  • Two things you should really make clearer: (1) A "path" does not even have to be connected (this goes against most people's intuition about the word "path" -- I think you should actually choose a different word, e.g. "path set", since already one answerer is confused); (2) it's not enough just to look for pairs of points having similar co-ordinates, because scaling (and presumably, and even more fundamentally, *translation*) needs to be handled. – j_random_hacker Mar 20 '16 at 11:27
  • @j_random_hacker updated question – jjxtra Mar 20 '16 at 20:32

2 Answers2

4

Algorithmically, I think I would try something like this:

  1. For each path, convert the consecutive pairs of points comprising the path into a list of vectors, where a vector is defined as a pairing of a magnitude (length) and a direction (an angle relative to the X-axis). You can compute these values like this (C#):

    double dx = endPoint.X - startPoint.X;
    double dy = endPoint.Y - startPoint.Y;
    double magnitude = Math.Sqrt((dx * dx) + (dy * dy));
    double direction = Math.Atan2(dy, dx) * (180 / Math.PI);
    
  2. Next, "normalize" each vector sequence by combining consecutive vectors that have the same* direction. In other words, replace those with a new vector that has the same direction and the sum of their magnitudes. This will take care of the cases where you have more than two points on the same line anywhere on your paths. After this step you should have the same number of vectors in each sequence. (If not, the paths are not similar.)

  3. Figure out the scaling factor. Take the magnitude of the first vector in the first sequence and divide it by the magnitude of the first vector in the second sequence.

  4. Now you can compare the sequences for similarity by iterating over both sequences in tandem. For each corresponding vector in each sequence, check that their directions are equal* and the ratio of their magnitudes are equal* to the scaling factor. If not, the paths are not similar.

*When checking whether two double values are "equal", you must keep in mind that not every real number can be accurately represented by a double, so you cannot directly compare two doubles and expect accurate results. Instead you should decide on an error tolerance appropriate for your situation and determine whether the difference between the values you are comparing is within that tolerance. See What is the most effective way for float and double comparison? for extensive treatment of the subject.

Community
  • 1
  • 1
Brian Rogers
  • 125,747
  • 31
  • 299
  • 300
  • Wow that's a very detailed answer. Thank you. What would you suggest for cases where the paths look the same but have points done in the opposite direction? Think of an x drawn from bottom left to top right and then top let to bottom right vs. an x drawn the opposite way. – jjxtra Mar 20 '16 at 02:44
  • Well, for one thing, to draw an "X" in the way you describe would take two paths, not one, correct? When you lift the pen, doesn't that start a new path? Second, if you can have the points in opposite order, isn't that the same as a reflection or (a 180° rotation) of the original path(s)? Originally you had said you didn't need to handle rotations. So, I guess I need to know more about what you're really trying to do (is this handwriting recognition?), and what you consider to be "similar" versus not. Pictures would be good. I don't think my answer is going to work for you in its present form. – Brian Rogers Mar 20 '16 at 04:46
  • points may be in any order for any path, so it's not guaranteed that both path sets have points in exactly the same order. When I said rotation didn't matter, that would be something like a sideways arrow not having to match an up and down arrow. – jjxtra Mar 21 '16 at 00:04
0

Disclaimer: I am a layman in image processing. All the content in this answer is based on my conjecture, and is not tested and supported by literature.

I think we can make use of the concept of vertices of object. The objects concerned here are 1D lines, so vertices should be the end points of lines.

For example, for the image "X", assuming there are two lines, there should be four vertices, two per line.

Now for the image "X", it can in fact arrive from four lines, each joining at the center. Then the naive counting of vertices will give eight vertices, which is not quite what we want. One way to reduce this counting result to four, is to merge lines with neighborhood traversing. Imagine we are forming edges between points if they are within a vertical, horizontal and diagonal hop. Then we start with a random vertex and run DFS on the graph, which will give a set of dead-ends as vertices. This will give four vertices instead of eight.

For two images to be the same in your question, at least they need to have the same number of vertices. The distances between the vertices should be small when they are optimally aligned, so we can possibly pair the vertices greedily to find the optimal alignment. Find the closest pair between images, then the next closest etc until all vertices are paired. Then the similarity between images can be something like root mean square of euclidean distance of pairs.

Or, if the number of vertices is small enough, just optimize over O(N^3) (I think it's sum of decreasing squares...) possible pairs. That should give a better result.

I won't try this, because I am lazy...My imagination flies like a pig. Cheers!

Patrick the Cat
  • 2,138
  • 1
  • 16
  • 33