0

This is a language agnostic question, more algorithm design oriented.

Imagine we have an two arrays of points in 3D space (each looks like [(1, 0, 2), (2, 4, 32), ...])

The first array represents the first state of the points and the second represents a later state where the points have shifted a small amount (not necessarily each by the same distance). Note: A few points could have been removed and some new ones added in the second state.

Problem: Given these two arrays, how could one match (to a reasonable degree of accuracy) each shifted point to its original point, while also identifying which points are new and did not exist in the first state?


Ideas: I was thinking that some sort of k-means clustering could be applied, but I'm not sure how it would handle the fact that some points could have been removed/added between the states - so I don't think that approach would work well.


Edit:

Points are not necessarily added at any particular place in the array and the order is not necessarily maintained for persisting points between states. The distance the points are shifted between states should be relatively small compared to the distance between unique points - otherwise this problem is basically impossible.

abagshaw
  • 6,162
  • 4
  • 38
  • 76
  • When points are added, are they added at the end of the array or scattered within? For points that aren't added or removed, do they remain in the same relative order in the two arrays? How does the distance that points might move compare in scale to the distance between points? – Ted Hopp May 02 '18 at 04:59
  • @TedHopp See edits. – abagshaw May 02 '18 at 05:02
  • 1
    Then it seems like a nearest neighbor search might be the easiest approach. See [this thread](https://stackoverflow.com/questions/4350215/fastest-nearest-neighbor-algorithm) if you want to go that route. For other approaches, take a look at [point set registration](https://en.wikipedia.org/wiki/Point_set_registration) (a.k.a. _point matching_) algorithms. – Ted Hopp May 02 '18 at 05:08
  • @TedHopp, since movements of points are completely random, you may quite easily find yourself in a situation that the copy of point A suddenly appears nearer to point B that point B's copy, dismantling the **nearest neighbor** approach. – FDavidov May 02 '18 at 05:14
  • @FDavidov - OP indicated that the scale of movement is "relatively small" compared to the distance between points. Therefore the situation you describe shouldn't happen. – Ted Hopp May 02 '18 at 05:54
  • @TedHopp, if you wait long enough, it could certainly happen. – FDavidov May 02 '18 at 06:02
  • @FDavidov - That makes no sense in the context of what OP described. You're given two arrays of points that are snapshots of an evolving population of points. OP is stating that it's a given that these snapshots satisfy the condition that all point movement is small compared to the distance between points. There's no such thing as waiting, much less "waiting long enough". OP is asking for an algorithm to deal with that particular case. – Ted Hopp May 02 '18 at 06:05
  • @TedHopp, please re-read the question: _The first array represents the first state of the points_ and hence does not evolve. Only the second array evolves with the new points and positions. – FDavidov May 02 '18 at 06:09
  • 1
    @FDavidov - The second array doesn't evolve. It is the state after any evolution has taken place, and whatever changes there are between the two states satisfy the "small motion" condition. You seem to be imagining that the algorithm is going to be executed repeatedly as the first array stays fixed and the second array continues to vary, but that's something you injected into the problem. There's nothing like that in OP's question. – Ted Hopp May 02 '18 at 06:15
  • @TedHopp OK, let's assume that you start with _n_ points such that the distances between any pair of them is at least 5 meters. Now, move them no more that 1 cm in any direction. Do you see any issue in tracking their movement? Of course not!!! Therefore, it appears reasonable to assume that the continue moving. If my assumption is wrong, then there is no real problem to be solved. Would you agree with that? – FDavidov May 02 '18 at 06:20
  • 1
    @FDavidov - Your "Of course not!!!" is exactly what OP is asking about. The issue is precisely to match corresponding points after a small perturbation of the point set. This is, in fact, a real problem because points may have disappeared, new points may have appeared (presumably relatively far from existing points), and remaining points may not be at the same index in the two arrays. OP is asking for an algorithm to deal with that particular scenario. It is not trivial to solve this efficiently, particularly if it involves a large numbers of points. – Ted Hopp May 02 '18 at 06:36
  • sounds like a **RANSAC** is what you look for ... also this might help: [Extra stars created, most efficient way to get rid of them?](https://stackoverflow.com/a/42053489/2521214) – Spektre May 02 '18 at 07:36
  • Presumably the points are measurements of something. Without the details of the measurement and what the somethings are and some details of how they behave, it's hard to answer. The answer will be quite different if for example you're measuring static points identified with feature detection but a moving camera (so measurement will be inaccurate, and the points will be expected to move only relative to a camera) or if you've got perfect (or near-perfect) gps locations of people walking around, and want to track individual people. – Paul Hankin May 02 '18 at 08:27
  • I think the OP should try to provide reasonable simplification assumptions; for instance: Is it reasonable to assume that the points are all moving in the same general direction, at a nearly shared velocity (some sort of flow)? If not, can it be assumed that they do so locally (some sort of swirl)? – Reblochon Masque May 02 '18 at 08:45

3 Answers3

1

This rests on one assumption: The shift distance is very small (essentially a fuzzy measurement) compared to the distance between unique points between the first and second set.

First off, the general structure of a point set is not significantly affected by translation, rotation or scaling. This provides you with quite a few options.

Take the min/max for each dimension(x,y,z,etc). Translate and rescale the two point sets. The exact scaling doesn't matter, but you might go with it such that all points are positive and between 0 and 100 in every dimension. This allows you to compare the points more consistently. Although it may not be strictly needed and can likely be skipped

Then you should create a bidirectional mapping (a bidirectional graph) between point set A and point set B, which would be O(|A| + |B|) where |A| and |B| are the sizes of the sets. Example of bidirectional mapping: a_to_b[(1.001,2.001)] = [(1.005,1.995)] b_to_a[(1.005,1.995)] = [(1.001,2.001)]

If a_to_b and b_to_a map to each other, then that is the same point with relatively high probability.

If not, then you likely see something like this instead: a_to_b[(1.001,2.007)] = [(1.005,1.995)] b_to_a[(1.005,1.995)] = [(1.500, 2.004)]

a_to_b[(1.500, 2.004)] = [(1.495, 2.009)] b_to_a[(1.495, 2.009)] = [(1.500, 2.004)]

Since there is no longer a 1-1 mapping, it means something has been added or removed. Since the value in a was not mapped back, it was likely removed. In the opposite occurred, it was likely added. If it was added, you'll want to re-run the algorithm to try to determine what the original closest point was.

This can be verified by looking at the different point and seeing if it is part of a 1-1 mapping (and thus accounted for). Basically, you want to account for all of the 1-1 mapping points (which have high probabilities of being the same point), then try to sort out the points that don't match neatly

You may want to get the Delaunay triangulation for both point sets to allow for looking up the nearest neighbor of all points a lot faster due to knowing which points are spatially adjacent to a given point. The number of edges in a Delaunay graph if I recall right is O(V), so the average edge per vertices is O(1). Once you have found the nearest point. You may however need to do some tweaking of delaunary graphs to account for added/removed edges.

Nuclearman
  • 5,029
  • 1
  • 19
  • 35
1

Match each point to it's nearest neighbor, unless the distance exceeds a threshold.

If you have more than one possible match, you need to devise a good resolution strategy.

Consider unmatched points as removed or added.

To make this faster, put a octree or a grid file on your data, so that you only need to test adjacent grid cells, rather than comparing every point with every other point.

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
0

Under the assumptions that:

  1. Points may shift at random angle and length,
  2. Points may randomly disappear,
  3. Points may randomly be added and at random locations,
  4. Points (in each array) are defined solely by their coordinates with no ID of any kind.

If all the above are correct, there is no solution that would provide a reasonable level of reliability.

I can add many examples that will substantiate my assertion, but it looks quite intuitive and hence are not really needed.

EDIT:

Following a discussion with Ted Hopp, I'm including an alternative approach based on two INSERTED CRITICAL assumptions:

  1. The points move once and only once,
  2. It is possible to state that the minimal initial distance between any two points is known, let's call it Lmin while the maximal distance of any movement is << LMin and we'll call it Mmax.

With these two additional assumptions, you can think of a mechanism as follows (JavaScript-like code - not checked!):

for (i = 0 ; i < Points.Count ; i++) {
    for (j = i + 1 ; j < Points.Count ; j++) {

        if ((ABS(Array1[i].x - Array2[j].x) > Mmax ) ||
            (ABS(Array1[i].y - Array2[j].y) > Mmax ) ||
            (ABS(Array1[i].z - Array2[j].z) > Mmax )    ) {
           // Distance between two points is for sure equal or bigger than max.
           continue ; // Meaning, go to check next point.
        }

        // The check of the distance is split into two stages
        // because, if the first if is true, the actual distance
        // calculation is not needed (and hence time is saved).

        if (Distance_Between_Points(Array1[i],Array2[j]) > Mmax) {
           // Distance between two points is for sure bigger than max.
           continue ; // Meaning, go to check next point.
        }

        // Points appear to be related!!!!!!
        ..........
    }
}
FDavidov
  • 3,505
  • 6
  • 23
  • 59
  • A few corrections, points are most likely to appear a disappear near boundaries of the 3D space (although they could disappear from the middle as well). Points can move at random length and angle, but will always move very small amounts compared to relative distance between unique points. I know this task is possible theoretically (one could do it intuitively by looking at two representations of the points - I just need a reasonably efficient and accurate algorithm to do the same) – abagshaw May 02 '18 at 05:23
  • This clarification does not change much the presented conclusion. Imagine two points that slowly moved towards each other's direction. At some step, both points disappear and a new point appears at the middle of the location of the two disappearing points. The algorithm will **_erroneously_** link the new point to (at least) one of the two original points. Again, as stated, just pick a random number between 1 and 1000 and I'll give you that number of examples that would make the algorithm unreliable. The only possibility is to add some constraints to the rules. – FDavidov May 02 '18 at 05:33
  • One more thing: Before trying to find a solution in 3D, see if you can find one in a single dimension. If the rules are the same, you will still have no reliable solution unless, for instance, points have some _property_ that would allow distinction (e.g. ID). – FDavidov May 02 '18 at 05:36
  • Rather than making assumptions that lead to the conclusion that there is no solution, a more productive approach would be to introduce assumptions (problem constraints) that allow for a solution. – Ted Hopp May 02 '18 at 05:57
  • @TedHopp, I agree and hence the last sentence of my previous comment. It is for the OP to add constraint, not me (or anyone else) since only the OP knows exactly the details of the problem. Additionally, suppose you are trying to solve an unsolvable problem, wouldn't you like that someone tells you that you are wasting your time? A solid "NO" is sometimes the best possible answer one can hope for. – FDavidov May 02 '18 at 06:05
  • 1
    Your 'solid "NO"' is for a problem that OP didn't ask about. – Ted Hopp May 02 '18 at 06:10
  • @TedHopp. Please read my response to your last comment under the original question. Also, I'll be happy to change or reject my own answer after the OP clearly states that I misunderstood the question or changes the conditions. – FDavidov May 02 '18 at 06:13