5

I have:
- a set of points of known size (in my case, only 6 points)
- a line characterized by x = s + t * r, where x, s and r are 3D vectors

I need to find the point closest to the given line. The actual distance does not matter to me.

I had a look at several different questions that seem related (including this one) and know how to solve this on paper from my highschool math classes. But I cannot find a solution without calculating every distance, and I am sure there has to be a better/faster way. Performance is absolutely crucial in my application.

One more thing: All numbers are integers (coordinates of points and elements of s and r vectors). Again, for performance reasons I would like to keep the floating-point math to a minimum.

Rodrigo de Azevedo
  • 1,097
  • 9
  • 17
K. Krull
  • 185
  • 2
  • 8
  • 4
    with only 6 points `O(n)` or `O(log n)` would be similar anyway. – Jarod42 Jun 27 '19 at 12:33
  • possible optimizations would probably involve octrees, but only makes sense when n is large; you can maybe try to vectorize your code instead, thus still computing all distances but with fewer instructions – amlucas Jun 27 '19 at 12:39
  • 3
    You have to process every point at least once to know their distance. Unless you want to repeat the process many times with different lines, simply computing the distance of every point is unavoidable. So the algorithm has to be O(n). From this point, the question is now "what is the fastest way to compute the distance from a point to a line?" – Gilles-Philippe Paillé Jun 27 '19 at 12:39
  • 1
    @amlucas This would make the algorithm O(nlogn) though. It would indeed make sense if the process has to be repeated many times with the same set of points. – Gilles-Philippe Paillé Jun 27 '19 at 12:42
  • How do the line and points come into existence? Are you able to transform the coordinate system such that the line is the z-axis? Then it's a matter of returning the point with the smallest x * x + y * y. – Bathsheba Jun 27 '19 at 12:43
  • I do not see how you can gain from trying to optimise the described case really. Why not make it simple and clear ? – Ely Jun 27 '19 at 12:45
  • 3
    With 6 points there is no point in trying to optimize. Just stick to O(n). –  Jun 27 '19 at 12:49
  • 2
    With six points everything is O(1). O(whatever) is about how an algorithm **scales** as you increase the problem size. For a fixed size input, the best algorithm is the one that is fastest, which often isn't the one that scales better. – Pete Becker Jun 27 '19 at 12:59
  • Gilles' answer is best, on the assumption that this actually is your bottleneck. [*This is how I find out for sure.*](https://stackoverflow.com/a/378024/23771) – Mike Dunlavey Jun 27 '19 at 15:21

2 Answers2

5

You have to process every point at least once to know their distance. Unless you want to repeat the process many times with different lines, simply computing the distance of every point is unavoidable. So the algorithm has to be O(n).

Since you don't care about the actual distance, we can make some simplification to the point-distance computation. The exact distance is computed by (source):

d^2 = |r⨯(p-s)|^2 / |r|^2

where is the cross product and |r|^2 is the squared length of vector r. Since |r|^2 is constant for all points, we can omit it from the distance computation without changing result:

d^2 = |r⨯(p-s)|^2

Compare the approximated square distances and keep the minimum. The advantage of this formula is that you can do everything with integers since you mentioned that all coordinates are integers.

  • In practice, minimizing `d^2` or minimizing `d^2 * abs(r)^2` will lead to the same result. You don't need to normalize `r`! – Damien Jun 27 '19 at 13:07
  • 1
    @Damien Oh you're right! I didn't think about this. I'll edit my answer. Thanks! – Gilles-Philippe Paillé Jun 27 '19 at 13:08
  • That kind of optimization was exactly what I was looking for, thanks! Could one go even one step further and omit the square in your last formula, keeping just `d = |r⨯(p-s)|` (of course d is not the actual distance, but still comparable)? – K. Krull Jun 27 '19 at 13:59
  • 1
    @K.Krull Actually `|r⨯(p-s)|^2` is faster to compute than `|r⨯(p-s)|`. For example, for a vector `v=(x,y,z)`, `|v|^2 = x*x+y*y+z*z` while `|v|=sqrt(x*x+y*y+z*z)`. The notation make it looks like it's less computation, but it's not. – Gilles-Philippe Paillé Jun 27 '19 at 14:03
  • Of course, you are right. Completely forgot about it. Thanks for pointing it out, as this is an important detail! Apart from that, I think it is worth noting that this works completely without floating-point operations, which is exactly what I asked for. – K. Krull Jun 27 '19 at 14:06
1

I'm afraid you can't get away with computing less than 6 distances (if you could, at least one point would be left out -- including the nearest one).

See if it makes sense to preprocess: Is the line fixed and the points vary? Consider rotating coordinates to make the line horizontal.

As there are few points, it is doubtful that this is your bottleneck. Measure where the hot spots are, redesign algorithms/data representation, spice up compiler optimization, compile to assembly and bum that. Strictly in that order.

Jon Bentley's "Writing Efficient Programs" (sadly long out of print) and "Programming Pearls" (2nd edition) are full of advise on practical programming.

vonbrand
  • 11,412
  • 8
  • 32
  • 52
  • "I'm afraid you can't get away with computing less than 6 distances" this is not strictly true, algorithms have been invented that group points in certain ways, such that only a smaller set of distances needs to be calculated - see my answer for more information – darune Jun 27 '19 at 13:33
  • 1
    Your proposal of preprocessing data seems very usefull to me, I'll keep that in mind. Furthermore, I should have mentioned that the algorithm will run on a microcontroller and every instruction counts, as I only have a short timeframe of less then 10us. – K. Krull Jun 27 '19 at 13:54