5

Given m 4-dimensional points, what is the efficient way to find out the two points that have the maximum Euclidean distance?

Currently, I am just using brute force approach and checking every pair distance with 2 nested for loops (O(m^2)) but this is very bad as it does not scale.

user3243499
  • 2,953
  • 6
  • 33
  • 75
  • Possible duplicate of [Algorithm to find points that are furthest apart -- better than O(n^2)?](https://stackoverflow.com/questions/6524350/algorithm-to-find-points-that-are-furthest-apart-better-than-on2) – MSalters Sep 27 '18 at 20:25
  • I am not sure that you can do better than O(m²) for an exact solution. –  Sep 27 '18 at 21:13
  • looks like quadratic term is here in optimal case: https://www.cs.princeton.edu/~chazelle/pubs/ConvexHullAlgorithm.pdf – Severin Pappadeux Sep 27 '18 at 21:56
  • Btw, here's a discussion of the efficiency of using triangle inequality to reduce the number of distance checks in varying dimensionalities: https://stackoverflow.com/questions/35923497/biggest-diameter-of-a-set-with-a-distance-function/35930703#35930703 – m69's been on strike for years Sep 27 '18 at 22:18
  • 1
    @m69 are you sure the triangle inequality works in 4 dimensions? What's a four-dimensional triangle? – גלעד ברקן Sep 28 '18 at 14:13
  • @גלעדברקן {a:(0,0,0,0),b:(1,2,3,4),c:(5,6,7,8)} is a 4D triangle; distance |ab| = sqrt(30) = 5.48, |bc| = sqrt(64) = 8, therefore |ac| will be smaller than or equal to sqrt(30)+sqrt(64) = 13.48, and it is: |ac| = sqrt(174) = 13.19 < 13.48 – m69's been on strike for years Sep 28 '18 at 17:55
  • @גלעדברקן Triangle inequality in higher dimensions: https://people.math.gatech.edu/~cain/notes/cal5.pdf – m69's been on strike for years Sep 28 '18 at 18:23
  • 1
    @m69 it says the inequality holds in higher dimensions! – גלעד ברקן Sep 28 '18 at 19:33
  • @גלעדברקן Indeed. Triangle inequality becomes less useful the higher the number of dimensions, but in 4D it still helps you find the longest distance between 1000 points by measuring the distance between only 2.5% of the pairs. However, this is an improvement for average case with random data, not worst case with any data. – m69's been on strike for years Sep 28 '18 at 19:41
  • Before attempting a 4-D, consider a 2-D [Quadtree](https://en.wikipedia.org/wiki/Quadtree) or a 3-D [Octree](https://en.wikipedia.org/wiki/Octree) of the data. Finding the max distance within an octree is then a small step to 4-D. – chux - Reinstate Monica Aug 28 '20 at 01:51

2 Answers2

0

The problem computation scales with the dimensionality. At about 4 you're usually better off with brute force.

If there's some known functionality for this data you can cut down on things. Like if you do this a lot but the points don't change much. You can build the grouping by checking each point for the farlest point each time you add a new point caching the data from the brute force. You'd get O(N) on insertion and O(N) on farlest query. But, you'd need to do that N times giving you O(N^2).

You could reduce this a bit if you also clustered the data. So you define a cluster of points during the insertion and you can determine that since your house is in New York that no house in Paris can be further since you've compared it to a house in Australia. You can do that because you have the data in clusters. But, that's not going to save you that much because in 4D things get really hard to optimize because you end up needing way more boxes to store the clusters in 4D and most of the fun optimization is a proof that since you already exceed that distance in 4D you can rule out all the other points. That's great in 2D but those tricks become progressively messier with new dimensions.

Tatarize
  • 10,238
  • 4
  • 58
  • 64
-1

Please look at the answer to this question: How to find two most distant points? To find the convex hull you can use this: https://en.wikipedia.org/wiki/Gift_wrapping_algorithm

Peter Chikov
  • 172
  • 3
  • 16
  • Unfortunately, finding the extreme points takes O(n²) in dimension 4. –  Sep 27 '18 at 21:10
  • Those links relate to 2D, they are not relevant. Unfortunately, finding the extreme points takes O(n²) in dimension 4. –  Sep 27 '18 at 21:16
  • This algorithm can be generalized to the 3rd dimension and higher. For visualization of the algorithm in 3rd dimension please look here: https://dccg.upc.edu/people/vera/wp-content/uploads/2014/11/GA2014-ConvexHulls3D-Roger-Hernando.pdf – Peter Chikov Sep 28 '18 at 14:54
  • What use is it if it doesn't lower the complexity ? –  Sep 28 '18 at 18:45
  • The brute force is takes n^2 in every case. The Gift Wrapping algorithm take nF time where F is the number of Convex Hull faces. Worst case it's indeed n^2 if all the points are on the face of the Convex Hull. But in average case the run time is smaller. – Peter Chikov Sep 28 '18 at 19:38
  • 1
    @PeterChikov Triangle inequality also improves average case, but not worst case. Clustering would too, probably. – m69's been on strike for years Sep 28 '18 at 19:46
  • @m69 Agreed. I was not able to find an algorithm that improves the worst case. Your suggestions of Triangle inequality and clustering are good, in the end the choice of algorithm will depend on the layout of the points in a particular problem. – Peter Chikov Sep 28 '18 at 20:16