2

I have a collection of Million points in 3d space.

Each point is an object

Struct Point
{
    double x;
    double y;
    double z;
};

The million points are stored inside an c++ vector MyPoints in some random order.

I want to sort these million points according to spatial distribution of points in space such that points which are physically closer should also be closer inside my array after sorting.

My first guess on how to do this is as follows: first sort points w.r.t Z-axis, then sort points along Y-axis and then sort points along X-axis

MyPointsSortedAlongZ = Sort(MyPoints, AlongZAxis )
MyPointsSortedAlongY = Sort(MyPointsSortedAlongZ , AlongYAxis  )
MyPointsSortedAlongX = Sort(MyPointsSortedAlongY , AlongYAxis  )

Now firstly, I dont know if this method is correct. Will my final array of points MyPointsSortedAlongX be sorted perfectly spatially (or nearly sorted spatially) ?

Secondly, if this method is correct, is it the fastest way to do this. What is a better method to do this ?

nurabha
  • 1,152
  • 3
  • 18
  • 42
  • 5
    There is no way to do this and get a perfect mapping of spatial distance to linear (1D array) distance, because you're reducing the dimensionality. You may be interested in [*space-filling curves*](http://en.wikipedia.org/wiki/Space_filling_curve) as a means to approximately achieve this. – Oliver Charlesworth Apr 01 '14 at 08:09
  • 1
    Another approach would be to discretise your space into a 3d array and to store, in each cell, either the points in that cell or some reference to them. Personally I like @OliCharlesworth's suggestion to use a space-filling curve better, but they can be quite tricky to program. – High Performance Mark Apr 01 '14 at 08:13
  • I didnt mention space filling curves in my post although I am aware about it. I am thinking whether constructing a Barnes-Hut Tree and then doing an in-order traversal of that tree can result in spatial sorting(but maybe I am wrong). – nurabha Apr 01 '14 at 08:22
  • Ok, after few hours of reading I think Space filling curve with hilbert characteristic function is what I need. Its a quite complicated geometric algorithm. – nurabha Apr 01 '14 at 13:32
  • The space is divided into fixed number of cells depending upon dimensions. The direction of curve itself is geometrical interpretation of linear ordering of the cells of the space. The spatial sorting of points is achieved by associating cell with a point and then simply sorting the points according to cell index will generate the spatial ordering of points. The characteristic function impacts the way the ordering is generated. Hilbert, Z-curve, gray code are some important mapping functions – nurabha Apr 02 '14 at 05:58
  • 1
    If you haven't implemented your solution yet, have you seen this related SO post about Morton ordering? http://stackoverflow.com/questions/1024754/how-to-compute-a-3d-morton-number-interleave-the-bits-of-3-ints – Rethunk Apr 12 '14 at 22:29
  • 1
    @Rethunk: I did look at that solution. I have come up with even better implementation that works with variable sized morton code. I will share it when its working properly – nurabha May 08 '14 at 06:47
  • Cool! Very interested to see your work. – Rethunk May 09 '14 at 21:41

4 Answers4

5

The CGAL library provides an implementation of a space filling curve algorithm that can be useful for that task.

sloriot
  • 6,070
  • 18
  • 27
0

Well, it really depends on what the metric you are going to use to compare between two arrays, but look for example on the metric which is sum of differences between adjacent points:

metric(arr) = sum[ d(arr[i],arr[i-1]) | i from 1 to n ]
where d(x,y) is the distance between point x and point y

Note that an optimal (smallest) solution to this metric is basically an optimal (shortest) path that goes through all points. This is the Traveling Salesman Problem (TSP), which is NP-Hard, so there is no known polynomial solution to it.

I'd suggest - first define exactly what is the metric to compare two arrays.
Then, use heuristics or approximations to the metric, such as Genetic Algorithms or hill climbing, or reduce the problem to TSP, and use a known heuristic/approximation for it.

Regarding your method: It is easy to see it is not optimal for the simple example:

[(1,100),(1,-100),(2,0)]

Let's assume main sort by x, secondary sort by y.
It will give us the 'sorted' vector:

[(1,-100),(1,100),(2,0)]

according to the above metric, we get metric(arr) ~= 300

However, the order [(1,-100),(2,0),(1,100)] will get us metric(arr) ~= 200

So, the suggested heuristic is not optimal (as expected).

amit
  • 175,853
  • 27
  • 231
  • 333
  • I need approximate sorting and really fast sorting. So a metric which has minimum impact of performance is what I need. – nurabha Apr 01 '14 at 09:06
0

Maybe this helps: A Template for the Nearest Neighbor Problem (DDJ 2001)

RED SOFT ADAIR
  • 12,032
  • 10
  • 54
  • 92
  • This is NOT Nearest Neighbor Problem. NN is about finding a point from a sample which is closest to a single point. In here - the problem is about optimization ordering among many points. If you think the problem is somehow related, please explain how you can reduce the problem to use NN, and then link to your article explaining how to use the reduced problem. – amit Apr 01 '14 at 08:41
  • Was just a guess - sorry. – RED SOFT ADAIR Apr 01 '14 at 14:37
0

Sorting three times on the three axis is a waste. The third sort will completely undo what the other sorts have done.

  • You are not telling us the motivation behind "points which are physically closer should also be closer inside my array". Knowing that would probably allow us to provide more constructive answers. Space filling curves do not have this property at all ! –  Apr 02 '14 at 12:32
  • I think you are not right about Space filling curves. There are 3 main motivations behind my statement: 1. Improving memory locality: spatially close points in memory means higher spatial locality...which means better utilization of caches on cpus...gpus it will improve memory coalescing. 2. Ease of domain decomposition: Usually some computation would be associated with points and by spatially sorting the points it becomes easy to split computations across threads on cpus/gpus. 3.Improving Load balancing: This is another factor which may improve for certain parallel algorithms. – nurabha Apr 02 '14 at 16:38
  • Take a look at a picture of a Hilbert curve and consider the four points in the middle. Two of them are immediate neighbors. The other two are separated by length/4 or length/2, i.e. about the worse you can find. –  Apr 02 '14 at 17:53
  • I'd better rely on k-means grouping of the points, where k depends on your application. –  Apr 02 '14 at 18:05