8

If I have a dataset, where the keys are points in 3D, represented by 3 signed 64 bit integers. And I want to use a (sorted)key-value store to store them, where keys are just byte array (but I can specify a comparator). I think I can turn all those points into a byte array by using bit-interleaving, as is done with Z/Morton order like in How to compute a 3D Morton number

In addition to fetching individual points, which can be done more simply without Morton ordering, I want to do range search, where I search within a box that is aligned with the axes. I will define A and B as being respectively the box corners where all coordinates are the lowest, and the opposite corner where all coordinates are the highest.

Now my questions are:

  1. For any point C, which is logically between, A and B, will the Morton number of C also be between the Morton number of A and B? (Isn't that the point of Morton order?)

  2. If 1 is no, can A and B be "rounded" to values that guaranty that C will be included?

  3. Assuming that 1 or 2 is possible, does the search return also points outside of that box, which I have to "post-filter" away? How big is that "error set" (does it depend on the size, or the position, of the search)?

  4. Does the fact that the integers are signed cause problems? And if so, is there a work-around?

To recap, using Morton Numbers is just one possible solution to the real problem: How to efficiently range-search in 3D integer space, when the 3D points have to be mapped to a unidimensional value? I want to get all the points between A and B, by performing a single range select in the DB, using a min-key and a max-key, and ideally, getting as few points outside the box as possible.

Paul R
  • 208,748
  • 37
  • 389
  • 560
Sebastien Diot
  • 7,183
  • 6
  • 43
  • 85
  • Or 5: you can take two morton numbers `(a,b,c)` and `(d,e,f)` and add them cleverly and quickly (so no deinterleaving and interleaving) to get `(a+d,b+e,c+f)`, so you can scan through that box in normal order. Are you interested in that technique as well? – harold Oct 07 '12 at 21:48
  • @harold I don't understand how adding helps, but if it's a solution to the range search, I would be interested. – Sebastien Diot Oct 07 '12 at 21:57
  • Well maybe I don't understand your problem, but you want to search through a box, right? As in, visit all elements in that box and presumably do some sort of test on the element? What I had in mind was the same idea as three nested loops over `x`, `y` and `z` but avoids having to convert all those coordinates to morton numbers by incrementing the fields in the morton coordinate directly. – harold Oct 07 '12 at 22:01
  • OK, I think I'm getting your idea, partly. But I'm not sure it applies. I want to get all the points between A and B, *by performing a single range select* in the DB, using a min-key and a max-key. I think your solution, using a loop, would involve many DB calls, if I understand correctly. – Sebastien Diot Oct 07 '12 at 22:08
  • Oh, yes, I thought we were just talking about some array in RAM here, well that complicates matters – harold Oct 07 '12 at 22:09
  • 1
    Requesting the whole range wouldn't skip anything, but it would give you a whole load of unnecessary data. Really a lot. – harold Oct 07 '12 at 22:18

3 Answers3

7

4) Yes, the sign will cause an issue, but it's trivial to solve.

Just XOR the sign bit of x, y and z with 1 before you create your Morton Number.

Why this works (using 1 dimension signed bytes instead):

-1 in binary is 11111111

0 in binary is 00000000

1 in binary is 00000001

The order you want is -1, 0, 1, but the current binary order is 0, 1, -1.

-1 XOR 10000000 = 01111111

0 XOR 10000000 = 10000000

1 XOR 10000000 = 10000001

Now your binary order is correct

Louis Ricci
  • 20,804
  • 5
  • 48
  • 62
  • That was the last part of the puzzle, if I stick to Morton order. – Sebastien Diot Oct 12 '12 at 17:24
  • OK. I think that my answer and yours together solves all the points of my question. And I'd hate to have a bounty go to waste, so I can give it to you. Assuming you understood my question, and my answer, I'd appreciate if you could at least tell me if it sounds right, as I'm not totally sure, and did not get any feedback on my own answer, not even a +/- vote. – Sebastien Diot Oct 12 '12 at 18:57
  • @Sebastien Diot - I think 1) is "No" referencing the wikipedia article, the 2d example given shows that even within a range you will still need to cull out false positives or calculate intermediate Morton Numbers and query multiple ranges. http://en.wikipedia.org/wiki/Morton_number_%28number_theory%29 – Louis Ricci Oct 12 '12 at 19:35
  • 1
    OK. I checked. I think my theory holds: if you look at the table with the binary representation, and choose two points (x,y) at random, say, (2,5) and (6,3). Then the box is: (2,3),(2,5),(6,5),(6,3) (order is not important). Now you take the "lowest" point, (2,3) as start and the "highest", (6,5) as end. If you follow the curve from (2,3) to (6,5), it will cover every point in the box (along with many others, but that is solved by post-filtering). – Sebastien Diot Oct 12 '12 at 20:40
2

Seems like I have to answer my question myself. And the answer will relate to the z order curve, which is what I actually asked for. This is as far as I get it:

  1. Yes. It always work like that with z order curves.
  2. Is irrelevant, because 1) is true.
  3. It depends on how big the area is, and where it is, but the worst case scenario is when you actually include the centre of the space, rather then when you are far from it. For any dimensionality, if you use indexes of depth N bits in each dimension, there are special cases, where the z order curve give you an exact match, and you only get points in the search box. This happens when the following conditions are met:
  4. The search area is the same size in all dimension, and a power of two, 2^M, where 0 <= M <= N.
  5. The search area is a hypercube aligned with all axis.
  6. All coordinates of the search area hypercube corners are a multiple of 2^M. When all those conditions are met, the search area corresponds exactly to a sub-tree node, and therefore matches exactly. In the general case, though, the best you can do is find the smallest node that will hold all the desired points, and subdivide it recursively in smaller nodes that provide a partial match, to some maximum desired depth, to get a better match at the cost of using multiple queries. Finding the smallest tree node that contains all corners of the area is equivalent to finding the common prefix of the Morton code for all corners. And when using a single query, the amount of points outside the area returned is the volume of that node queried minus the volume of the search area.
  7. I'm pretty sure that's a problem, but I haven't found info on that yet.

It seems what I'm saying isn't clear, so I'll do a little ASCII ART...

A basic Z-order (Morton order) curve in 2D goes like this (path is A,B,C,D):

x  0   1

0  A → B
     ↙
1  C → D

So, A = (0,0) B = (0,1) C = (1,0) D = (1,1)

Now a basic Hilbert curve in 2D goes like this (path is A,B,C,D):

x  0   1

0  A   D
   ↓   ↑
1  B → C

So, A = (0,0) B = (1,0) C = (1,1) D = (0,1)

With a Z-order (Morton order), the curve will start from the lowest point A (0,0), and end with the highest point D (1,1), while covering all the points in between. This makes a range search easy, and it work in 3D too. All the points in a 3D box (0,0,0) to (1,1,1) are between the Morton order code of (0,0,0) and (1,1,1).

With an Hilbert curve, you start at (0,0) and ends at (0,1) (Probably similar in 3D). Therefore, performing the range-query from the Hilbert code of (0,0) to (1,1) will NOT find all points.

Therefore, if I am to use an Hilbert curve to perform my range query of a 3D box (0,0,0) to (1,1,1) (as a single DB query), I need a function that tells me what point should I use as first point, and what point I should use as last point, because (0,0,0) and (1,1,1) will not work.

So, is there such a function, where you give your 8 (for 3D) box coordinates, and it returns the first and last points to use in your range query? Without that, I cannot use Hilbert curves to solve my problem.

Or, in pseudo-SQL:

Morton Query:

SELECT key,data FROM table WHERE (key >= Morton(lowest(box))) AND (key <= Morton(highest(box)))

Hilbert Query:

SELECT key,data FROM table WHERE (key >= Hilbert(XXX(box))) AND (key <= Hilbert(YYY(box)))

And so I need the XXX() and YYY().

[EDIT] A part of this answer was targeted at some other answer telling me to use Hilbert curves, but which has since deleted their answer.

Ian Goldby
  • 5,609
  • 1
  • 45
  • 81
Sebastien Diot
  • 7,183
  • 6
  • 43
  • 85
  • IIUC if A and B are corners of the search area, you take the longest common prefix of morton(A) and morton(B) to find the biggest "z area" that contains the search area and repeat that algorithm through dichotomy, is that correct? – amirouche Oct 06 '15 at 02:39
  • @amirouche That was 3 years ago, and I have long since dropped that project (for reasons unrelated to this question), never to user Morton (or Hilbert) again, so I've forgotten it all, and cannot (competently) answer your question. Sorry. – Sebastien Diot Oct 06 '15 at 14:36
  • 2
    The revelant source is http://www.vision-tools.com/h-tropf/multidimensionalrangequery.pdf via wp; Thanks! – amirouche Oct 06 '15 at 15:56
-1

There is in general not much you can expect from trying to reduce 3-dimensions to 1 dimension. What you could try though: find the major axis (i.e. do a line fit through your points), then project the points onto the line. This gives you requested one-dimensional value for each point. When you project the boxes' corners onto to the line and take enclosing interval of these values, you get the search range.

coproc
  • 6,027
  • 2
  • 20
  • 31