10

How to encode/decode morton codes(z-order) given [x, y] as 32bit unsigned integers producing 64bit morton code, and vice verse ? I do have xy2d and d2xy but only for coordinates that are 16bits wide producing 32bit morton number. Searched a lot in net, but couldn't find. Please help.

Paul R
  • 208,748
  • 37
  • 389
  • 560
Dawid Szymański
  • 775
  • 6
  • 15
  • It's really not hard to extend a 32bit version to 64bit. Double the width of all the masks, and add an extra step following the same pattern as the others. – harold May 29 '15 at 22:16

3 Answers3

17

If it is possible for you to use architecture specific instructions you'll likely be able to accelerate the operation beyond what is possible using bit-twiddeling hacks:

For example if you write code for the Intel Haswell and later CPUs you can use the BMI2 instruction set which contains the pext and pdep instructions. These can (among other great things) be used to build your functions.

Here is a complete example (tested with GCC):

#include <immintrin.h>
#include <stdint.h>

// on GCC, compile with option -mbmi2, requires Haswell or better.

uint64_t xy_to_morton(uint32_t x, uint32_t y)
{
  return _pdep_u32(x, 0x55555555) | _pdep_u32(y,0xaaaaaaaa);
}

void morton_to_xy(uint64_t m, uint32_t *x, uint32_t *y)
{
  *x = _pext_u64(m, 0x5555555555555555);
  *y = _pext_u64(m, 0xaaaaaaaaaaaaaaaa);
}

If you have to support earlier CPUs or the ARM platform not all is lost. You may still get at least get help for the xy_to_morton function from instructions specific for cryptography.

A lot of CPUs have support for carry-less multiplication these days. On ARM that'll be vmul_p8 from the NEON instruction set. On X86 you'll find it as PCLMULQDQ from the CLMUL instruction set (available since 2010).

The trick here is, that a carry-less multiplication of a number with itself will return a bit-pattern that contains the original bits of the argument with zero-bits interleaved. So it is identical to the _pdep_u32(x,0x55555555) shown above. E.g. it turns the following byte:

 +----+----+----+----+----+----+----+----+
 | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
 +----+----+----+----+----+----+----+----+

Into:

 +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
 | 0  | b7 | 0  | b6 | 0  | b5 | 0  | b4 | 0  | b3 | 0  | b2 | 0  | b1 | 0  | b0 |
 +----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+

Now you can build the xy_to_morton function as (here shown for CLMUL instruction set):

#include <wmmintrin.h>
#include <stdint.h>

// on GCC, compile with option -mpclmul

uint64_t carryless_square (uint32_t x)
{
  uint64_t val[2] = {x, 0};
  __m128i *a = (__m128i * )val;
  *a = _mm_clmulepi64_si128 (*a,*a,0);
  return val[0];
}

uint64_t xy_to_morton (uint32_t x, uint32_t y)
{
  return carryless_square(x)|(carryless_square(y) <<1);
}

_mm_clmulepi64_si128 generates a 128 bit result of which we only use the lower 64 bits. So you can even improve upon the version above and use a single _mm_clmulepi64_si128 do do the job.

That is as good as you can get on mainstream platforms (e.g. modern ARM with NEON and x86). Unfortunately I don't know of any trick to speed up the morton_to_xy function using the cryptography instructions and I tried really hard for several month.

Pavel P
  • 15,789
  • 11
  • 79
  • 128
Nils Pipenbrinck
  • 83,631
  • 31
  • 151
  • 221
  • Really great. Appritiate. – Dawid Szymański May 30 '15 at 00:45
  • 1
    @DawidSzymański If you want more I suggest you check out this blog: bitmath.blogspot.de and read about tesseral arithmetic (that's doing arithmetic with numbers stored in morton order without encoding/decoding them). I'm pretty sure you can use it for your space-filling curve stuff. – Nils Pipenbrinck May 30 '15 at 00:52
  • @harold, fun fact is: We have enjoy the mathematical odditiy of the bit-twiddeling powers of the x*x operation in GF(2'm). However, the crypto-folks like to have a fast sqrt(x) in GF(2'm) as well. They already found out about that it is about separating even from odd bits, but they don't know the bit-twiddling hacks yet.. I think everyone can learn from that! – Nils Pipenbrinck May 31 '15 at 20:20
  • @NilsPipenbrinck hitting this answer after so long, inquisitive for the fact whether they exist for a 3D space ? say encoding x,y,z to Z order and vice versa. – datapanda May 07 '18 at 00:20
11
void xy2d_morton(uint64_t x, uint64_t y, uint64_t *d)
{
    x = (x | (x << 16)) & 0x0000FFFF0000FFFF;
    x = (x | (x << 8)) & 0x00FF00FF00FF00FF;
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0F;
    x = (x | (x << 2)) & 0x3333333333333333;
    x = (x | (x << 1)) & 0x5555555555555555;

    y = (y | (y << 16)) & 0x0000FFFF0000FFFF;
    y = (y | (y << 8)) & 0x00FF00FF00FF00FF;
    y = (y | (y << 4)) & 0x0F0F0F0F0F0F0F0F;
    y = (y | (y << 2)) & 0x3333333333333333;
    y = (y | (y << 1)) & 0x5555555555555555;

    *d = x | (y << 1);
}

// morton_1 - extract even bits

uint32_t morton_1(uint64_t x)
{
    x = x & 0x5555555555555555;
    x = (x | (x >> 1))  & 0x3333333333333333;
    x = (x | (x >> 2))  & 0x0F0F0F0F0F0F0F0F;
    x = (x | (x >> 4))  & 0x00FF00FF00FF00FF;
    x = (x | (x >> 8))  & 0x0000FFFF0000FFFF;
    x = (x | (x >> 16)) & 0x00000000FFFFFFFF;
    return (uint32_t)x;
}

void d2xy_morton(uint64_t d, uint64_t &x, uint64_t &y)
{
    x = morton_1(d);
    y = morton_1(d >> 1);
}
Johan
  • 74,508
  • 24
  • 191
  • 319
Dawid Szymański
  • 775
  • 6
  • 15
5

The naïve code would be the same irregardless of the bit count. If you don't need super fast bit twiddling version, this will do

uint32_t x;
uint32_t y;
uint64_t z = 0;

for (int i = 0; i < sizeof(x) * 8; i++)
{
  z |= (x & (uint64_t)1 << i) << i | (y & (uint64_t)1 << i) << (i + 1);
}

If you need faster bit twiddling, then this one should work. Note that x and y have to be 64bit variables.

uint64_t x;
uint64_t y;
uint64_t z = 0;

x = (x | (x << 16)) & 0x0000FFFF0000FFFF;
x = (x | (x << 8)) & 0x00FF00FF00FF00FF;
x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0F;
x = (x | (x << 2)) & 0x3333333333333333;
x = (x | (x << 1)) & 0x5555555555555555;

y = (y | (y << 16)) & 0x0000FFFF0000FFFF;
y = (y | (y << 8)) & 0x00FF00FF00FF00FF;
y = (y | (y << 4)) & 0x0F0F0F0F0F0F0F0F;
y = (y | (y << 2)) & 0x3333333333333333;
y = (y | (y << 1)) & 0x5555555555555555;

z = x | (y << 1);
Sami Kuhmonen
  • 30,146
  • 9
  • 61
  • 74