Morton Reverse Encoding for a 3D grid

Question

I have a 3D grid/array say u[nx+2][ny+2][nz+2]. The trailing +2 corresponds to two layers of halo cells in each of the three dimension x,y,z. I have another grid which allows for refinement(using quadtree) hence I have the morton index (or the Z order) of each of the cells.

Lets say without refinement the two grids are alike in the physical reality(except the second code doesnt have halo cells), What I want to find is for a cell q with morton id mid what is the corresponding index i , j and k index in the 3D grid. Basically a decoding of the mid or Z-order to get corresponding i,j,k for u matrix.

Looking for a C solution but general comments in any other programming language is also OK.

For forward encoding I am following the magic bits method as shown in Morton Encoding using different methods

score 2 · Accepted Answer · answered Apr 10 '18 at 11:55

Morton encoding is just interleaving the bits of two or more components.

If we number binary digits in increasing order of significance, so that the least significant binary digit in an unsigned integer is 0 (and binary digit i has value 2ⁱ), then binary digit i in component k of N corresponds to binary digit (i N + k) in the Morton code.

Here are two simple functions to encode and decode three-component Morton codes:

#include <stdlib.h>
#include <inttypes.h>

/* This source is in the public domain. */

/* Morton encoding in binary (components 21-bit: 0..2097151)
                0zyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyxzyx */
#define BITMASK_0000000001000001000001000001000001000001000001000001000001000001 UINT64_C(18300341342965825)
#define BITMASK_0000001000001000001000001000001000001000001000001000001000001000 UINT64_C(146402730743726600)
#define BITMASK_0001000000000000000000000000000000000000000000000000000000000000 UINT64_C(1152921504606846976)
/*              0000000ccc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc */
#define BITMASK_0000000000000011000000000011000000000011000000000011000000000011 UINT64_C(844631138906115)
#define BITMASK_0000000111000000000011000000000011000000000011000000000011000000 UINT64_C(126113986927919296)
/*              00000000000ccccc00000000cccc00000000cccc00000000cccc00000000cccc */
#define BITMASK_0000000000000000000000000000000000001111000000000000000000001111 UINT64_C(251658255)
#define BITMASK_0000000000000000000000001111000000000000000000001111000000000000 UINT64_C(1030792212480)
#define BITMASK_0000000000011111000000000000000000000000000000000000000000000000 UINT64_C(8725724278030336)
/*              000000000000000000000000000ccccccccccccc0000000000000000cccccccc */
#define BITMASK_0000000000000000000000000000000000000000000000000000000011111111 UINT64_C(255)
#define BITMASK_0000000000000000000000000001111111111111000000000000000000000000 UINT64_C(137422176256)
/*                                                         ccccccccccccccccccccc */
#define BITMASK_21BITS  UINT64_C(2097151)


static inline void morton_decode(uint64_t m, uint32_t *xto, uint32_t *yto, uint32_t *zto)
{
    const uint64_t  mask0 = BITMASK_0000000001000001000001000001000001000001000001000001000001000001,
                    mask1 = BITMASK_0000001000001000001000001000001000001000001000001000001000001000,
                    mask2 = BITMASK_0001000000000000000000000000000000000000000000000000000000000000,
                    mask3 = BITMASK_0000000000000011000000000011000000000011000000000011000000000011,
                    mask4 = BITMASK_0000000111000000000011000000000011000000000011000000000011000000,
                    mask5 = BITMASK_0000000000000000000000000000000000001111000000000000000000001111,
                    mask6 = BITMASK_0000000000000000000000001111000000000000000000001111000000000000,
                    mask7 = BITMASK_0000000000011111000000000000000000000000000000000000000000000000,
                    mask8 = BITMASK_0000000000000000000000000000000000000000000000000000000011111111,
                    mask9 = BITMASK_0000000000000000000000000001111111111111000000000000000000000000;
    uint64_t  x = m,
              y = m >> 1,
              z = m >> 2;

    /* 000c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c */
    x = (x & mask0) | ((x & mask1) >> 2) | ((x & mask2) >> 4);
    y = (y & mask0) | ((y & mask1) >> 2) | ((y & mask2) >> 4);
    z = (z & mask0) | ((z & mask1) >> 2) | ((z & mask2) >> 4);
    /* 0000000ccc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc */
    x = (x & mask3) | ((x & mask4) >> 4);
    y = (y & mask3) | ((y & mask4) >> 4);
    z = (z & mask3) | ((z & mask4) >> 4);
    /* 00000000000ccccc00000000cccc00000000cccc00000000cccc00000000cccc */
    x = (x & mask5) | ((x & mask6) >> 8) | ((x & mask7) >> 16);
    y = (y & mask5) | ((y & mask6) >> 8) | ((y & mask7) >> 16);
    z = (z & mask5) | ((z & mask6) >> 8) | ((z & mask7) >> 16);
    /* 000000000000000000000000000ccccccccccccc0000000000000000cccccccc */
    x = (x & mask8) | ((x & mask9) >> 16);
    y = (y & mask8) | ((y & mask9) >> 16);
    z = (z & mask8) | ((z & mask9) >> 16);
    /* 0000000000000000000000000000000000000000000ccccccccccccccccccccc */
    if (xto) *xto = x;
    if (yto) *yto = y;
    if (zto) *zto = z;
}


static inline uint64_t morton_encode(uint32_t xsrc, uint32_t ysrc, uint32_t zsrc)
{
    const uint64_t  mask0 = BITMASK_0000000001000001000001000001000001000001000001000001000001000001,
                    mask1 = BITMASK_0000001000001000001000001000001000001000001000001000001000001000,
                    mask2 = BITMASK_0001000000000000000000000000000000000000000000000000000000000000,
                    mask3 = BITMASK_0000000000000011000000000011000000000011000000000011000000000011,
                    mask4 = BITMASK_0000000111000000000011000000000011000000000011000000000011000000,
                    mask5 = BITMASK_0000000000000000000000000000000000001111000000000000000000001111,
                    mask6 = BITMASK_0000000000000000000000001111000000000000000000001111000000000000,
                    mask7 = BITMASK_0000000000011111000000000000000000000000000000000000000000000000,
                    mask8 = BITMASK_0000000000000000000000000000000000000000000000000000000011111111,
                    mask9 = BITMASK_0000000000000000000000000001111111111111000000000000000000000000;
    uint64_t  x = xsrc,
              y = ysrc,
              z = zsrc;
    /* 0000000000000000000000000000000000000000000ccccccccccccccccccccc */
    x = (x & mask8) | ((x << 16) & mask9);
    y = (y & mask8) | ((y << 16) & mask9);
    z = (z & mask8) | ((z << 16) & mask9);
    /* 000000000000000000000000000ccccccccccccc0000000000000000cccccccc */
    x = (x & mask5) | ((x << 8) & mask6) | ((x << 16) & mask7);
    y = (y & mask5) | ((y << 8) & mask6) | ((y << 16) & mask7);
    z = (z & mask5) | ((z << 8) & mask6) | ((z << 16) & mask7);
    /* 00000000000ccccc00000000cccc00000000cccc00000000cccc00000000cccc */
    x = (x & mask3) | ((x << 4) & mask4);
    y = (y & mask3) | ((y << 4) & mask4);
    z = (z & mask3) | ((z << 4) & mask4);
    /* 0000000ccc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc0000cc */
    x = (x & mask0) | ((x << 2) & mask1) | ((x << 4) & mask2);
    y = (y & mask0) | ((y << 2) & mask1) | ((y << 4) & mask2);
    z = (z & mask0) | ((z << 2) & mask1) | ((z << 4) & mask2);
    /* 000c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c00c */
    return x | (y << 1) | (z << 2);
}

The functions work symmetrically. To decode, binary digits and digit groups are shifted to larger consecutive units; to encode, binary digit groups are split and spread by shifting. Examine the masks (the BITMASK_ constants are named after their binary digit pattern), and the shift operations, to understand in detail how the encoding and decoding happens.

While two functions are quite efficient, they are not optimized.

The above functions have been verified tested to work using a few billion round-trips using random 21-bit unsigned integer components: decoding a Morton-encoded value yields the original three components.

the answer looks fine and i do have a similar solution available elsewhere but what I fail to understand is the rationale behind creating those masks. Well I know about bitmasking, bit manipulation but maybe I am just starting with and hence it is difficult to form the ideas around how you proceed with it. — datapanda, Apr 11 '18 at 07:11
@datapanda: The other option to implement the bit interleaving, would be to move each bit separately. That would require 20 masks and 20 shifts, per component. Instead, every other bit (bit group after the first) are moved at the same time, so that we only need to move six bits or bit groups per component. Masks 0, 3, 5, and 8 stay put; the other select specific bit groups to be shifted. Start with 0x0zyxzyxzyx...zyxzyx, and see how it is affected by each operation. — Nominal Animal, Apr 11 '18 at 07:33
you say in your answer "While two functions are quite efficient, they are not optimized.", how unoptimised are they are to feel any performance issues given that I have to do encoding and decoding of some `400x400x400` integers ? — datapanda, Apr 11 '18 at 08:14
@datapanda: On my Intel Core i5-7200U, a microbenchmark indicated that encoding takes about 37 clock cycles, and decoding about 34. Using a Xorshift64* pseudo-random number generator, I can generate, encode, decode, and verify 400×400×400 triplets in less than two seconds of wall clock time. I think that's plenty fast; no need to spend any more time optimizing it. Now, if you had hundrends of billions of triplets, I might start thinking about optimizing it further; not before. — Nominal Animal, Apr 11 '18 at 11:34
that seems pretty fast. Surely there is no need to spend time on optimization. Just another question my `i,j,k` in the production codes have been declared as `int` instead of `uint_32` which makes no sense since they are never going to be negative values. Does casting a uint_32 to int to access the array create any problems given the code will also be running on HPC clusters with different architectures. Ref. https://stackoverflow.com/questions/19490240/does-cast-between-signed-and-unsigned-int-maintain-exact-bit-pattern-of-variable — datapanda, Apr 11 '18 at 13:06
@datapanda: The cast is okay. When distributing a computation across a cluster, the processes run on the same (or on binary compatible architectures), as otherwise you'd need to have several different binaries working in concert. (If you use MPI, then you definitely will only use nodes with the same (sufficiently binary compatible) architectures.) So, you do not need to worry about byte order or such; you can transfer binary values safely. — Nominal Animal, Apr 11 '18 at 15:38

Morton Reverse Encoding for a 3D grid

1 Answers1