Efficient z-order transformation in Fortran

Question

For my current work on a grid generation algorithm I need an efficient way to transform three-dimensional coordinates to z-order (more precisely: three 4-Byte integers into one 8-Byte integer) and the other way round. This Wikipedia article describes it fairly well: Z-order curve. Since I am not a programmer, the solution I came up with does what it is supposed to do but might be quite naive using the mvbits intrinsic to do the bit interleaving explicitly:

SUBROUTINE pos_to_z(i, j, k, zval)

use types

INTEGER(I4B), INTENT(IN)  :: i, j, k
INTEGER(I8B), INTENT(OUT) :: zval
INTEGER(I8B) :: i8, j8, k8
INTEGER(I4B) :: b

zval = 0
i8 = i-1
j8 = j-1
k8 = k-1

do b=0, 19
    call mvbits(i8,b,1,zval,3*b+2)
    call mvbits(j8,b,1,zval,3*b+1)
    call mvbits(k8,b,1,zval,3*b  )
end do

zval = zval+1

END SUBROUTINE pos_to_z


SUBROUTINE z_to_pos(zval, i, j, k)

use types

INTEGER(I8B), INTENT(IN)  :: zval
INTEGER(I4B), INTENT(OUT) :: i, j, k
INTEGER(I8B) :: i8, j8, k8, z_order
INTEGER(I4B) :: b

z_order = zval-1
i8 = 0
j8 = 0
k8 = 0

do b=0, 19
    call mvbits(z_order,3*b+2,1,i8,b)
    call mvbits(z_order,3*b+1,1,j8,b)
    call mvbits(z_order,3*b  ,1,k8,b)
end do

i = int(i8,kind=I4B) + 1
j = int(j8,kind=I4B) + 1
k = int(k8,kind=I4B) + 1

END SUBROUTINE z_to_pos

Note that I prefer the input and output ranges to start with 1 instead of 0, leading to some additional calculations. As it turns out, this implementation is rather slow. I measured the time it takes to transform and re-transform 10^7 positions:
gfortran -O0: 6.2340 seconds
gfortran -O3: 5.1564 seconds
ifort -O0: 4.2058 seconds
ifort -O3: 0.9793 seconds

I also tried different optimization options for gfortran without success. While the optimized code with ifort is already a lot faster, it is still the bottleneck of my program. It would be really helpful if someone could point me into the right direction how to do the bit interleaving more efficiently in Fortran.

Is there a reason you are doing this on a bit-wise level rather than e.g. storing the integer coordinates directly? — Alexander Vogt, May 25 '15 at 10:52
One of the main purposes of these two subroutines is neighbor-finding in a sparsely populated cartesian grid. I dont see how storing the integer coordinates would help me with that. could you be more specific? — MechEng, May 25 '15 at 11:00

samgak · Accepted Answer · 2015-05-26T06:34:12.487

Transforming from 3 co-ords to a z-order can be optimized using a look up table similar to the one described here. Since you only use 20 bits of your input values, it would be more efficient to use a look-up table with 1024 entries rather than 256, enough to index 10 bits, so that you only have to do 2 look-ups for each of your 3 input values, and modified for the case of interleaving 3 values instead of 2.

Entry n of the array stores the integer n, with its bits spread out so that bit 0 is in bit 0, bit 1 is moved to bit 3, bit 2 is moved to bit 6 and so on, with all the remaining bits set to zero. The lookup table array can be initialized like this:

subroutine init_morton_table(morton_table)
    integer(kind=8), dimension (0:1023), intent (out) :: morton_table
    integer :: b, v, z
    do v=0, 1023
        z = 0
        do b=0, 9
            call mvbits(v,b,1,z,3*b)
        end do
        morton_table(v) = z
    end do
end subroutine init_morton_table

To actually interleave the values, separate your 3 input values into their low 10 bits and their high 10 bits, then use these 6 values as indices into the array and combine the looked-up values using shifts and adds to interleave the values together. Adds are equivalent to bitwise OR operations in this case because there won't be any carries given that at most one bit will be set in each bit position. Because only every 3rd bit can be set in the values in the tables, offsetting one of the values by 1 bit and another by 2 means there won't be any collisions.

subroutine pos_to_z(i, j, k, zval, morton_table)
    integer, intent(in) :: i, j, k
    integer(kind=8), dimension (0:1023), intent (in) :: morton_table
    integer(kind=8), intent (out) :: zval
    integer(kind=8) :: z, i8, j8, k8

    i8 = i-1
    j8 = j-1
    k8 = k-1

    z = morton_table(iand(k8, 1023))
    z = z + ishft(morton_table(iand(j8, 1023)),1)
    z = z + ishft(morton_table(iand(i8, 1023)),2)
    z = z + ishft(morton_table(iand(ishft(k8,-10), 1023)),30)
    z = z + ishft(morton_table(iand(ishft(j8,-10), 1023)),31)
    zval = z + ishft(morton_table(iand(ishft(i8,-10), 1023)),32) + 1

end subroutine pos_to_z

You can use a similar technique to go the other way, but I don't think it will be quite as efficient. Create a lookup table of 32768 values (15 bits) that store 5 bits of the reconstructed input value. You will have to do 12 look-ups, getting 5 bits at a time for each of your three 20-bit values. Mask off the bottom 15 bits, then shift right 0, 1 and 2 bits to get your look-up indexes for k, j and i. Then shift and mask to get bits 15-29, 30-44 and 45-59 and do the same each time, shifting and adding to reconstruct k, j and i.

Thanks, that is indeed way faster. Thinking through my algorithm again, there is no need to do the inverse transformation (z-order to position) anyway. I guess I should read up on bitwise operations, but for now my problem is solved. — MechEng, May 26 '15 at 16:21
Lets assume I want to use the full positive range of the 8-Byte z-order. What I have to do is the following: 1) drop the shift +1 at the end of the pos_to_z routine 2) pre-calculate the morton table for 2048 values instead 3) change the values "-10" to "-11" and shift by 33,34,35 instead of 30,31,32. If storing the 2048 values for the morton table is not an issue, I can stick with only 2 shifts per coordinate, am I right? — MechEng, Jun 10 '15 at 15:23
Right now you have 3 input values and you use 20 bits from each to create a 60 bit value that you store in 8 bytes. You can't use the full 8 bytes because 64 bits is not divisible by 3. But if you want to store 21 bits from each value in 63 bit then what you are saying sounds right. You'd do the bottom 10 bits as now, and then do the top 11 bits. In addition to what you've described, you'll need to change the iand mask value from 1023 to 2047 — samgak, Jun 10 '15 at 23:25
By "full positive range" I meant in fact only 63 Bits of the z-order variable. Thanks for the clarification. — MechEng, Jun 11 '15 at 08:07

Efficient z-order transformation in Fortran

1 Answers1

Linked