For my current work on a grid generation algorithm I need an efficient way to transform three-dimensional coordinates to z-order (more precisely: three 4-Byte integers into one 8-Byte integer) and the other way round. This Wikipedia article describes it fairly well: Z-order curve. Since I am not a programmer, the solution I came up with does what it is supposed to do but might be quite naive using the mvbits intrinsic to do the bit interleaving explicitly:
SUBROUTINE pos_to_z(i, j, k, zval)
use types
INTEGER(I4B), INTENT(IN) :: i, j, k
INTEGER(I8B), INTENT(OUT) :: zval
INTEGER(I8B) :: i8, j8, k8
INTEGER(I4B) :: b
zval = 0
i8 = i-1
j8 = j-1
k8 = k-1
do b=0, 19
call mvbits(i8,b,1,zval,3*b+2)
call mvbits(j8,b,1,zval,3*b+1)
call mvbits(k8,b,1,zval,3*b )
end do
zval = zval+1
END SUBROUTINE pos_to_z
SUBROUTINE z_to_pos(zval, i, j, k)
use types
INTEGER(I8B), INTENT(IN) :: zval
INTEGER(I4B), INTENT(OUT) :: i, j, k
INTEGER(I8B) :: i8, j8, k8, z_order
INTEGER(I4B) :: b
z_order = zval-1
i8 = 0
j8 = 0
k8 = 0
do b=0, 19
call mvbits(z_order,3*b+2,1,i8,b)
call mvbits(z_order,3*b+1,1,j8,b)
call mvbits(z_order,3*b ,1,k8,b)
end do
i = int(i8,kind=I4B) + 1
j = int(j8,kind=I4B) + 1
k = int(k8,kind=I4B) + 1
END SUBROUTINE z_to_pos
Note that I prefer the input and output ranges to start with 1 instead of 0, leading to some additional calculations.
As it turns out, this implementation is rather slow. I measured the time it takes to transform and re-transform 10^7 positions:
gfortran -O0: 6.2340 seconds
gfortran -O3: 5.1564 seconds
ifort -O0: 4.2058 seconds
ifort -O3: 0.9793 seconds
I also tried different optimization options for gfortran without success. While the optimized code with ifort is already a lot faster, it is still the bottleneck of my program. It would be really helpful if someone could point me into the right direction how to do the bit interleaving more efficiently in Fortran.