0

Sample code that moves 16 even bits to top half and 16 odd bits to lower half of a uint32_t number:

uint32_t separateBits(uint32_t x)
{
    uint32_t even = 0, odd = 0;
    for (int i=0; i<32; i+=2)
    {
        even |= (x & (1 << (i+0))) >> (0 + i/2);
        odd  |= (x & (1 << (i+1))) >> (1 + i/2);
    }
    return (even << 16) | odd;
}

Are there any more off efficient approaches, or any bit hacks that could be used to efficiently perform the same operation? This is mainly for mobiles, so if it's ok to use arm specific instructions.

It looks like clang already unrolls loops when compiling to arm, as generated code doesn't look anywhere like actual c-code.

For example, on intel this can be done this way:

uint32_t separateBits(uint32_t x)
{
    uint32_t even = _pext_u32(x, 0x55555555);
    uint32_t odd  = _pext_u32(x, 0xaaaaaaaa);
    return (even << 16) | odd;
}
Pavel P
  • 15,789
  • 11
  • 79
  • 128
  • 2
    Already looks pretty optimized to me. Awesome unrolling, cool magic constants. vector instructions and all the good stuff. https://godbolt.org/g/rqGp7S https://godbolt.org/g/ruPftA How much speedup over that do you need? Or do you just want to make your code look more fancy? – Baum mit Augen Aug 05 '18 at 20:21
  • @BaummitAugen I don't have any targets, just curious if there are any other approaches. – Pavel P Aug 05 '18 at 20:22
  • @Pavel Then you should state your goals more clearly I would say. Your use of "efficient" suggests you are looking for speedup of some sort. – Baum mit Augen Aug 05 '18 at 20:23
  • @harold I'll review it, but it cannot be a duplicate, there is no way to connect one to another. E.g. a person who wants to split bits doesn't even need to know about morton codes to solve it. – Pavel P Aug 05 '18 at 20:23
  • @Pavel but 2D Morton decoding and splitting odd/even bits are literally the same thing – harold Aug 05 '18 at 20:24
  • This can definitely be done in `O(ln N)` instead of `O(N)` where N is the number of bits. – Ben Voigt Aug 05 '18 at 20:25
  • @BenVoigt Since the bits are transported different distances I fail to see how it can be done in less than linear time. Except by ditching that requirement, in which case it boils down to simple masking, which is O(1). – Cheers and hth. - Alf Aug 05 '18 at 20:27
  • @BaummitAugen I guess those who've know (provided there are ways) would be able to show alternatives that are superior, and do not even need loops. – Pavel P Aug 05 '18 at 20:27
  • @Pavel The compiled result of your code doesn't have loops, either. – Baum mit Augen Aug 05 '18 at 20:29
  • @Pavel: Depending on how smart or not the compiler is, you might get a little speedup by computing the `1 << i` incrementally, just shifting a single bit left two times for each iteration. But do **measure**. And maybe inspect the machine code. – Cheers and hth. - Alf Aug 05 '18 at 20:29
  • @BaummitAugen I didn't know what moton codes were :) and yes, arm version as well totally doesn't look like the c-code (I mainly care for arm version, don't care if it's fast on intels). – Pavel P Aug 05 '18 at 20:30
  • @Alf: But if you have a (move-by-1) step and a (move-by-2) step, you get the move-by-3 for free, by selecting it in both steps. Adding a (move-by-4) step gives by-5, by-6, and by-7 for free. – Ben Voigt Aug 05 '18 at 20:30
  • @Pavel: Your compiler output is just unrolled loops, no bit hacks. – Ben Voigt Aug 05 '18 at 20:31
  • @harold that link suggested by you looks like what I was looking for. Thanks, however, it still doesn't seem to be duplicate, looks like it does the opposite: interleaving bits – Pavel P Aug 05 '18 at 20:35
  • 1
    @Pavel it's both in there, though the top answer concentrates on Intel specific implementations. That `morton_1` function from the second answer does half of the separation job (collecting just even bits) – harold Aug 05 '18 at 20:39
  • @harold yes, after reading it thoughly the function that I need is `morton_to_xy` – Pavel P Aug 05 '18 at 20:41
  • how does this compare to a simple look up table of each byte or halfword? – old_timer Aug 06 '18 at 06:09

0 Answers0