What is the fastest way to find the lowest 0 bit in a bitmask?

Question

Assume we have a bitmask, say 64 bits. A simple way to find the lowest 0 bit, would be to do a logical bit & operation with the mask 0x0001 and shift the above mask to the left whenever the result is 1.

This is a linear time complexity in the number of bits, O(n). I wonder if you can achieve a faster execution than the naive iterative approach or if this is the theoretical optimum.

invert it and use a count-trailing-zeros instruction. e.g. `__builtin_ctz(~x)` in GNU C. Most architectures have a ctz instruction, but ISO C doesn't portably expose it. — Peter Cordes, May 01 '18 at 06:55
Is there anythign special about the data? Is it e.g. likely that the lowest 0 will be MSB or more likely that it will be LSB? In case you do not want to take Peters advice and/or have to implement yourself think of comparing larger parts (8bit, 16bit, 32bit) to all-ones, that should get you O(ln n). — Yunnosch, May 01 '18 at 06:59
I found some Q&As with answers for efficient CTZ on various compilers. This is a duplicate once you invert the bitmap. There's also POSIX `ffs`, which good C++ implementations will inline to a CTZ instruction. Some of the answers on the linked dups avoid intrinsics, so you can use them instead, as @Yunnosch suggests, with the choice maybe depending on whether it's likely that there's a zero bit in the low 8 or something. On x86, `not` / `tzcnt` or `bsf` is definitely your best bet, though, with intrinsics. — Peter Cordes, May 01 '18 at 07:04
@PeterCordes I have a tiny issue. I am guaranteed to have 0 as a potential input as well as 64 bits, and the documentation says this leads to undefined behaviour — Makogan, May 01 '18 at 07:11
You mean `0` after inverting, so your actual input was all-ones? In that case you'll want to use `ffsll`, or on x86-64, [`_tzcnt_u64`](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=411,3941,3941,433,1508,408,5534&techs=AVX,AVX2,Other&text=tzcnt) which returns 64 for an input of all-zeros. (requires BMI1, unlike `bsf` which is baseline). Do any of the linked duplicates mention this? I didn't look carefully, maybe they need a new answer. — Peter Cordes, May 01 '18 at 07:17
So I should have an if statement and check for the 64 bit case separatly and then attempt a ctz if that's not the case. Correct? I appologize I am not an embedded programmer, I am a graphics programmer that's currently finding stupid ways to optimize code — Makogan, May 01 '18 at 07:20
So yes you mean the `~x` that you want to bit-scan can be zero? 1. what hardware do you care about your code running fast on? 2. how portable do you need the source to be, and are you interested in having a `#ifdef` for x86-64 with `tzcnt` (Haswell / Piledriver)? 3. Can you use POSIX `ffsll`, and does it optimize well on the compilers you care about? (it has defined behaviour for all inputs, unlike `__builtin_ctz`) 4. If you can't do any of those, then GNU C++ `if(! ~x) { return 64; } return __builtin_ctzll(~x);` is probably your best bet. Or maybe use a ternary operator for cmov? — Peter Cordes, May 01 '18 at 09:00
If you want to make your question specific to what you're optimizing for, rather than generic, it might be possible to answer it. As is, it's definitely a duplicate because there are no hardware / target details in the question. You tagged assembly, but not what architecture. I still don't even know if you're on x86, PowerPC, MIPS64, AArch64, or what. — Peter Cordes, May 01 '18 at 11:48
@PeterCordes I am trying to target as many architectures as possible actually. — Makogan, May 01 '18 at 19:07

What is the fastest way to find the lowest 0 bit in a bitmask?

0 Answers0