What is the implementation of GCC's (4.6+) __builtin_clz
? Does it correspond to some CPU instruction on Intel x86_64 (AVX)
?

- 23,584
- 43
- 124
- 195
-
1I don't know, but if it's available, `LZCNT` seems like a likely candidate. (See http://en.wikipedia.org/wiki/SSE4) – reuben Feb 19 '12 at 22:40
2 Answers
Yes, and no.
CLZ (count leading zero) and BSR (bit-scan reverse) are related but different. CLZ equals (type bit width less one) - BSR. CTZ (count trailing zero), also know as FFS (find first set) is the same as BSF (bit-scan forward.)
Note that all of these are undefined when operating on zero!
In answer to your question, most of the time on x86 and x86_64, __builtin_clz generates BSR operation subtracted from 31 (or whatever your type width is), and __builting_ctz generates a BSF operation.
If you want to know what assembler GCC is generating, the best way to know is to see. The -S flag will have gcc output the assembler it generated for the given input:
gcc -S -o test.S test.c
Consider:
unsigned int clz(unsigned int num) {
return __builtin_clz(num);
}
unsigned int ctz(unsigned int num) {
return __builtin_ctz(num);
}
On x86 for clz gcc (-O2) generates:
bsrl %edi, %eax
xorl $31, %eax
ret
and for ctz:
bsfl %edi, %eax
ret
Note that if you really want bsr, and not clz, you need to do 31 - clz (for 32-bit integers.) This explains the XOR 31, as x XOR 31 == 31 - x (this identity is only true for numbers of the from 2^y - 1) So:
num = __builtin_clz(num) ^ 31;
yields
bsrl %edi, %eax
ret

- 2,301
- 1
- 16
- 4
It should translate to a Bit Scan Reverse instruction and a subtract. The BSR gives the index of the leading 1, and then you can subtract that from the word size to get the number of leading zeros.
Edit: if your CPU supports LZCNT (Leading Zero Count), then that will probably do the trick too, but not all x86-64 chips have that instruction.

- 789
- 5
- 8
-
-
-
If you compiler with `-mbmi1`, you're telling gcc that `lzcnt` is available on all CPUs that will run this binary. (or much better `-march=haswell` or `-march=bdver2` or `-march=znver1` or whatever, to tune for a specific CPU as well as enabling instruction sets.) – Peter Cordes Jul 07 '18 at 05:34
-
-
https://godbolt.org/z/5re8vWzq8 yep, depending on instruction set both are being generated – sehe Oct 02 '21 at 13:08