I am trying to understand the compiler output at https://gcc.godbolt.org/z/7xxb3G. This function:
void __attribute__((noinline))
cond_unset_bit(uint64_t * v, uint32_t b) {
    if(__builtin_expect(!!(*v & ((1UL) << b)), 1)) {
        *v ^= ((1UL) << b);
    }
}
compiles to:
cond_unset_bit(unsigned long*, unsigned int):
    movq (%rdi), %rax
    btq %rsi, %rax
    jnc .L6
    btcq %rsi, %rax
    movq %rax, (%rdi)
.L6:
    ret
Based on Agner Fog's instruction tables (Skylake is on pg. 238), btq and btcq have the exact same cost when operating on a register. btcq also sets the carry flag to the previous value of the bit, so it seems the exact same logic (with better performance) could be accomplished without the btq instruction, i.e.:
cond_unset_bit(unsigned long*, unsigned int):
    movq (%rdi), %rax
    btcq %rsi, %rax
    jnc .L6
    movq %rax, (%rdi)
.L6:
    ret
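As a sanity check on the claim that the two sequences do the same thing, here is a minimal C sketch of both orderings. The names cond_unset_bit_toggle_first and count_mismatches are made up for illustration, and the sample values are arbitrary. This only checks the logic (toggle first, then store only if the old bit was set); it says nothing about which instructions a compiler will actually pick for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original ordering: test the bit, then clear it only if it was set. */
void cond_unset_bit(uint64_t *v, uint32_t b) {
    if (*v & (UINT64_C(1) << b)) {
        *v ^= (UINT64_C(1) << b);
    }
}

/* Hypothetical rewrite mirroring the proposed btcq-only sequence:
   toggle the bit unconditionally, then store the result only if the
   bit was set before the toggle (the value btcq leaves in CF). */
void cond_unset_bit_toggle_first(uint64_t *v, uint32_t b) {
    uint64_t old = *v;
    uint64_t toggled = old ^ (UINT64_C(1) << b);
    if (old & (UINT64_C(1) << b)) {
        *v = toggled;
    }
}

/* Runs both versions over a few sample values and every bit index,
   and returns how many (value, bit) pairs produce different results. */
int count_mismatches(void) {
    static const uint64_t samples[] = {
        0, 1, UINT64_C(0x8000000000000000),
        UINT64_C(0xDEADBEEFCAFEF00D), UINT64_MAX
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        for (uint32_t b = 0; b < 64; ++b) {
            uint64_t x = samples[i];
            uint64_t y = samples[i];
            cond_unset_bit(&x, b);
            cond_unset_bit_toggle_first(&y, b);
            /* Either way, bit b must be clear afterwards. */
            assert((x & (UINT64_C(1) << b)) == 0);
            if (x != y) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Calling count_mismatches() from a small main and checking that it returns 0 is enough to exercise it.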
What is the reason for including the btq?
I am tuning for an x86-64 Intel Skylake chip.
Edit: Thanks @Peter Cordes (and for the help on all my other posts :)