I was trying to write a small wrapper function for the BTS instruction. So instead of the obvious:
bool bts( volatile uint32_t* dst, int idx ) {
    uint32_t mask = 1u << idx;     // unsigned shift, so idx == 31 is well-defined
    bool ret = !!(*dst & mask);    // previous value of the bit
    *dst |= mask;                  // set the bit
    return ret;
}
I wrote this:
bool bts( volatile uint32_t* dst, int idx ) {
    bool ret;
    asm( "xor %1, %1\n\t"
         "bts %2, %0\n\t"
         "adc %1, %1"
         : "+m"(*dst), "=r"(ret) : "Ir"(idx) : "cc","memory" );
    return ret;
}
This behaves correctly in an optimized build, but in a non-optimized build it acts as if idx were always 0. From the generated asm, it looks like idx is not taken from the rdx register but reloaded from the stack!
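To be explicit about the behavior I expect from the wrapper, here is a minimal sketch of the semantics in plain C with a small self-check. The names bts_ref and bts_ref_selfcheck are only mine for illustration; the sketch assumes 0 <= idx < 32 and makes no attempt to be atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics (illustrative name): return the previous value of
   bit idx in *dst, then set that bit. Assumes 0 <= idx < 32; not atomic. */
static bool bts_ref(volatile uint32_t *dst, int idx) {
    uint32_t mask = UINT32_C(1) << idx;
    bool ret = (*dst & mask) != 0;
    *dst |= mask;
    return ret;
}

/* Hypothetical self-check: the asm wrapper should pass the same
   assertions in both optimized and non-optimized builds. */
static void bts_ref_selfcheck(void) {
    volatile uint32_t word = 0;
    assert(!bts_ref(&word, 5));            /* bit 5 was clear */
    assert(word == (UINT32_C(1) << 5));    /* and is now set */
    assert(bts_ref(&word, 5));             /* second call sees it set */
    assert(!bts_ref(&word, 0));            /* a different index still works */
    assert(word == UINT32_C(0x21));        /* bits 0 and 5 are set */
}
```

Calling bts_ref_selfcheck() from main and swapping bts_ref for the asm version is how I would compare the two builds; with idx stuck at 0, the check on bit 5 is the one I would expect to fail.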
What am I doing wrong?