0

I am using the bsf x86-64 instruction found on page 210 of Intels developers manual found here. Essentially, if a least significant 1 bit is found, its bit index is stored in the destination operand .

Furthermore, the ZF flag is set to 1 if all the source operand is 0; otherwise, the ZF flag is cleared.

I am compiling my C code with inline x86-64 assembly instructions. I have defined a C function which invokes the bsf instruction:

uint64_t bitScanForward(T_bitboard b) {
    __asm__(
       "bsf %rcx,%rax\n"
       "leave\n"
       "ret\n"
    );
}

and also another C function which checks if the status of the ZF bit in the flag register:

uint64_t isZFSet() {
    printf("\n"); <- This is another problem I am having (see below)...
    __asm__(
        "jz true\n"
        "movq $0,%rax\n"//return false
        "jmp end\n"
        "true:\n"
        "movq $1,%rax\n"//return true
        "end:\n"
        "leave\n"
        "ret\n"
    );
}

I have tested these and found that the ZF flag is always cleared even when the bsf comand is applied to the number zero, seemingly going against the specification.

//Calling function...
//Do stuff...
bitScanForward(0ULL);//ULL is 64 bit on my machine
if(isZFSet()){//ZF flag *should* be set here but its not
   printf("ZF flag is set\n");
}
//More stuff...

I suspect the reason the ZF flag is clearing is due to entering and leaving one set of inline instructions to another.

How can I ensure that the flag in the above code is set as specified in the documentation? (I don't want to change much of my code or design)

My "other problem" is that if I dont include the printf statement in the isZFFlagSet, the function seemingly doesnt execute. Totally bizarre. Can anyone explain why?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dean P
  • 1,841
  • 23
  • 23
  • 1
    Your asm is broken to start with. You can't use `leave; ret` and expect it to work. Compiler may insert other code between your inline blocks, and even if it doesn't your very own `printf` will certainly mess up the flags. Since your `isZFSet` is not returning anything as far as the compiler can tell, it gets optimized out (is probably undefined behavior so all bets are off). – Jester Jan 29 '21 at 23:03
  • 2
    condition codes aren't preserved across inline asm statements. If you want ZF as an output, you need to tell the compiler about it (e.g. with GCC6 flag outputs to get it as a `bool`: [Using condition flags as GNU C inline asm outputs](https://stackoverflow.com/a/50012078)). Also, you can't safely `leave` or `ret` from an asm statement inside a (non-`naked`) function!! That might happen to work in debug mode, but will *completely* break when `-O2` enables `-fomit-frame-pointer`, and when these inline into their caller. – Peter Cordes Jan 29 '21 at 23:04
  • Related: [propagate carry bit between two ASM GCC inline block](https://stackoverflow.com/q/48810179) . Also, near duplicate of [GCC doesn't push registers around my inline asm function call even though I have clobbers](https://stackoverflow.com/q/57687660) except with ZF instead of ECX: the compiler uses all the registers itself, and you can't make it leave them alone between statements. Also, FLAGS are call-clobbered in the standard calling convention, so there's zero hope that printf won't run something like `xor eax,eax` and clobber ZF. – Peter Cordes Jan 29 '21 at 23:16
  • 2
    Note that gcc is perfectly capable of emitting a `bsf` instruction itself if you use `builtin_clz()`. There is no need to descend into inline assembly, especially if you have no clue what you are doing. gcc-style inline assembly is hard. Do not use it unless you know exactly what you are doing. – fuz Jan 29 '21 at 23:30
  • Correction: `__builtin_ctz` to count "trailing" zeros (starting with LSB, i.e. "forwards" like bsf). Or for 64-bit operand-size `__builtin_ctzll`. But yes, the first and most important link on https://stackoverflow.com/tags/inline-assembly/info is https://gcc.gnu.org/wiki/DontUseInlineAsm – Peter Cordes Jan 29 '21 at 23:49
  • @PeterCordes Indeed. I frequently confuse `bsf` and `bsr`, but a quick look at the manual clears it right up. Incidentally, these two aren't even the optimal instructions for the task these days, so OP's code is worse code than possible with builtin functions. – fuz Jan 29 '21 at 23:50
  • 1
    @fuz: Yup, `_tzcnt_u64(b) & 63` should do the trick for the OP if they have BMI1 available, producing 0 for a result of 64 (b==0 input). Or `b ? __builtin_ctzll(b) : 0;` portably, which probably won't compile to use the FLAGS result of BSF, whether or not that's a good thing. And yeah, remembering that BSF forwards = low to high is tricky; I sometimes remember by remembering that it's what you want for `memchr` or similar searches on the bitmap from pcmpeqb, for forward in that sense. Or just that forward = ascending bit order. – Peter Cordes Jan 29 '21 at 23:52

1 Answers1

9

You are treating an aggressively optimizing C compiler as if it were a macro assembler. That just plain isn't going to work. To get GCC to emit correct code in the presence of assembly inserts, you have to annotate the inserts with complete information about the registers and memory regions that are affected by the assembly code, and you have to use ancillary C statements to mesh them with the surrounding code. Even then, there are things the assembly insert cannot do at all. I urge you to scrap this entire mess and instead use the __builtin_ctzll intrinsic, as suggested in the comments on the question.

Now, to specifics. Your first function is incorrect because GCC does not support use of leave or ret inside an assembly insert. (More generally, assembly inserts may not alter the stack pointer, and may only jump to designated labels within the same function.) The correct way to use bsf from a GCC-style assembly insert is with "extended asm" with input and output operands:

uint64_t bitScanForward(uint64_t b) {
    uint64_t ret;
    asm ("bsf %1, %0" : "=r" (ret) : "r" (b));
    return ret;
}

You must declare a C variable to receive the output of the operation, and explicitly return that variable; having bsf write to %rax would not work (unlike how it was in old MSVC). BSF accepts any two registers as operands, so there is no need to use constraints more specific than r.

Your second function is incorrect because you didn't tell GCC that the condition codes were meaningful after bitScanForward, and because GCC does not support using the condition-code register as an input to an assembly insert. In order to read the ZF output from bsf you must do so within the same assembly insert that invoked bsf:

uint64_t countTrailingZeroes(uint64_t b) {
    uint64_t ret;
    asm ("bsf %1, %0\n\t"
         "cmove %2, %0"  
         : "=&r" (ret) 
         : "r" (b), "rm" (64));
    return ret;
}

This requires special care -- see how the constraint on operand 0 is now =&r instead of just =r? Without that, GCC is liable to think it can put operand 2 in the same register as operand 0.

Alternatively, you can specify that ZF is an output, which is supported (see the "flag output operands" section of the manual) and then supply a default value from C:

uint64_t countTrailingZeroes(uint64_t b) {
    uint64_t ret;
    int zf;
    asm ("bsf %2, %0"  
         : "=r" (ret), "=@ccz" (zf) : "r" (b));
    if (zf) ret = 64;
    return ret;
}
zwol
  • 135,547
  • 38
  • 252
  • 361
  • The only reason to consider using inline-asm for `bsf` is to take advantage of the AMD-documented behaviour (which Intel also currently implements but documents as "undefined value"): on input=0, destination register left unmodified. That's why BSF (and lzcnt/tzcnt/popcnt in some Intel CPUs) have a false dependency on the output register. So `ret=64;` / `asm("bsf %1, %0" : "+r"(ret) : "r"(b));`. (b could be `"rm"` but clang will anti-optimize that case and pick memory.) – Peter Cordes Jan 30 '21 at 02:16