What is the semantics of testb $1, %al

Question

I'm trying to understand what this testb instruction (x86-64) will do.

testb $1, %al

What is the value of $1 here? Is it all ones (0xFF) or a single 1 (0x1)?

The assembly is produced by clang for the following program:

#include <atomic>
std::atomic<bool> flag_atomic{false};

extern void f1();
extern void f2();

void foo() {
    bool b = flag_atomic.load(std::memory_order_relaxed);
    if (b == false) {
        f1();
    } else {
        f2();
    }   
}

The relevant assembly (from clang++ -S test.cpp -O3) is the following:

Lcfi2:
  .cfi_def_cfa_register %rbp
  movb  _flag_atomic(%rip), %al 
  testb $1, %al ; <<<<------------
  jne LBB0_2

`b` operand-size is *byte*, not *bit*, so `$1` is a byte with only the low bit set. Weird that clang uses a separate load instead of `test` with a memory operand; the function doesn't use the value again so no need to have it in a register. Or actually, an [immediate + RIP-relative wouldn't micro-fuse](https://stackoverflow.com/questions/26046634/micro-fusion-and-addressing-modes) on Intel. It might also fail to macro-fuse, so you could end up with 3 fused-domain uops on Intel from `test $1, _flag_atomic(%rip) / jnz`. — Peter Cordes, May 02 '18 at 13:56
Note that `movb` is already a micro-fused load + merge into the low byte of RAX, though, [on recent Intel](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to). `movzbl` would avoid the false dependency. So clang already has 3 unfused-domain uops (2 of them from the `movb`), but only 2 total fused-domain uops this way. — Peter Cordes, May 02 '18 at 13:57
Yes, for some reason, this only happens with atomic_load, for a scalar load clang generates `cmpb` without using `%al`. — A. K., May 02 '18 at 14:16
Ah, yeah `atomic` gets treated specially inside the compiler, and (like `volatile`) sometimes ends up gimping the optimizer. Current compilers don't even optimize away repeated atomic loads, although the current standard does allow that: [Can and does the compiler optimize out two atomic loads?](//stackoverflow.com/q/41820539) and especially my answer on [Why don't compilers merge redundant std::atomic writes?](//stackoverflow.com/q/45960387) — Peter Cordes, May 02 '18 at 14:22
FYI, http://godbolt.org/ is really nice for playing with compiler asm output (e.g. your code: https://godbolt.org/g/oNarvk) with different options / compilers / versions. — Peter Cordes, May 02 '18 at 14:24
Thanks for sharing the resources. Very useful! I regularly use godbolt btw. PS: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85610 — A. K., May 02 '18 at 14:32

Matteo Italia · Accepted Answer · 2018-05-03T06:51:44.867

In AT&T syntax, $ is the prefix for immediate values; $1 is a plain 1, so your instruction sets the flags according to the least significant bit of al.

is it all ones (0xFF) or a single 1 (0x1)?

All ones would be

testb $-1, %al

or (exact same machine code, just disassembly preference)

testb $0xff, %al

which incidentally would have the exact same semantics as

testb %al, %al

(as a mask of 0xff over an 8 bit register doesn't mask anything), and in this case is also valid for your code, as for a boolean there should be no need to mask anything out to check if it's true (and indeed gcc prefers this last version for your code).
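As a rough illustration of the equivalence above, the zero flag produced by test can be modelled in C++. The helper names `test_sets_zf` and `full_mask_matches_self_test` are made up for this sketch, and it models only ZF (the real instruction also updates SF and PF and clears CF and OF).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the zero flag after `testb $mask, %al`:
// ZF is set when the bitwise AND of the two operands is zero.
constexpr bool test_sets_zf(std::uint8_t al, std::uint8_t mask) {
    return (al & mask) == 0;
}

// $-1 truncated to a byte is 0xff, so both spellings encode the same immediate.
static_assert(static_cast<std::uint8_t>(-1) == 0xff, "-1 as a byte is 0xff");

// A mask of 0xff keeps every bit, so the result matches `testb %al, %al`
// for all 256 possible values of al.
constexpr bool full_mask_matches_self_test() {
    for (int v = 0; v < 256; ++v) {
        const auto al = static_cast<std::uint8_t>(v);
        if (test_sets_zf(al, 0xff) != test_sets_zf(al, al)) return false;
    }
    return true;
}
static_assert(full_mask_matches_self_test(), "0xff mask equals test al, al");

// A mask of 1 looks only at the low bit: 2 (binary 10) still sets ZF,
// while 3 (binary 11) clears it.
static_assert(test_sets_zf(2, 1), "low bit of 2 is clear");
static_assert(!test_sets_zf(3, 1), "low bit of 3 is set");
```

For a bool that only ever holds 0 or 1, the two masks agree, which is why either form works for the code in the question.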

movb  _flag_atomic(%rip), %al 
testb $1, %al
jne LBB0_2

In Intel syntax (no prefixes, no suffixes, destination-then-source operand order, explicit memory addressing syntax) this is

mov al, [rip+_flag_atomic]
test al, 1
jne LBB0_2

And, in pseudo-C:

%al = _flag_atomic;
if ((%al & 1) != 0) goto LBB0_2;

(jne is an alias of jnz, which is probably more clear in this case).
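Spelled out as compilable C++, that pseudo-code could look roughly like this. The `Target` enum and the `branch_target` function are invented for the sketch, and the `al` parameter stands in for the byte loaded from `_flag_atomic`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the two paths after the conditional jump.
enum class Target { FallThrough, LBB0_2 };

// `al` stands in for the byte loaded by `movb _flag_atomic(%rip), %al`.
constexpr Target branch_target(std::uint8_t al) {
    // testb $1, %al ; jne LBB0_2  -> jump when the low bit is set.
    // The parentheses matter in real C and C++:
    // `al & 1 != 0` would parse as `al & (1 != 0)`.
    return ((al & 1) != 0) ? Target::LBB0_2 : Target::FallThrough;
}

static_assert(branch_target(0) == Target::FallThrough, "false falls through");
static_assert(branch_target(1) == Target::LBB0_2, "true takes the jump");
```

Only bit 0 takes part in the decision, so in this model a byte such as 2 would fall through just like 0.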

score 0 · Answer 2 · answered May 02 '18 at 13:06

A 0x prefix denotes a hexadecimal number and a leading 0 denotes an octal number; if there is no prefix, the number is decimal. In your case, this is 1 in decimal.
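C++ integer literals happen to use the same prefix conventions, so a compile-time check can illustrate the rule. This is only a sketch: the assembler parses its own literals, and the constant names below are made up for the example.

```cpp
#include <cassert>

// The value 1 written in decimal, hexadecimal, and octal notation.
constexpr int one_dec = 1;
constexpr int one_hex = 0x1;
constexpr int one_oct = 01;
static_assert(one_dec == one_hex && one_hex == one_oct, "all three spell 1");

// The prefix only changes the value once more digits are involved.
constexpr int ten_as_hex = 0x10;   // sixteen
constexpr int ten_as_octal = 010;  // eight
static_assert(ten_as_hex == 16, "0x10 is hexadecimal");
static_assert(ten_as_octal == 8, "010 is octal");
static_assert(0xff == 255, "0xff is all ones in a byte");
```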

Sandeep