6

Could you please explain how the following assembler code works?

xor ebx, ebx;
mov bl, byte ptr[ecx];
cmp ebx, 0;

I don't get it why you move byte to bl and afterwards you compare ebx and not bl.

Kerrek SB
  • If `EBX` was initially zero, then a full register comparison may be faster than only comparing the lower 8 bits. This depends on the details of the hardware. Comparing only `BL` might require additional masking operations. – Kerrek SB Dec 20 '15 at 12:28
  • Or it's a consequence of the compiler doing a `ch == 0` -> implicitly `(int) ch == 0`. – Mats Petersson Dec 20 '15 at 12:30
  • @MatsPetersson: Maybe, if it's an unoptimized compilation? I also wonder why it's `cmp ebx, 0` and not the shorter `test ebx, ebx`. Maybe some other flags are used later? (Doesn't `cmp` affect flags differently from `test`?) – Kerrek SB Dec 20 '15 at 12:33
  • Yes, I use the flag later: `jz exit_loop;`. I still don't get why there is `cmp ebx, 0;` and not `cmp bl, 0;` – Imantas Balandis Dec 20 '15 at 12:35
  • It does look like this is some kind of inline assembler sequence. Why it is that particular way could be any reason from "not very good at writing assembler" to "experiments on a wide range of systems with different processor architectures have found this to be the fastest". It's near impossible to say for sure. – Mats Petersson Dec 20 '15 at 12:42
  • @ImantasBalandis: No, the zero flag is set by both `test` and `cmp`. I forget the details, but there's some non-trivial difference between the two (maybe regarding overflow?). If the difference doesn't matter, then `test ebx, ebx` results in a shorter instruction, because it doesn't require an immediate value. – Kerrek SB Dec 20 '15 at 12:44
  • This is weird code. First of all why doesn't it use `movzx` to zero-extend, rather than this thing? From PPro through P3 at least this pattern doesn't cause a partial register stall, I don't know about the others but `movzx` is safer. And why do it in the first place? Ok maybe it uses that value later on in a context where a 32bit value is required, but using a `cmp bl, 0` here (or `test bl, bl`) isn't any slower. – harold Dec 20 '15 at 18:01
  • @harold: Yeah, a `movzx ebx, byte ptr [ecx]` / `test ebx, ebx` would be better in every way. `mov bl, [mem]` / `movzx` / `test` would be worse, though ([false dep on AMD/P4/Silvermont](http://stackoverflow.com/q/33666617/224132)). According to Agner Fog, the xor-first avoids partial reg stalls / extra uops on all P6/SnB family CPUs, and Haswell never has partial-reg stalls. All this assumes you actually want the value in a register for some other purpose; if not then `cmp byte ptr [ecx], 0` is best. – Peter Cordes Dec 20 '15 at 22:50
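The question raised in the comments above, whether `cmp ebx, 0` and `test ebx, ebx` affect the flags differently, can be explored with a small model. This is only a sketch: the `struct flags` type and the function names are hypothetical, and it covers just ZF, SF, CF and OF (PF and AF are left out; `test` leaves AF undefined).

```c
#include <stdint.h>

/* Hypothetical flag record: only the four flags modeled in this sketch. */
struct flags { int zf, sf, cf, of; };

/* cmp x, 0 : flags come from the subtraction x - 0. */
static struct flags flags_cmp_zero(uint32_t x) {
    struct flags f;
    f.zf = (x == 0);        /* result is zero only when x is zero */
    f.sf = (int)(x >> 31);  /* sign flag copies bit 31 of the result */
    f.cf = 0;               /* subtracting 0 never borrows */
    f.of = 0;               /* subtracting 0 never overflows */
    return f;
}

/* test x, x : flags come from x AND x; CF and OF are always cleared. */
static struct flags flags_test_self(uint32_t x) {
    uint32_t r = x & x;     /* x AND x is just x */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (int)(r >> 31);
    f.cf = 0;
    f.of = 0;
    return f;
}

/* Returns 1 when both instructions would leave the modeled flags identical. */
static int same_flags(uint32_t x) {
    struct flags a = flags_cmp_zero(x);
    struct flags b = flags_test_self(x);
    return a.zf == b.zf && a.sf == b.sf && a.cf == b.cf && a.of == b.of;
}
```

Under this model a following `jz` behaves the same after either instruction, so the shorter `test ebx, ebx` would be a drop-in replacement here.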

4 Answers

7

bl is the name of the low 8 bits (bits 7-0) in the ebx register. There is also bh which is the bits 15-8 of ebx, and bx is the low 16 bits (bits 15-0). There is no name for the higher 16 bits.

This applies to all of the registers eax, ebx, ecx and edx.
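As a rough illustration of the register layout described above, the sub-registers can be modeled in C with masks and shifts on a 32-bit value. The helper names (`get_bl`, `get_bh`, `get_bx`, `set_bl`) are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Hypothetical helpers: read the named sub-registers out of a 32-bit EBX value. */
static uint8_t get_bl(uint32_t ebx) { return (uint8_t)(ebx & 0xFFu); }         /* bits 7-0  */
static uint8_t get_bh(uint32_t ebx) { return (uint8_t)((ebx >> 8) & 0xFFu); }  /* bits 15-8 */
static uint16_t get_bx(uint32_t ebx) { return (uint16_t)(ebx & 0xFFFFu); }     /* bits 15-0 */

/* Writing BL replaces only bits 7-0; bits 31-8 keep their previous value. */
static uint32_t set_bl(uint32_t ebx, uint8_t value) {
    return (ebx & 0xFFFFFF00u) | value;
}
```

For example, with EBX holding 0x12345678, BL is 0x78, BH is 0x56 and BX is 0x5678; writing 0xAB to BL gives 0x123456AB.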

Given that ebx is zeroed first, the resulting code is probably the consequence of the compiler compiling something like:

char ch;
const char *str;
int i;
...
ch = str[i];
if (ch == 0) ... 

[Or possibly just if (ch)].

The extension to 32 bits would be caused by either "saves space" or "runs faster", or by the fact that if (ch == 0) has an int on the right-hand side, so the value has to be compared as an int rather than as a char (a byte). I can't say which without seeing the original source code. Even then, the actual code generation in the compiler is quite a complex set of decisions, based on both "what runs fast on which processor" and "correctness according to the language".
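The promotion argument can be checked directly in C. The sketch below is not the asker's original source (which is not shown); the function names are hypothetical. It relies on the fact that unary plus applies the same integer promotions that `ch == 0` applies to its `char` operand.

```c
#include <stddef.h>

/* Size of the operand that actually takes part in `ch == 0`.
   Unary plus applies the same integer promotions as the comparison does. */
static size_t promoted_operand_size(void) {
    char ch = 'A';
    return sizeof(+ch);   /* the type of +ch is int, not char */
}

/* The comparison from the snippet above, with the promotion written out. */
static int is_terminator(const char *str, int i) {
    char ch = str[i];
    return (int)ch == 0;  /* same meaning as ch == 0 */
}
```

Note that the language only requires the comparison to behave as if it were done on `int`; a compiler is still free to emit an 8-bit compare when the result is the same.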

Mats Petersson
3

This instruction performs an exclusive-or between all 32 bits of EBX and all 32 bits of EBX, leaving the result in EBX. You can easily prove that this is the same as moving the value 0 into EBX, but it's cheaper than doing that because no 32-bit immediate operand has to be encoded and fetched: the instruction is 2 bytes instead of the 5 needed for mov ebx, 0.

xor ebx, ebx;

This instruction moves the BYTE (8 bits) at the address pointed to by ECX into the LOW 8 bits of EBX, leaving the other 24 bits unchanged (they're zero - remember?)

mov bl, byte ptr[ecx];

This instruction compares the whole 32-bit value in EBX with 0 - in this case it's logically the same as just comparing the byte in BL with 0 since we know the upper 24 bits will be 0

cmp ebx, 0;
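The three steps above can be modeled in C to see why the full-width compare is safe. This is only a sketch of the semantics, not of how any CPU executes them; the function names are hypothetical, and `ebx_before` stands for whatever EBX happened to hold earlier.

```c
#include <stdint.h>

/* Hypothetical model of the three instructions; returns the zero flag (ZF). */
static int zf_after_sequence(const uint8_t *ecx, uint32_t ebx_before) {
    uint32_t ebx = ebx_before;
    ebx ^= ebx;                        /* xor ebx, ebx : EBX = 0 whatever it held */
    ebx = (ebx & 0xFFFFFF00u) | *ecx;  /* mov bl, byte ptr [ecx] : only bits 7-0 change */
    return ebx == 0;                   /* cmp ebx, 0 : ZF = 1 when EBX is zero */
}

/* Same sequence without the xor, to show why the zeroing matters. */
static int zf_without_xor(const uint8_t *ecx, uint32_t ebx_before) {
    uint32_t ebx = (ebx_before & 0xFFFFFF00u) | *ecx;  /* stale upper 24 bits survive */
    return ebx == 0;
}
```

With the xor in place, ZF depends only on the byte that was loaded. Without it, leftover upper bits in EBX could make `cmp ebx, 0` report "not zero" even when the loaded byte is 0.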

(anticipated) why do it this way?

Because this is a 32-bit processor. It's geared to operate on 32-bit values much more efficiently than 8-bit ones. The compiler knows this and will always seek to promote smaller values to larger ones as soon as it is allowed.

Richard Hodges
  • `cmp bl, 0` is just as fast as `cmp ebx, 0` (on all µarchs I know about), and this pattern holds for pretty much all 8-bit operations. Slowness comes from partial register stalls/recombination, not from using 8-bit operations in the first place. – harold Dec 20 '15 at 17:54
0

BL is the low byte of the EBX register. Since EBX is zeroed by the xor beforehand, its upper 24 bits are 0, so comparing EBX against 0 gives the same result as comparing BL against 0.

Marco A.
0

From a pure end effect perspective, there is no difference between

CMP ebx,0

and

CMP bl,0

in the given situation, as an XOR EBX,EBX already precedes this group of instructions. The first instruction sign-extends its imm8 operand, so the effective comparison (performed as a subtraction into a temporary result) is always 32-bit in this case. The encoded lengths are also identical, 3 bytes each: 80 /7 ib for CMP bl,0 and 83 /7 ib for CMP ebx,0.
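To make the length comparison concrete, the sketch below lists the machine-code bytes an assembler would typically emit for the register forms discussed here. The array names are hypothetical; the bytes are the standard 32-bit-mode encodings, though an assembler may pick an equivalent alternative encoding.

```c
#include <stddef.h>

/* Typical 32-bit-mode encodings; the ModRM byte 0xFB selects EBX (or BL)
   with the /7 opcode extension, and 0xDB selects EBX,EBX (or BL,BL). */
static const unsigned char cmp_bl_0[]     = { 0x80, 0xFB, 0x00 };  /* cmp bl, 0     */
static const unsigned char cmp_ebx_0[]    = { 0x83, 0xFB, 0x00 };  /* cmp ebx, 0    */
static const unsigned char test_bl_bl[]   = { 0x84, 0xDB };        /* test bl, bl   */
static const unsigned char test_ebx_ebx[] = { 0x85, 0xDB };        /* test ebx, ebx */

/* Length in bytes of each encoding, for comparison. */
static size_t len_cmp_bl(void)   { return sizeof cmp_bl_0; }
static size_t len_cmp_ebx(void)  { return sizeof cmp_ebx_0; }
static size_t len_test_bl(void)  { return sizeof test_bl_bl; }
static size_t len_test_ebx(void) { return sizeof test_ebx_ebx; }
```

Both `cmp` forms take 3 bytes, so neither wins on size; the `test` forms save one byte because they carry no immediate.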

However, from a data bus flow perspective, the data bus is optimized to read a 32 bit data flow faster as it does not have to execute an AND with internal mask to isolate the 8 bits data content of the register, and therefore, it's a recommended practice for assembler writers to use the best possible bus width - which is usually a multiple of 32. Therefore, you would see cmp ebx,0 as the preferred code translation over cmp bl,0.

Have fun programming...

quasar66
  • It's not faster; see harold's comment on Richard Hodges' similar answer, or see http://agner.org/optimize/. If this was optimized code, it would have used `test ebx, ebx`, or `test bl, bl` and left out the `xor`. Or `cmp byte ptr[ecx], 0`, if the value wasn't needed again later. – Peter Cordes Dec 20 '15 at 22:39