What does mov bl do in assembly

Question

Could you please explain how the following assembler code works?

xor ebx, ebx;
mov bl, byte ptr[ecx];
cmp ebx, 0;

I don't understand why the byte is moved into bl, but the comparison afterwards is done on ebx rather than bl.

If `EBX` was initially zero, then a full register comparison may be faster than only comparing the lower 8 bits. This depends on the details of the hardware. Comparing only `BL` might require additional masking operations. — Kerrek SB, Dec 20 '15 at 12:28
Or it's a consequence of the compiler doing a `ch == 0` -> implicitly `(int) ch == 0`. — Mats Petersson, Dec 20 '15 at 12:30
@MatsPetersson: Maybe, if it's an unoptimized compilation? I also wonder why it's `cmp ebx, 0` and not the shorter `test ebx, ebx`. Maybe some other flags are used later? (Doesn't `cmp` affect flags differently from `test`?) — Kerrek SB, Dec 20 '15 at 12:33
Yes, I use the flag later: `jz exit_loop;`. I still don't get why there is `cmp ebx, 0;` and not `cmp bl, 0;`. — Imantas Balandis, Dec 20 '15 at 12:35
It does look like this is some kind of inline assembler sequence. Why it is that particular way could be any reason from "not very good at writing assembler" to "experiments on a wide range of systems with different processor architectures have found this to be the fastest". It's near impossible to say for sure. — Mats Petersson, Dec 20 '15 at 12:42
@ImantasBalandis: No, the zero flag is set by both `test` and `cmp`. I forget the details, but there's some non-trivial difference between the two (maybe regarding overflow?). If the difference doesn't matter, then `test ebx, ebx` results in a shorter instruction, because it doesn't require an immediate value. — Kerrek SB, Dec 20 '15 at 12:44
This is weird code. First of all why doesn't it use `movzx` to zero-extend, rather than this thing? From PPro through P3 at least this pattern doesn't cause a partial register stall, I don't know about the others but `movzx` is safer. And why do it in the first place? Ok maybe it uses that value later on in a context where a 32bit value is required, but using a `cmp bl, 0` here (or `test bl, bl`) isn't any slower. — harold, Dec 20 '15 at 18:01
@harold: Yeah, a `movzx ebx, byte ptr [ecx]` / `test ebx, ebx` would be better in every way. `mov bl, [mem]` / `movzx` / `test` would be worse, though ([false dep on AMD/P4/Silvermont](http://stackoverflow.com/q/33666617/224132)). According to Agner Fog, the xor-first avoids partial reg stalls / extra uops on all P6/SnB family CPUs, and Haswell never has partial-reg stalls. All this assumes you actually want the value in a register for some other purpose; if not then `cmp byte ptr [ecx], 0` is best. — Peter Cordes, Dec 20 '15 at 22:50

Mats Petersson · Accepted Answer · 2015-12-20T12:35:40.937

7

bl is the name of the low 8 bits (bits 7-0) of the ebx register. There is also bh, which is bits 15-8 of ebx, and bx, which is the low 16 bits (bits 15-0). There is no name for the upper 16 bits.

This applies to all of the registers eax, ebx, ecx and edx.
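
As a rough illustration of that aliasing, the sub-registers can be modelled in C by masking and shifting a 32-bit value. The helper names below are made up for this sketch and are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helpers modelling how bl, bh and bx alias parts of ebx. */

/* bl: bits 7-0 */
uint8_t low_byte(uint32_t ebx)  { return (uint8_t)(ebx & 0xFFu); }

/* bh: bits 15-8 */
uint8_t high_byte(uint32_t ebx) { return (uint8_t)((ebx >> 8) & 0xFFu); }

/* bx: bits 15-0 */
uint16_t low_word(uint32_t ebx) { return (uint16_t)(ebx & 0xFFFFu); }

/* Writing bl replaces only bits 7-0 and leaves bits 31-8 untouched. */
uint32_t write_low_byte(uint32_t ebx, uint8_t value)
{
    return (ebx & 0xFFFFFF00u) | value;
}
```

For example, with ebx holding 0x12345678, bl is 0x78, bh is 0x56 and bx is 0x5678.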

Given that ebx is zeroed first, the resulting code is probably the consequence of the compiler compiling something like:

char ch;
const char *str;
int i;
...
ch = str[i];
if (ch == 0) ...

[Or possibly just if (ch)].

The extension to 32 bits could be caused by "saves space" or "runs faster", or by the fact that if (ch == 0) has an int on the right-hand side, so the value must be compared as an int rather than as a char (a byte). I can't say which without seeing the original source code. Even then, code generation in the compiler is quite a complex set of decisions, based on both "what runs fast on which processor" and "correctness according to the language".
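
To illustrate the promotion point: in C, both operands of `==` undergo the integer promotions, so `ch == 0` is evaluated as an `int` comparison. The following is a minimal sketch, assuming a C11 compiler (for `_Generic`); the function and macro names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: returns 1 if str[i] is the NUL terminator, else 0. */
int is_terminator(const char *str, size_t i)
{
    char ch = str[i];
    /* ch is promoted to int before being compared with the int constant 0. */
    return ch == 0;
}

/* Evaluates to 1 when arithmetic on a char operand yields an int. */
#define CHAR_PROMOTES_TO_INT (_Generic((char)0 + 0, int: 1, default: 0))
```

Whether a particular compiler then widens the byte in a register, as in the question's code, or compares it directly in memory is a code-generation choice that this sketch does not determine.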

edited Dec 20 '15 at 12:35

answered Dec 20 '15 at 12:27


Ohh, so that means after I move into bl, it only compares the low 8 bits of ebx? – Imantas Balandis Dec 20 '15 at 12:31

@ImantasBalandis: No, you're comparing all of `EBX`, but you know what the other 24 bits are since you set them earlier with the `xor`. – Kerrek SB Dec 20 '15 at 12:32

@Imantas Yes, if you used `cmp bl, 0` you would get the same effect but only compared the lowest 8 bits – Sami Kuhmonen Dec 20 '15 at 12:36
Thanks that really helped me! – Imantas Balandis Dec 20 '15 at 12:40

score 3 · Answer 2 · answered Dec 20 '15 at 12:40

This instruction performs an exclusive-or between all 32 bits of EBX and all 32 bits of EBX, leaving the result in EBX. You can easily prove that this is the same as moving the value 0 into EBX (but it has a shorter encoding than mov ebx, 0, because no 32-bit immediate has to be stored in the instruction)

xor ebx, ebx;

This instruction moves the BYTE (8 bits) at the address pointed to by ECX into the LOW 8 bits of EBX, leaving the other 24 bits unchanged (they're zero - remember?)

mov bl, byte ptr[ecx];

This instruction compares the whole 32-bit value in EBX with 0 - in this case it's logically the same as just comparing the byte in BL with 0 since we know the upper 24 bits will be 0

cmp ebx, 0;
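
The three steps above can be sketched in C as a simple model of the register contents. This is only an emulation under the assumption that EBX may hold arbitrary stale data beforehand; the function names are invented for the example.

```c
#include <stdint.h>

/* Models: xor ebx, ebx / mov bl, byte ptr [ecx] / cmp ebx, 0.
   Returns the zero flag: 1 if the loaded byte was 0, else 0. */
int zero_flag_cmp_ebx(const uint8_t *ecx)
{
    uint32_t ebx = 0x12345678u;        /* arbitrary stale contents */
    ebx ^= ebx;                        /* xor ebx, ebx: all 32 bits become 0 */
    ebx = (ebx & 0xFFFFFF00u) | *ecx;  /* mov bl, [ecx]: only bits 7-0 change */
    return ebx == 0;                   /* cmp ebx, 0: ZF set if all 32 bits are 0 */
}

/* Same load without the xor, comparing only bl (cmp bl, 0). */
int zero_flag_cmp_bl(const uint8_t *ecx)
{
    uint32_t ebx = 0x12345678u;        /* stale upper bits stay in place */
    ebx = (ebx & 0xFFFFFF00u) | *ecx;  /* mov bl, [ecx] */
    return (uint8_t)ebx == 0;          /* cmp bl, 0: looks at bits 7-0 only */
}
```

Both functions return the same result for every input byte, which is the sense in which the two comparisons are logically equivalent here.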

(anticipated) why do it this way?

Because this is a 32-bit processor. It's geared to operate on 32-bit values much more efficiently than 8-bit ones. The compiler knows this and will always seek to promote smaller values to larger ones as soon as it is allowed.

`cmp bl, 0` is just as fast as `cmp ebx, 0` (on all µarchs I know about), this pattern holds for pretty much all 8bit operations. Slowness comes from partial register stalls/recombination, not from using 8bit operations in the first place. — harold, Dec 20 '15 at 17:54

Marco A. · Answer 3 · 2015-12-20T12:33:53.647

0

BL is the low byte of the EBX register. Since EBX is zeroed by the xor beforehand, comparing EBX against 0 is the same as comparing its low byte, BL, against 0.

edited Dec 20 '15 at 12:33

answered Dec 20 '15 at 12:28


score 0 · Answer 4 · answered Dec 20 '15 at 17:32

From a pure end effect perspective, there is no difference between

CMP ebx,0

and

CMP bl,0

in the given situation, as an XOR EBX,EBX already precedes this group of instructions. In the first form the imm8 value is sign-extended, so the effective comparison (performed as a subtraction in temporary registers) is always 32-bit in this case. The opcode lengths are also identical: 80 /7 ib and 83 /7 ib.
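
The encodings mentioned here can be checked with a small C sketch that builds the instruction bytes by hand. It assumes 32-bit mode, register-direct operands and the short imm8 forms an assembler would typically pick; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { REG_EBX = 3 };  /* register number 3 selects ebx (bl for 8-bit operands) */

/* ModRM byte for a register-direct operand: mod = 11, then reg field, then r/m. */
uint8_t modrm_reg(uint8_t reg_or_ext, uint8_t rm)
{
    return (uint8_t)(0xC0u | ((reg_or_ext & 7u) << 3) | (rm & 7u));
}

/* cmp bl, 0 -> 80 /7 ib */
size_t encode_cmp_bl_0(uint8_t out[3])
{
    out[0] = 0x80;
    out[1] = modrm_reg(7, REG_EBX);
    out[2] = 0x00;
    return 3;
}

/* cmp ebx, 0 -> 83 /7 ib (the imm8 is sign-extended to 32 bits) */
size_t encode_cmp_ebx_0(uint8_t out[3])
{
    out[0] = 0x83;
    out[1] = modrm_reg(7, REG_EBX);
    out[2] = 0x00;
    return 3;
}

/* test ebx, ebx -> 85 /r (no immediate byte needed) */
size_t encode_test_ebx_ebx(uint8_t out[2])
{
    out[0] = 0x85;
    out[1] = modrm_reg(REG_EBX, REG_EBX);
    return 2;
}
```

Under these assumptions both cmp forms take three bytes (80 FB 00 and 83 FB 00), while test ebx, ebx takes two (85 DB).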

However, from a data bus flow perspective, the data bus is optimized to read a 32 bit data flow faster as it does not have to execute an AND with internal mask to isolate the 8 bits data content of the register, and therefore, it's a recommended practice for assembler writers to use the best possible bus width - which is usually a multiple of 32. Therefore, you would see cmp ebx,0 as the preferred code translation over cmp bl,0.

Have fun programming...

It's not faster, see harold's comment on Richard Hodges' similar answer. Or see http://agner.org/optimize/. If this was optimized code, it would have used `test ebx, ebx`, or `test bl, bl` and left out the `xor`. Or `cmp byte ptr[ecx], 0`, if the value wasn't needed again later. — Peter Cordes, Dec 20 '15 at 22:39
