49

What exactly does this instruction do?

movzbl  0x01(%eax,%ecx), %eax
Peter Cordes
  • same for movsbl: http://stackoverflow.com/questions/7861095/what-does-movsbl-instruction-do – Ciro Santilli OurBigBook.com Apr 24 '15 at 21:24
  • @Waqar: I don't think your edit was an improvement, and seems too small to be worth bumping the question for. If you're going to add a space, I'd only add one between the two operands, not also inside the addressing mode. It's totally normal to write AT&T addressing modes without spaces since there are already commas and the allowed things are so rigid. So I'd have written `movzbl 1(%eax,%ecx), %eax` (that's the formatting GCC's asm output uses: https://godbolt.org/z/E4r9dP). This might have been literal compiler output or disassembly someone copy/pasted, with that spacing. – Peter Cordes Jul 22 '20 at 13:33
  • Alright, I will keep this in mind for future edits. – Waqar Jul 22 '20 at 14:23

2 Answers

54

AT&T syntax splits the movzx Intel instruction mnemonic into different mnemonics for different source sizes (movzb vs. movzw). In Intel syntax, it's:

movzx eax, byte ptr [eax+ecx+1]

i.e. load one byte from memory at address eax+ecx+1 and zero-extend it to fill the full 32-bit register eax.
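
For readers more comfortable with C, here is a minimal sketch of the same operation. The function name `load_byte_zero_extended` and its parameters are hypothetical, chosen only for illustration: `base` stands in for the address held in eax and `index` for the offset held in ecx. Whether a compiler emits exactly this movzbl for such code depends on the target and optimization level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling: movzbl 1(%eax,%ecx), %eax
 *   base  ~ the address held in %eax
 *   index ~ the offset held in %ecx */
uint32_t load_byte_zero_extended(const uint8_t *base, size_t index)
{
    assert(base != NULL);            /* illustrative precondition only */
    uint8_t byte = base[index + 1];  /* load one byte from base + index + 1 */
    return (uint32_t)byte;           /* unsigned widening: upper 24 bits are 0 */
}
```

For example, with `mem = {0x11, 0x22, 0xFF}`, calling `load_byte_zero_extended(mem, 1)` reads the byte 0xFF and returns 0x000000FF, because the byte is treated as unsigned when it is widened.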

BTW, most GNU tools now have a switch or a config option to prefer Intel syntax (such as objdump -Mintel or gcc -S -masm=intel, although the latter also changes the syntax expected in inline asm). I would certainly recommend looking into it if you don't do AT&T assembly for a living. See also the x86 tag wiki for more docs and guides.

Peter Cordes
Igor Skochinsky
  • Understood: this will load the byte from the address `%eax + %ecx + 1` and expand it to a long with leading zeros. Thanks a lot! –  Feb 16 '12 at 19:51
  • Assuming I'm looking at a `movzbl` in 64-bit x86_64 code, would "zero-extend to full register" refer to %eax, or the full %rax? – Marius Gedminas Aug 27 '18 at 15:58
  • @MariusGedminas you can probably use rax explicitly with movzbq, but any write to a 32-bit register zeroes the top 32 bits of the full 64-bit register, so in effect movzbl to eax extends to rax. – Igor Skochinsky Aug 27 '18 at 23:12
  • @IgorSkochinsky Thank you! (I was looking at a disassembly output of a function that somehow managed a 32-bit signed overflow while mixing 64-bit unsigned and double arithmetic. movzbl was not the culprit.) – Marius Gedminas Aug 28 '18 at 09:24
29

Minimal example

mov $0x01234567, %eax
mov $1, %bl
movzbl %bl, %eax
/* %eax == 0000 0001 */

mov $0x01234567, %eax
mov $-1, %bl
movzbl %bl, %eax
/* %eax == 0000 00FF */

Runnable GitHub upstream with assertions.
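
The same two cases can be mirrored in C, which also makes the contrast with sign extension (movsbl) visible. This is only a sketch: the names `zero_extend_byte`, `sign_extend_byte` and `check_minimal_example` are made up for illustration and are not taken from the linked upstream.

```c
#include <assert.h>
#include <stdint.h>

/* Counterpart of movzbl %bl, %eax: the previous contents of eax do not
 * matter, all 32 result bits are determined by the source byte. */
uint32_t zero_extend_byte(uint8_t bl)
{
    return (uint32_t)bl;             /* upper 24 bits become 0 */
}

/* Counterpart of movsbl %bl, %eax, shown only for contrast. */
uint32_t sign_extend_byte(int8_t bl)
{
    /* int8_t -> int32_t copies the sign bit into the upper 24 bits */
    return (uint32_t)(int32_t)bl;
}

/* Mirrors the two assembly cases above, plus the sign-extending variant. */
void check_minimal_example(void)
{
    assert(zero_extend_byte(1) == 0x00000001u);
    assert(zero_extend_byte((uint8_t)-1) == 0x000000FFu);
    assert(sign_extend_byte(-1) == 0xFFFFFFFFu);
}
```

Calling `check_minimal_example()` from a test or from `main` exercises all three assertions.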

The mnemonic is:

  • MOV
  • Zero extend
  • Byte (8-bit)
  • to Long (32-bit)

There are also versions for other sizes:

  • movzbw: Byte (8-bit) to Word (16-bit)
  • movzwl: Word (16-bit) to Long (32-bit)
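
In C terms, each of these variants corresponds to an unsigned widening conversion between fixed-width types. The sketch below uses hypothetical function names that simply echo the AT&T mnemonics; which instruction a compiler actually emits for such a conversion depends on the target and optimization level.

```c
#include <assert.h>
#include <stdint.h>

/* movzbw: byte (8-bit) -> word (16-bit) */
uint16_t movzbw_model(uint8_t src)  { return (uint16_t)src; }

/* movzwl: word (16-bit) -> long (32-bit) */
uint32_t movzwl_model(uint16_t src) { return (uint32_t)src; }

/* movzbl: byte (8-bit) -> long (32-bit) */
uint32_t movzbl_model(uint8_t src)  { return (uint32_t)src; }

/* Small self-check: the added high bits of the result are always zero. */
void check_widths(void)
{
    assert(movzbw_model(0xFF) == 0x00FF);
    assert(movzwl_model(0xFFFF) == 0x0000FFFFu);
    assert(movzbl_model(0x80) == 0x00000080u);
}
```

The source type fixes how many bits are copied and the return type fixes how many zero bits are added on top, which matches the two size letters in the mnemonic.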

Like most GAS instructions, you can omit the last size character when dealing with registers:

movzb %bl, %eax

but I cannot understand why we can't omit the second-to-last letter, e.g. the following fails:

movz %bl, %eax

Why not just deduce it from the size of the operands when they are registers, as for mov and Intel syntax?

And if you use registers of the wrong size, it fails to assemble, e.g.:

movzb %ax, %eax
Ciro Santilli OurBigBook.com
  • If you [look at the opcodes](http://www.felixcloutier.com/x86/MOVZX.html), the "operand size" controls the size of the destination but there are different opcodes for different sizes of source. So it makes sense for gas to treat `movzb` as a separate instruction from `movzw`. (For sign-extension, AMD even added a new Intel-syntax mnemonic for `movsxd r64, r/m32` instead of further overloading `movsx`. IDK why.) – Peter Cordes Jul 23 '16 at 03:04