Why does disassembled code have different instruction sizes? How does the CPU know how many bytes to load?

Question

I've assembled the following code (GCC 9.1, x86-64):

0:  mov eax,0x3
5:  mov ebx,0x4
a:  add eax,ebx
c:  sub eax,ebx

And I've got the following binary code (hex view): B803000000BB0400000001D829D8

The disassembly shows the following correspondence:

0:  b8 03 00 00 00          mov    eax,0x3
5:  bb 04 00 00 00          mov    ebx,0x4
a:  01 d8                   add    eax,ebx
c:  29 d8                   sub    eax,ebx

My questions are:

How does the CPU know how many bytes it should load into the register: 5 bytes (as at offsets 0 and 5) or 2 bytes (as at offsets a and c)?

Judging by the hex representation, the CPU stores the numbers 3 and 4 in big-endian order. Am I correct?

1) Those aren't the bytes loaded into the register; that's the size of the instruction itself. 2) No, that's little endian. — Jester, Mar 20 '20 at 13:19
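A minimal Python sketch of the endianness point: the four bytes that follow the `b8` opcode are the 32-bit immediate, stored least-significant byte first. The variable name `imm_bytes` is just for illustration.

```python
import struct

# The immediate operand of "mov eax, 0x3" as it appears in the byte stream
imm_bytes = bytes.fromhex("03000000")

# Read little-endian (least significant byte first), the value is 3
print(int.from_bytes(imm_bytes, "little"))  # 3

# Read big-endian, the same bytes would mean 0x03000000 instead
print(hex(int.from_bytes(imm_bytes, "big")))  # 0x3000000

# struct's "<I" format packs an unsigned 32-bit integer little-endian,
# reproducing the immediate of "mov ebx, 0x4"
print(struct.pack("<I", 4).hex())  # 04000000
```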
@Jester Could you please give more details about your first point? I don't understand what you mean by "size of the instruction". — No Name QA, Mar 20 '20 at 13:37
The computer knows from the opcode part of the instruction how many bytes comprise the whole instruction (and so does the disassembler). — Weather Vane, Mar 20 '20 at 13:48
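To illustrate the comment above, here is a minimal sketch (Python 3.8+) that walks the byte stream from the question and derives each instruction's length from its first byte and, for `01`/`29`, its ModRM byte. `instruction_length` is a hypothetical helper that only knows the four opcodes in this program; a real x86 decoder must also handle prefixes, REX, SIB bytes, displacements and hundreds of other opcodes.

```python
# Hypothetical, deliberately tiny length decoder covering only this program.
MODRM_OPCODES = {0x01: "add", 0x29: "sub"}  # "op r/m32, r32" forms


def instruction_length(code: bytes, offset: int) -> int:
    op = code[offset]
    if 0xB8 <= op <= 0xBF:
        # mov r32, imm32 (no REX.W assumed): 1 opcode byte + 4-byte immediate
        return 5
    if op in MODRM_OPCODES:
        mod = code[offset + 1] >> 6
        if mod != 0b11:
            raise NotImplementedError("memory operands not handled in this sketch")
        # opcode byte + ModRM byte, no displacement and no immediate
        return 2
    raise NotImplementedError(f"opcode {op:#04x} not handled in this sketch")


code = bytes.fromhex("B803000000BB0400000001D829D8")

# Linear sweep: each instruction's length tells us where the next one starts
decoded = []
offset = 0
while offset < len(code):
    length = instruction_length(code, offset)
    decoded.append((offset, code[offset:offset + length]))
    offset += length

for off, insn in decoded:
    print(f"{off:x}: {insn.hex(' ')}")
```

This prints the same offsets and byte groupings as the disassembly in the question (0, 5, a, c). The CPU's decoder does the same thing in hardware: nothing in the byte stream marks instruction boundaries, so the length is always inferred from the opcode (and any ModRM/SIB bytes).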
Please read the documentation for the instruction set; it is quite clear. Look up the mov, add and sub instructions and find their instruction encodings. For example, the opcode 0x01 means a 16-bit or 32-bit add, and the next byte indicates the registers. If you were to assemble, say, add eax,ecx, you would see the 0x01 but something other than 0xD8. — old_timer, Mar 20 '20 at 13:49
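A small sketch of how the "next byte" (the ModRM byte) selects the registers for register-to-register forms. The register order follows the standard x86 numbering (eax=0, ecx=1, edx=2, ebx=3, ...); the helper names `decode_modrm` and `encode_modrm_regs` are hypothetical.

```python
# Standard x86 32-bit register numbering used in ModRM fields
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(modrm: int):
    mod = modrm >> 6             # top 2 bits: 0b11 means both operands are registers
    reg = (modrm >> 3) & 0b111   # middle 3 bits: the r32 operand (source for 01/29)
    rm = modrm & 0b111           # low 3 bits: the r/m32 operand (destination for 01/29)
    return mod, REGS32[reg], REGS32[rm]


def encode_modrm_regs(dst: str, src: str) -> int:
    # Register-direct form: mod=11, reg=src, r/m=dst
    return (0b11 << 6) | (REGS32.index(src) << 3) | REGS32.index(dst)


print(decode_modrm(0xD8))                     # (3, 'ebx', 'eax') -> add eax, ebx
print(hex(encode_modrm_regs("eax", "ecx")))   # 0xc8 -> 01 c8 = add eax, ecx
```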
@WeatherVane Got it, but if we have `b8` as the first byte, does it mean that `b` means `MOV` and `8` means `EAX`? — No Name QA, Mar 20 '20 at 13:50
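Not quite by hex digit: for `mov r32, imm32` the opcode is 0xB8 plus a 3-bit register number, so the split is at bit 3, not at the nibble boundary (0xB8 = 10111 000, 0xBB = 10111 011). A minimal sketch, with the helper name `decode_mov_imm32_opcode` being hypothetical:

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32_opcode(op: int) -> str:
    # Top 5 bits 10111 select "mov r32, imm32"; low 3 bits select the register
    if (op & 0b1111_1000) != 0xB8:
        raise ValueError(f"{op:#04x} is not a B8+r opcode")
    return REGS32[op & 0b111]


for op in (0xB8, 0xB9, 0xBB):
    print(f"{op:#04x} -> mov {decode_mov_imm32_opcode(op)}, imm32")
```

So `8` alone does not mean EAX: 0xB9 is `mov ecx, imm32`, and the `bb` in the question is `mov ebx, imm32`.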
@old_timer I cannot find proof of that. Could you provide a link? — No Name QA, Mar 20 '20 at 13:51
As old_timer wrote, a good reference book will show you how each instruction works. Some show the bit fields that specify registers, addressing modes, etc. — Weather Vane, Mar 20 '20 at 13:51
A disassembler for a variable-length instruction set can only do so much. You have to disassemble in execution order to have half a chance, and depending on the code there are sections you can't disassemble; GNU tools themselves struggle with file formats like ELF. When the code is executed, though, it is all read in execution order and interpreted properly (assuming a properly implemented program). — old_timer, Mar 20 '20 at 13:52
Whenever you want to learn a new instruction set, the first place to look is the IP or chip vendor, in this case Intel or AMD. In this case you can also simply google "x86 instruction set" and there will be more web pages and information than you have time to read. Note that x86 is a bad first instruction set to learn... — old_timer, Mar 20 '20 at 13:53
The encoding is not as simple as that. Consult the _Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2: Instruction Set Reference, A-Z_, chapter 2 "Instruction format" — Jester, Mar 20 '20 at 13:55
I edited your title from "WORD" to "instruction" because I think that's what you were trying to ask, making it a more direct duplicate of several existing Q&As. x86 is not a word-oriented ISA; instruction widths are not tied to words. x86 machine code is a byte stream with some little-endian multi-byte integers. In x86 terminology a "word" is 2 bytes (because x86 evolved out of 16-bit 8086). Often you work with dword or qword data (32 and 64-bit), but byte and word operand-sizes are also available for almost all integer instructions. None of this relates to instruction fetch/decode. — Peter Cordes, Mar 21 '20 at 00:13
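To make the "byte stream with some little-endian multi-byte integers" description concrete, here is a minimal assembler sketch that rebuilds the exact hex from the question. The helpers `mov_imm32` and `alu_rr` are hypothetical and only cover the two encodings used here (`B8+rd id` and `op /r` with register operands).

```python
import struct

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def mov_imm32(dst: str, imm: int) -> bytes:
    # B8+rd id: the opcode byte carries the register, then a little-endian imm32
    return bytes([0xB8 + REGS32.index(dst)]) + struct.pack("<I", imm)


def alu_rr(opcode: int, dst: str, src: str) -> bytes:
    # opcode /r with ModRM mod=11 (register-direct), reg=src, r/m=dst
    modrm = (0b11 << 6) | (REGS32.index(src) << 3) | REGS32.index(dst)
    return bytes([opcode, modrm])


program = (
    mov_imm32("eax", 3)             # b8 03 00 00 00
    + mov_imm32("ebx", 4)           # bb 04 00 00 00
    + alu_rr(0x01, "eax", "ebx")    # 01 d8  add eax, ebx
    + alu_rr(0x29, "eax", "ebx")    # 29 d8  sub eax, ebx
)
print(program.hex().upper())  # B803000000BB0400000001D829D8
```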
