Give the opcode and number of bytes of code for each of these instructions

Question

I am currently taking a low-level programming class and unfortunately, I cannot find this information in our textbook (most of the questions are not covered in our text). I am having a hard time determining how to solve these. The question gives no information about the CPU or anything along those lines. The question is as follows:

Give the op code and number of bytes of code for each of these instructions.
(a) mov exc, 984 Op Code_____ Number of Bytes_____
(b) xchg eax, ecx Op Code_____ Number of Bytes_____

There are a lot more, but if anyone could help me understand how to do these two, I hope to be able to translate that to the other questions.

One thing I tried, but not sure what I am seeing: I created a small *.asm project in Visual Studio 2019 and added that one line of code in (a) and looked at the listing file. I see it shows the following:

00000000  B9 000003D8         mov  ecx, 984

I am not sure if the information is to be found here and I am just missing it, or there is some other way to figure this out.

Refer to an instruction set reference for the encoding of x86 machine instructions. Visual Studio gives you a disassembly, but it seems to show immediates as 32-bit numbers, which may be a bit confusing (i.e. you'd usually give the encoding of `mov ecx, 984` as `b9 d8 03 00 00` instead of `b9 000003d8`). But the general approach is sound: assembling the instructions in question and then asking a disassembler for the encoding, or looking at the listing, works well. — fuz, Jul 28 '21 at 13:37
That makes sense. Thank you. I appreciate the help and information :-) — , Jul 28 '21 at 18:13
@fuz: Is [tag:instruction-encoding] a new tag? I've always just used [tag:machine-code] for questions like this; I'm not sure we need a separate tag for questions about how to encode instructions, although that is maybe a different kind of question from other machine-code questions that aren't about that basic thing. — Peter Cordes, Jul 29 '21 at 00:22
@PeterCordes It's not a tag I made, I just found it searching for something that fits. — fuz, Jul 29 '21 at 07:14
@fuz: Thoughts on making it a duplicate of [machine-code] (and [machine-language] if we eventually get it added as a synonym of machine-code)? I think probably yes. — Peter Cordes, Jul 29 '21 at 09:02
@PeterCordes Not sure. I think [instruction-encoding] is a lot more specific than [machine-code] and checking the question list for the latter, people seem to use it a lot just for questions about shuffling around text sections of binaries. — fuz, Jul 29 '21 at 10:06
@fuz: Ah ok, if there are other major uses for [machine-code], other than how specific instructions are encoded / how machine code works, then yeah there's a use for having a separate tag. We can add it or replace it on some of those [machine-code] Q&As. — Peter Cordes, Jul 29 '21 at 10:09

score 4 · Accepted Answer · edited Jul 29 '21 at 00:18

Whenever you have questions about the encoding of instructions, check instruction references, like AMD's or Intel's Manuals. Specifically, Volume 2 of Intel's manual applies here. A web-browsable version like https://www.felixcloutier.com/x86/ is scraped from that PDF.

The full PDFs have intro chapters that explain the notation used in the entries for individual instructions. Related Q&As about that:

How to read the Intel Opcode notation
What does the /4 mean in FF /4? (Not relevant to the encoding of either of the specific instructions you asked about, but other instructions do use the /r field as extra opcode bits.)

For xchg eax, ecx we check the XCHG section in the manual. In the table there the instruction we want is XCHG EAX, r32. It's encoded as 90+rd (90 here is hexadecimal), where rd is a code that designates which double-word register is used.

Looking earlier in the manual (right at the beginning of the Instruction Set Reference chapters in the full PDF), we find the definition of +rd, and see that ECX has a value of 1. The compact single-byte encoding of xchg eax, ecx is therefore 91 (again, hex).
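As a minimal sketch of the 90+rd rule, the lookup and addition can be written out in Python. The names `REG32` and `encode_xchg_eax` are made up for this illustration; the register order is the one given in the manual's +rd table.

```python
# Register numbers for the +rd notation, in the manual's order
# (EAX = 0, ECX = 1, ...). REG32 is an illustrative name, not a real API.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_xchg_eax(reg: str) -> bytes:
    """Encode `xchg eax, reg` with the short one-byte form 90+rd."""
    rd = REG32.index(reg.lower())  # register number added to the base opcode
    return bytes([0x90 + rd])


encoding = encode_xchg_eax("ecx")
# Print the opcode byte in hex, then the instruction length in bytes.
print(" ".join(f"{b:02X}" for b in encoding), len(encoding))  # 91 1
```

So for part (b), the op code is 91 and the number of bytes is 1.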

It is also possible to assemble this as 2 bytes, which is what one specific online assembler did for me, but the fact that one of the operands is EAX allows for the 1-byte version.
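The answer does not say which 2-byte encoding that assembler produced. One plausible candidate, assumed here, is the general XCHG r/m32, r32 form from the same table in the manual (opcode 87 /r), where a ModRM byte names both registers. The sketch below builds that byte; `REG32`, `modrm` and `encode_xchg_general` are hypothetical names for illustration.

```python
# Register numbers as used in the ModRM reg and r/m fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def encode_xchg_general(rm_reg: str, reg: str) -> bytes:
    """Encode `xchg rm_reg, reg` as 87 /r with register-direct addressing."""
    # mod = 11b means the r/m field names a register rather than memory.
    return bytes([0x87, modrm(0b11, REG32.index(reg), REG32.index(rm_reg))])


encoding = encode_xchg_general("eax", "ecx")
print(" ".join(f"{b:02X}" for b in encoding), len(encoding))  # 87 C8 2
```

Because exchanging is symmetric, swapping the two fields (87 C1) would describe the same operation; either way this form costs one byte more than 91.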

For mov ecx, 984 (I'm assuming exc is a typo) we check the MOV section, and find the instruction as MOV r32, imm32 in the table there, encoded as B8+rd id. From the other one, we already know that the rd for ECX is 1, so the first byte is B9.

Then we have id, and checking the same section where we found +rd, we know that that's a 4-byte immediate signed operand, given as low-order byte first (little endian). Converting 984 from decimal to hexadecimal, we get 3D8. Encoded in 4 bytes as little endian, this is D8 03 00 00.

Putting it together, the encoded instruction is B9 D8 03 00 00.
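The same steps can be sketched in Python to check the arithmetic. `REG32` and `encode_mov_r32_imm32` are names invented for this illustration; `struct.pack("<I", ...)` is the standard-library call that produces a 4-byte little-endian value.

```python
import struct

# Register numbers for the +rd notation, in the manual's order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_mov_r32_imm32(reg: str, imm: int) -> bytes:
    """Encode `mov reg, imm` as B8+rd followed by a 4-byte immediate (id)."""
    rd = REG32.index(reg.lower())
    # Mask to 32 bits so negative immediates wrap to two's complement,
    # then pack low-order byte first (little endian).
    return bytes([0xB8 + rd]) + struct.pack("<I", imm & 0xFFFFFFFF)


code = encode_mov_r32_imm32("ecx", 984)
print(" ".join(f"{b:02X}" for b in code), len(code))  # B9 D8 03 00 00 5
```

So for part (a), the op code is B9 and the number of bytes is 5: one opcode byte plus four immediate bytes.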

(Fun fact: x86 registers are numbered in the order EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI. The first 4 are not in alphabetical order, probably for obscure historical reasons.)

You can verify these by using any assembler, like NASM, and asking it to make a "listing". For example, `nasm -felf32 foo.asm -l /dev/stdout` prints a listing on the terminal.
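The Visual Studio listing in the question shows the same instruction as `B9 000003D8`. Assuming, as the comment under the question suggests, that each multi-digit group is printed as a value with the most significant byte first, a small Python sketch can turn such a listing field back into memory order and count the bytes. `listing_to_bytes` is a hypothetical helper, not part of any tool.

```python
def listing_to_bytes(groups: str) -> bytes:
    """Convert space-separated hex groups from a listing into memory order.

    Assumption: each group is displayed as one value (most significant
    byte first), so its bytes are reversed to get little-endian order.
    """
    out = b""
    for group in groups.split():
        out += bytes.fromhex(group)[::-1]  # a 1-byte group is unchanged
    return out


code = listing_to_bytes("B9 000003D8")
print(" ".join(f"{b:02X}" for b in code), len(code))  # B9 D8 03 00 00 5
```

Under that assumption the listing agrees with the hand encoding: the same 5 bytes, starting with the op code B9.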

Thank you for the help. This explains the situation perfectly and I do appreciate the help — , Jul 28 '21 at 18:13
