What exactly is a machine instruction?

Question

The user's program in main memory consists of machine instructions and data. In contrast, the control memory holds a fixed microprogram that cannot be altered by the occasional user. The microprogram consists of microinstructions that specify various internal control signals for execution of register microoperations. Each machine instruction initiates a series of microinstructions in control memory. These microinstructions generate the microoperations to fetch the instruction from main memory, to evaluate the effective address, to execute the operation specified by the instruction, and to return control to the fetch phase in order to repeat the cycle for the next instruction.

I don't fully understand the difference between machine instructions, microinstructions and microoperations. I do understand that, according to the paragraph above, microinstructions are the intermediate level, but which of the other two is closer to machine language? Are CLA, ADD, STA, BUN, BSA, AND, etc. machine instructions or microoperations?

score 4 · Accepted Answer · answered Jan 27 '21 at 10:43

A CPU presents itself to the outside as a device capable of executing machine instructions. For example,

mov (%esi,%ebx,4), %edx

is a machine instruction that moves 4 bytes of data from address ESI+4*EBX into register EDX. Machine instructions are public: they are published by the CPU manufacturer in a user manual. Compilers such as gcc output files that contain machine instructions, and these typically end up in EXE/DLL files.

If you look closely at the above instruction, you will see that it is a fairly complex operation. It involves some arithmetic (a multiplication and an addition) to compute the memory address, then a move of data from that address into a register. From the CPU's perspective, it makes sense to reuse the arithmetic unit that is already there. So it is natural to break this instruction down into microinstructions. In a microprogrammed design, the mov instruction is implemented internally by the CPU as a microprogram written in microinstructions. This is, however, an implementation detail of the CPU. Microinstructions are internal to the CPU and invisible to anybody except the CPU manufacturer.
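To make the idea of "breaking the instruction down" concrete, here is a minimal Python sketch that emulates the `mov` above as three simple internal steps. The split into exactly three steps, the function name `run_mov_scaled_load` and the dictionary of registers are assumptions made for illustration. Real x86 micro-operations are undocumented and may look quite different.

```python
# Illustrative only: the step boundaries below are an assumption, not the
# actual (undocumented) micro-operations of any real x86 CPU.

def run_mov_scaled_load(regs, memory):
    """Emulate mov (%esi,%ebx,4), %edx as three simple internal steps."""
    # Step 1: address arithmetic, base + index * scale, wrapped to 32 bits.
    address = (regs["esi"] + regs["ebx"] * 4) & 0xFFFFFFFF
    # Step 2: read 4 bytes from memory (x86 stores integers little-endian).
    value = int.from_bytes(memory[address:address + 4], "little")
    # Step 3: write the loaded value into the destination register.
    regs["edx"] = value
    return address

# A tiny 64-byte "main memory" with one 32-bit value stored at address 24.
memory = bytearray(64)
memory[24:28] = (0xDEADBEEF).to_bytes(4, "little")

regs = {"esi": 16, "ebx": 2, "edx": 0}
address = run_mov_scaled_load(regs, memory)
print(hex(address), hex(regs["edx"]))  # 0x18 0xdeadbeef
```

Each step on its own is simple enough to map onto one hardware unit (an adder, a memory port, a register write), which is roughly why such a decomposition is attractive.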

Microinstructions have several benefits:

they simplify internal CPU architecture, design and testing, thus lowering cost per unit
they make it easy to create rich and powerful sets of machine instructions (you just have to combine microinstructions in different ways)
they provide a consistent machine language across different CPUs (e.g. Xeon and Pentium both implement the basic x86_64 instruction set even though their hardware is very different)
they allow optimizations (i.e. the same instruction can be implemented directly in hardware on one CPU and emulated in microinstructions on another)
they allow bugs to be fixed in the field (e.g. you can apply a microcode update that mitigates the Spectre vulnerability while the machine is running, without buying a new CPU or opening your server)

For more information, see https://en.wikipedia.org/wiki/Micro-operation

One small nitpick, which might avoid further confusion: `mov (%esi,%ebx,4), %edx` is in itself just _a representation of_ a machine instruction, and the actual machine instructions we're talking about are just binary sequences. As I understand it, the text-based assembly language for human consumption is generally slightly more abstract than the real machine instructions - there won't be a single pattern of bits corresponding to "mov", but several related machine instructions with different addressing modes that are all written with that mnemonic in assembly language. — IMSoP, Jan 27 '21 at 11:45
@IMSoP: Indeed, there are multiple opcodes for `mov` (https://www.felixcloutier.com/x86/mov), but in this case only one of them is usable to encode that specific instruction. There are still 3 different possible encodings for it, though, using no displacement, 8-bit displacement = 0, or 32-bit displacement = 0. It's somewhat implied that the shortest one will be used, because that's what assemblers always do, and it's not `0(...)`. In other cases, like `mov %eax, %ebx`, there are 2 choices of opcode: [x86 XOR opcode differences](//stackoverflow.com/q/50336269) — Peter Cordes, Jan 27 '21 at 20:20
@PeterCordes Yes, I didn't mean that the assembly instruction was _ambiguous_ as such, just that "mov" on its own is a level of abstraction _above_ the kind of "machine instruction" we're discussing, and that some of the decisions of "what kind of `mov`" will be made in the assembler/compiler/interpreter, not in the micro-instructions of the processor. — IMSoP, Jan 27 '21 at 21:24
@IMSoP: 100% agreed, it would be better if this answer said `8B 14 9E` is an x86 machine instruction (human-readable asm version: `mov (%esi,%ebx,4), %edx` if it executes in 32-bit mode). My previous comment was agreeing with you and filling in the details on why it's ambiguous. — Peter Cordes, Jan 27 '21 at 21:27
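
As a rough illustration of the point made in these comments, the three bytes `8B 14 9E` can be unpacked into their bit fields in a few lines of Python. This is a sketch that handles only this one form (opcode `8B`, no displacement, SIB byte present). The helper name `decode_mov_load` and the `REG32` table are made up for the example. A real disassembler covers far more cases.

```python
# Register numbers as used in 32-bit x86 ModRM/SIB fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_load(code):
    """Decode a 3-byte 'mov r32, [base + index*scale]' into AT&T syntax."""
    opcode, modrm, sib = code
    if opcode != 0x8B:
        raise ValueError("only opcode 8B (mov r32, r/m32) is handled")
    # ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm != 4:
        raise ValueError("only the no-displacement form with a SIB byte is handled")
    # SIB byte: scale (2 bits) | index (3 bits) | base (3 bits)
    scale, index, base = 1 << (sib >> 6), (sib >> 3) & 7, sib & 7
    if index == 4 or base == 5:
        raise ValueError("special SIB cases (no index, disp32 base) are not handled")
    return f"mov (%{REG32[base]},%{REG32[index]},{scale}), %{REG32[reg]}"

text = decode_mov_load(bytes.fromhex("8B 14 9E"))
print(text)  # mov (%esi,%ebx,4), %edx
```

The bit pattern is the machine instruction; the text is only a human-readable rendering that an assembler or disassembler maps to and from it.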

score 3 · Answer 2 · answered Jan 27 '21 at 10:13

I think the answer to your question is in these three sentences:

The user's program in main memory consists of machine instructions and data

Each machine instruction initiates a series of micro-instructions in control memory.

These micro-instructions generate micro-operations.

So:

The user supplies machine instructions
Those get translated into micro-instructions
Those get translated into micro-operations

The mnemonics you mentioned are what the user might use to write or read a list of machine instructions (the actual instructions just being patterns of bits understood by the processor). The "occasional user" (i.e. everyone other than the chip's designer) never needs to deal directly in micro-instructions or micro-operations, so would never know individual names for them.
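
The three levels can be sketched as a small lookup table in Python. The contents of `FETCH` and `CONTROL_MEMORY` below are hypothetical, loosely written in textbook register-transfer notation; they are not the actual microprogram from the quoted book or from any real processor.

```python
# Hypothetical control memory. Each machine instruction (named here by its
# mnemonic) maps to a list of microinstructions, and each microinstruction is
# a tuple of micro-operations that take place in the same step.
FETCH = [
    ("AR <- PC",),
    ("IR <- M[AR]", "PC <- PC + 1"),
    ("AR <- IR(address)",),
]

CONTROL_MEMORY = {
    "ADD": [("DR <- M[AR]",), ("AC <- AC + DR",)],
    "STA": [("M[AR] <- AC",)],
    "CLA": [("AC <- 0",)],
}

def expand(mnemonic):
    """Return the microinstructions that one machine instruction triggers."""
    return FETCH + CONTROL_MEMORY[mnemonic]

# One machine instruction (ADD) expands into several microinstructions,
# each of which carries one or more micro-operations.
for step, microinstruction in enumerate(expand("ADD")):
    print(step, "; ".join(microinstruction))
```

The user only ever writes the keys of the table (`ADD`, `STA`, `CLA`); everything on the right-hand side is internal to the processor.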
