
I'm reading, in parallel, various books on computer architecture and I'm confused. Some books state that assembly instructions are just mnemonics for machine instructions, and that each assembly instruction corresponds to exactly one machine instruction. However, Tanenbaum's Structured Computer Organization puts assembly on the layer above the operating system, and seems to imply that assembly somehow uses the operating system (I haven't read the whole book yet).

Which one is true? Are assembly instructions simply machine instructions? Can they also be system calls which the OS interprets into machine instructions? Can they be something else?

Peter Cordes
blue_note
  • It depends on how you think about it. If you consider *only* user-mode programs, and think of system calls as opaque "magic" things, then sure, you could think of the assembly as relying on the OS. However, the machine language relies on the OS just as much as the assembly in that case. I do however want to note that assembly does not always map 1:1 with machine code. On some platforms, the same assembly *could* be assembled in multiple ways, though usually one is faster. – Thomas Jager Aug 01 '19 at 18:32
  • @ThomasJager: thanks. Could you provide an example of how machine language could rely on the OS? – blue_note Aug 01 '19 at 18:34
  • When the CPU executes the bytes `0F 05` (the assembly of [`syscall`](https://www.felixcloutier.com/x86/syscall)) on an x86_64 machine, it starts running OS code in a privileged mode. – Thomas Jager Aug 01 '19 at 18:35
  • @ThomasJager: thanks – blue_note Aug 01 '19 at 18:36
  • Depending on architecture, certain assembly instructions might have multiple machine code encodings and vice versa. Those are the exceptions though and it doesn't matter in everyday practice. – Jester Aug 01 '19 at 19:03
  • Assemblers are known to perform conversions on some instructions. For instance, an assembler that targets 16-bit code on the 8086 couldn't emit the instruction `shl ax, 2`. On the 8086 you couldn't shift by more than 1 bit at a time, so some assemblers would emit two `shl` instructions, `shl ax, 1` `shl ax, 1` (which is the same thing as `shl ax, 2` on processors >= 80186, which support the enhanced form). – Michael Petch Aug 01 '19 at 19:08

1 Answer

Mostly yes, one line of assembly corresponds to one CPU instruction. But there are some caveats.

Label definitions don't correspond to any instructions, even though under some assemblers they occupy separate lines. They just name a location in memory so that you can refer to it elsewhere.

Data directives like `db 0x90` or `.byte 0x90` manually assemble bytes into the output file. Using such directives in a region that will be reached by execution lets you manually encode instructions, or create bugs if you did that by accident.
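To make the points about labels and data directives concrete, here is a minimal sketch of a toy assembler in Python. It is not how any real assembler is implemented: the `assemble` function and its tiny opcode table are hypothetical, and only the two single-byte x86 opcodes (`nop` = `0x90`, `ret` = `0xC3`) are real. It shows that a label emits no bytes and that `db 0x90` produces exactly the same byte as `nop`.

```python
from typing import Dict, Tuple

# Real single-byte x86 opcodes; the table is deliberately tiny (illustration only).
OPCODES = {"nop": 0x90, "ret": 0xC3}


def assemble(source: str) -> Tuple[bytes, Dict[str, int]]:
    """Toy assembler (hypothetical): returns (machine code, label -> offset)."""
    out = bytearray()
    labels: Dict[str, int] = {}
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.endswith(":"):
            # A label only records the current offset; no bytes are emitted.
            labels[line[:-1]] = len(out)
        elif line.startswith("db "):
            # A data directive copies the given bytes straight into the output.
            out.extend(int(tok, 0) for tok in line[3:].split(","))
        else:
            # Anything else must be a mnemonic from the tiny table above.
            out.append(OPCODES[line])
    return bytes(out), labels


code, labels = assemble("""
start:
    nop         ; a real instruction
    db 0x90     ; the same byte, hand-encoded
end:
    ret
""")
print(code.hex(" "), labels)
```

Five non-blank source lines yield three bytes of output: the two labels only show up as offsets in the `labels` dictionary, and a disassembler looking at the result could not tell the hand-encoded `0x90` from the assembled `nop`.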

Assemblers often support directives - lines that provide some guidance to the assembler itself. Those don't correspond to CPU instructions, and they can sometimes be mistaken for genuine commands.

Some assemblers support macros - think inline functions.


Some RISC assemblers, notably MIPS, have a notion of combined instructions - one line of assembly corresponds to a handful of instructions. (These are called pseudo-instructions.) Those are like built-in macros, provided by the assembler.

But depending on the operand, it might only need to assemble to 1 machine instruction. e.g. `li $t0, 1` can assemble to `ori $t0, $zero, 1` but `li $t0, 0x55555555` needs both `lui` and `ori` (or `addiu`).
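As a rough illustration of that decision, the sketch below expands `li` the way a MIPS assembler might. The function name `expand_li` and the exact selection rules are assumptions made for this example; real assemblers differ in details such as which single-instruction form they prefer.

```python
from typing import List


def expand_li(reg: str, value: int) -> List[str]:
    """Expand the MIPS pseudo-instruction `li reg, value` (hypothetical sketch)."""
    value &= 0xFFFFFFFF  # treat the constant as a 32-bit pattern
    upper, lower = value >> 16, value & 0xFFFF
    if upper == 0:
        # Fits in a zero-extended 16-bit immediate: one ori is enough.
        return [f"ori {reg}, $zero, {lower:#x}"]
    if upper == 0xFFFF and lower & 0x8000:
        # Small negative number: addiu sign-extends its 16-bit immediate.
        return [f"addiu {reg}, $zero, {lower - 0x10000}"]
    if lower == 0:
        # Only the upper half is set: a single lui is enough.
        return [f"lui {reg}, {upper:#x}"]
    # General case: load the upper half, then OR in the lower half.
    return [f"lui {reg}, {upper:#x}", f"ori {reg}, {reg}, {lower:#x}"]


for constant in (1, 0x55555555, -1):
    print(constant, "->", expand_li("$t0", constant))
```

The same source line therefore costs one or two machine instructions depending only on the constant, which is why a disassembly never shows `li` itself.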

On ARM, `ldr r0, =0x5555` can choose between a PC-relative load from a literal pool or a `movw` if assembling for an ARM CPU that supports `movw` with a 16-bit immediate. You wouldn't see `ldr r0, =0x5555` in disassembly, you'd see whichever machine instruction(s) the assembler picked to implement it. (Editor's note: I'm not sure if any ARM assemblers will ever pick 2 instructions (`movw` + `movt`) for a wider constant for `ldr reg, =value`.)


Do you count a procedure call as "multiple instructions per line"? There's `CALL` on Intel, `BL` on ARM. As far as the CPU docs are concerned, those are single instructions. They're just branches that also store a return address somewhere.

But if you're debugging and stepping over function calls instead of into them, they invoke a procedure/function/subroutine that may contain arbitrarily many instructions. Same goes for syscalls: an instruction like `syscall` or `svc #0` is basically a function call into the kernel.

Assembly programs can definitely consume services from the operating system. How do you think regular programs do that? Whatever a high-level program can do, assembly can do too. The specifics vary by OS and architecture, though.

Peter Cordes
Seva Alekseyev