x86 is an architecture derived from the Intel 8086 CPU. The x86 family includes the 32-bit IA-32 and 64-bit x86-64 architectures, as well as legacy 16-bit architectures. Questions about the latter should be tagged [x86-16] and/or [emu8086]. Use the [x86-64] tag if your question is specific to 64-bit x86-64. For the x86 FPU, use the tag [x87]. For SSE1/2/3/4 / AVX* also use [sse], and any of [avx] / [avx2] / [avx512] that apply.
The x86 family of CPUs contains 16-, 32-, and 64-bit processors from several manufacturers, with backward-compatible instruction sets, going back to the Intel 8086 introduced in 1978.
There is an x86-64 tag for things specific to that architecture, but most of the info here applies to both. It makes more sense to collect everything here. Questions can be tagged with either or both. Questions specific to features only found in the x86-64 architecture, like RIP-relative addressing, clearly belong in x86-64. Questions like "how to speed up this code with vectors or any other tricks" are fine for x86, even if the intention is to compile for 64bit.
Related tags with tag wikis:
- sse wiki (some good SIMD guides), and avx (not much there)
- inline-assembly wiki for guides specific to interfacing with a compiler that way.
- intel-syntax wiki and att wiki have more details about the differences between the two major x86 assembly syntaxes. And for Intel, how to spot which flavour of Intel syntax it is, like NASM vs. MASM/TASM.
Learning resources
Matt Godbolt's CppCon2017 talk “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” has a gentle introduction to x86 asm itself for asm beginners who know C or C++, as well as a very useful guide to looking at compiler output.
If you don't know how to do something in asm, write a simple C function that does it and see what an optimizing compiler does. e.g. int foo(char *p) { return *p; } shows you how to use movsx. See also How to remove "noise" from GCC/clang assembly output?
Short x86 Assembly Guide targeting 32-bit mode and the MASM assembler, but brief and target-agnostic enough to be used as a starting point for any "Intel" syntax dialect assembler (NASM, YASM, FASM, ...).
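As a concrete instance of the "write a small C function and read the compiler output" tip, here is a minimal sketch that contrasts a sign-extending load with a zero-extending one. The function names are made up for illustration. Plain char is signed on the usual x86 ABIs, which is why the foo example sign-extends; compilers typically (not necessarily always) pick movsx for the first function and movzx for the second.

```c
/* Sign-extending byte load: a compiler will typically emit movsx here. */
int load_signed(const signed char *p) {
    return *p;
}

/* Zero-extending byte load: a compiler will typically emit movzx here. */
int load_unsigned(const unsigned char *p) {
    return *p;
}
```

Compile with something like gcc -O2 -S and compare the two function bodies; the same byte 0xFF in memory comes back as -1 from the first and 255 from the second.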
Suggestions on how to learn asm, with a recommendation against 16bit DOS. Questions should use the x86-16, emu8086, and/or dos tags if applicable, as well as x86 (which includes all platforms.)
OSdev.org: a great resource if you want to understand / modify OS internals or make your own toy OS. Not useful for writing / debugging normal programs that run under existing OSes.
General Tips for Bootloader Development. (Using legacy BIOS, not UEFI).
Working example of a legacy BIOS int 10h bootloader that loads a "kernel" and calls a C main function in it, in 32-bit protected mode. Includes instructions on how to build and link it with NASM, gcc -m32, and ld (with a linker script). And how to make a disk image and run it on QEMU.
The inline-assembly tag wiki. (But see also https://gcc.gnu.org/wiki/DontUseInlineAsm - inline asm is more complicated than writing stand-alone asm functions you call from C, so it's not good for learning asm.)
Using GNU C/C++ inline ASM. The bottom of that answer has a collection of links to info on how to write inline asm that's efficient and correct. The first part of the answer explains why it's not a good way to learn asm in the first place. Don't try to "get your feet wet" with asm by using inline asm. You have to understand everything to write correct input/output operand constraints and clobbers.
Understanding Carry vs. Overflow conditions/flags, normally relevant for unsigned vs. signed respectively.
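To make the carry vs. overflow distinction concrete, here is a small C model of what the two flags report after an 8-bit add. This is a sketch, not how the hardware computes them, and the function names are hypothetical. Carry answers "did the unsigned result wrap?", overflow answers "did the signed result wrap?".

```c
#include <stdbool.h>
#include <stdint.h>

/* CF after an 8-bit add: the true unsigned sum does not fit in 8 bits. */
bool add8_carry(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;
    return wide > 0xFF;
}

/* OF after an 8-bit add: both inputs have the same sign bit and the
   truncated result has a different one. */
bool add8_overflow(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a + b);
    return ((a ^ r) & (b ^ r) & 0x80) != 0;
}
```

For example, 0xFF + 0x01 sets carry but not overflow (255 + 1 wraps, but -1 + 1 = 0 is fine), while 0x7F + 0x01 sets overflow but not carry (127 + 1 doesn't fit in a signed byte, but 128 fits in an unsigned one).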
Style guide: indenting columns for labels / instructions / operands / comments: a Code Review.SE answer: https://codereview.stackexchange.com/questions/204902/checking-if-a-number-is-prime-in-nasm-win64-assembly/204965#204965
- Quick guide to what's different in x86-64. AT&T syntax. NASM and YASM behave differently (from each other) in choice of encoding for mov rax, 1, and don't use a separate movabs mnemonic for the 64bit-immediate form.
- Introduction to x64 Assembly (PDF published by Intel). Uses MASM syntax. Spends a bit of time talking about the Windows calling convention and MSVC-specific toolchain issues (like no MSVC inline asm in 64-bit mode), as you might expect from using "x64" in the article title instead of x86-64. But looks like some good generally-applicable stuff that isn't OS-specific. For some bizarre reason, it suggests using the slow LOOP instruction, so it's not perfect.
- A NASM tutorial for x86-64 Linux (nasm -felf64) and MacOS (nasm -fmacho64). Includes some basic SIMD stuff, but forgets to use alignas(16) on the C arrays that require alignment, and uses movaps with integer, movdqa with float. (Which is not a correctness problem, and on most CPUs probably not a performance problem, but is backwards.) Otherwise mostly looks good.
- Encoding Real x86 Instructions: a tutorial (course material) on how instructions are encoded into machine code. Lots of diagrams.
- x86 on Wikipedia
- x86 Assembly wikibook
- Assembly Language for x86 Processors (website for Kip Irvine's book)
- Programming from the Ground Up, a free (GFDL) book by Jonathan Bartlett. Errata for the book. Available as a small (1MB) PDF from the "download" link on that page, or as HTML chapters. It uses 32-bit x86 asm with AT&T syntax on Linux, and has some good stuff about how to "think like a computer" to figure out how to get things done in asm. It covers some essential operating-system stuff like virtual memory that's necessary to understand what's going on, as well as assembly / machine language itself.
- x86-64 Assembly Language Programming with Ubuntu, a free book using YASM (NASM syntax) for GNU/Linux. The PDF is CC-BY-NC-SA. Unfortunately no mention of default rel or [rel x] RIP-relative addressing so it's missing some stuff that's essential in practice. But does have some introductory stuff about basics like data representation, bits and bytes in memory vs. registers, and other background beyond just what each instruction does.
- 8086 assembler tutorial for beginners - emu8086 (MASM/TASM style) 16-bit only, but starts out with some nice intro stuff about hex vs. decimal, what assembly language is, what registers are and how memory is addressed, and how to look at memory in the debugger, before jumping into how specific instructions work.
- Assembly tutorial - Dr. Paul Carter
- Windows Assembly Programming Tutorial
- Why do functions have to save some registers, but not others? See below for links to guides & docs for specific calling conventions.
- How to trace what a function does: figure out the inputs and the outputs, then figure out what it does with them.
- Linux x86 Program Start Up or - How the heck do we get to main()
- A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
- What do the register-names like esi mean, and what special purposes do they have. They're all acronyms, like Counter register, or Source Index.
Guides for performance tuning / optimisation:
- Agner Fog's optimization guides and resources. Includes latency/throughput tables for P5 onwards. Also much qualitative discussion of how to go about making your code faster. Also has a good guide to the different calling conventions across OSes, and covers linking / symbols / relocation.
- Intel's Sandybridge microarchitecture family can't micro-fuse indexed addressing modes in the out-of-order core, only in the decoders and uop-cache. Also: Haswell's dedicated store-address unit on port7 only works with simple effective addresses. Complex effective addresses need the AGU on a load port.
- Enhanced REP MOVSB for memcpy: single-threaded bandwidth vs. aggregate bandwidth on desktop vs. many-core CPUs, RFO vs. non-RFO stores. (Modern CPUs have more DRAM / L3 bandwidth than a single core can use; there are other bottlenecks especially in many-core chips).
- What Every Programmer Should Know About Memory by Ulrich Drepper. (Originally posted as a series of LWN articles, Ulrich published the PDF later). How DRAM and caches work, their behaviour, and how to optimize software for cache locality. Includes some charts with real microbenchmark data to illustrate points, and a cache-blocked SSE2 matrix multiply example. See a 2017 review of what's outdated, e.g. the P4 software prefetch stuff is mostly obsolete.
- Why xor same,same is better than mov reg, 0 for zeroing a register. There are several reasons, some simple and some subtle (e.g. avoiding partial-register stalls on P6/SnB family).
- Serializing RDTSC with LFENCE vs. CPUID for benchmarking short sequences within a program.
- How to get the CPU cycle count in x86_64 from C++? (including a bunch of info on what rdtsc measures, exactly, and caveats for using it, with links to even more details).
- What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?: intro to static performance analysis.
- Intel's IACA (Intel Architecture Code Analyzer): analyze marked sections of code for throughput (e.g. cycles per iteration) or latency of the critical path. Assumes perfect cache, and other simplifications, and isn't always correct, but can be useful. Was stalled, but updated again for Skylake-X (AVX512). See What is IACA and how do I use it? for a tutorial.
- uiCA (uops.info Code Analyzer) is like IACA but with an accurate model of the front-end fetch/pre-decode/decode (and uop cache or LSD if applicable, I assume) not just 4-wide or 5-wide issue that IACA assumes. See Do 32-bit and 64-bit registers cause differences in CPU micro architecture? for an example output graph.
- Haswell microarchitecture, Bulldozer microarchitecture. David Kanter's analysis. He's also done writeups on earlier uarches, like Sandybridge and Nehalem.
- Modern Microprocessors A 90-Minute Guide!: from in-order pipelined to super-scalar out-of-order. And brainiac (PPro) vs. speed demon (Pentium 4), and Pentium 4 hitting the "power wall" in CPU design.
- A whirlwind introduction to dataflow graphs: how to analyze dependency chains for throughput and latency.
- http://www.uops.info/ very detailed uop / execution port testing on Intel CPUs, finding some things that repeating a large block of the same instruction (like Agner Fog's testing) sometimes misses.
- New CPUs will usually have AIDA64 InstLatx64 results before Agner Fog can test and publish updated tables. For example, Skylake-avx512, and see also https://github.com/InstLatx64/InstLatx64 for a mirror + a spreadsheet of Skylake-AVX512 port assignments (compiled from IACA-2.3 output). BDW vs. SKL points out some of the interesting changes in SKL (more throughput for more instructions, different FP latency).
- 2015 IDF slides from the Skylake power management talk. Unfortunately the main site (http://myeventagenda.com/sessions/0B9F4191-1C29-408A-8B61-65D7520025A8/7/5), which had video (of slides + audio), is offline now.
Instruction set / asm syntax references:
Intel's vector intrinsics finder/search (very good): search by asm mnemonic or C intrinsic name
x86/x64 SIMD Instruction List (SSE to AVX512) Beta: A nice compact table listing instruction mnemonics and their intrinsics, broken down by type and element-size. Detailed pages with graphical data-movement diagrams for each instruction.
SIMD guides in the SSE tag wiki, focusing on how to actually make good use of SIMD in general, not just what the available instructions are.
Intel's manuals, including instruction set reference manual. Extremely detailed description of everything every instruction does to the architectural state. Big, but has a decent index / table of contents. Also on that page: Intel's optimization manual. Some of the same advice as Agner Fog's guides, but sometimes without explaining exactly why in terms of microarch execution ports and other under-the-hood reasons. Also sometimes obsolete, for example recommending against inc / dec long after P4 is irrelevant.
AMD's x86 manuals, including instruction-set reference and optimization manuals.
HTML version of Intel's insn set reference, auto-generated from the PDF. One page per instruction, great for linking in answers.
Another HTML extract, including AVX512, CLFLUSHOPT, etc. This makes it more cluttered, and harder to find what you need, if you're not targeting AVX512. (But note that CLFLUSH has changed to being strongly-ordered, while felixcloutier.com's HTML extract still has the old documentation. There may be other inaccuracies in the old docs, even for old instructions.)
https://sandpile.org - CPUID maps, instruction encoding, register diagrams, opcode map, miscellaneous other technical details.
x86 Instruction Reference including when introduced (8086, 186, 586, etc) - NASM appendix B. Includes undocumented instructions, and Cyrix-only MMX instructions, and stuff like that.
A fork of an older version includes English descriptions. The original had some errors in which generation introduced each form of each insn but this version keeps the nice formatting while fixing those. Handy for people still developing for x86-16. The similar wikipedia page doesn't mention that 386 is required for the faster 2-operand form of imul r16, r/m16 that doesn't have to calculate the upper half of the result.
x86 Opcode reference guide, sorted by opcode or by mnemonic. 32, 64, or both in one table. The "geek" version includes non-standard / undocumented opcodes, the "coder" one includes columns showing which if any flags are read and written.
Original 8086 errata / anomalies, such as mov ss, src not properly disabling interrupts until the end of the next instruction. Also see the parent directory for some errata, undocumented instructions, and stuff for 186/286/386.
Simply FPU: x87 tutorial. Helpful for understanding old x87 code, esp. the early sections about how the register stack works. (Use SSE for new code.)
fsin's precision is far worse than 1ulp for inputs close to pi, contrary to Intel's previous documentation. The other FP articles in Bruce Dawson's series are also excellent (index in this one on FP comparisons).
YASM manual: describes YASM syntax and macros. Excellent register diagram showing partial registers, with their machine-code encodings, and a reminder on zero-extending vs. unmodified upper parts. (Another simpler register-subset diagram for a single reg).
Possible canonical duplicates for register subsets: Assembly registers in 64-bit architecture includes some calling-convention / usage stuff. How do AX, AH, AL map onto EAX? is a good one for bugs where AL and RAX were used for different things, corrupting each other.
MASM Reference Documentation, and an old MASM 6.1 manual from 1996. Confusing brackets in MASM32 shows that MASM surprisingly ignores brackets around symbolic immediates.
MASM syntax as used by JWasm. JWasm is a portable assembler.
Table of AT&T (GNU) vs. NASM syntax for addressing modes and indirect jmp / call
All the available addressing modes (32/64-bit) (Intel syntax, with a note about NASM vs. MASM for mov reg, symbol), with links to further guides.
TODO: find a good link for AMD's XOP instruction set. (Not recommended for general use; even AMD is dropping XOP support in their Zen architecture.)
OS-specific stuff: ABIs and system-call tables:
- x86 ABIs (wikipedia): calling conventions for functions, including x86-64 Windows and System V (Linux). See also Agner Fog's nice calling convention guide
- 32-bit absolute addresses no longer allowed in x86-64 Linux? (PIE executables are now the default on most distros, with gcc configured with --enable-default-pie.)
- Mach-O 64-bit format does not support 32-bit absolute addresses. NASM Accessing Array (OS X's image base is above the low 32, unlike Linux position-dependent executables). Also mentions 2 known bugs in some NASM versions with macho64 and RIP-relative or 64-bit absolute addressing.
- System V ABI summary on osdev: i386 and x86-64, with links to random copies of the per-architecture supplement for various architectures, and the generic gABI that all the processor-specific supplement (psABI) documents expand on.
- System V psABI official standard current revisions for x86-64 and i386 (wiki page on github, kept up to date by H.J. Lu). Direct link to x86-64 revision 1.0. Also links to the official forum for ABI discussion by maintainers/contributors.
- clang/gcc sign/zero extend narrow args to 32bit, even though the System V ABI as written doesn't (yet?) require it. Clang-generated code also depends on it.
- System V 32bit (i386) psABI (official standard, rev 1.1 Dec2015), used by Linux and Unix. (Some OSes don't require 16-byte stack alignment for 32-bit code; GNU/Linux does)
(Historical: very old SCO version of the i386 SysV ABI, before 16B stack alignment was required).
- OS X 32bit x86 calling convention, with links to the others. The 64bit calling convention is System V. Apple's site just links to a FreeBSD pdf for that.
Windows x86-64 __fastcall calling convention
Windows __vectorcall: documents the 32bit and 64bit versions
Windows 32bit __stdcall: used to call Win32 API functions. That page links to the other calling convention docs (e.g. __cdecl).
ABI cheat sheet: x86 vs. x64 vectorcall and non-vectorcall, vs. SysV. SysV section is incomplete.
Why does Windows64 use a different calling convention from all other OSes on x86-64?: some interesting history, esp. for the SysV ABI where the mailing list archives are public and go back before AMD's release of first silicon.
MSVC's 32bit CRT startup code sets the x87 FPU precision to 53 bits (double). That entire series of articles (table of contents in this one) is excellent, including asm output from MSVC in some examples.
- The Definitive Guide to Linux System Calls (on x86). Examples of how to use int 0x80, 32-bit sysenter, and 64-bit syscall, and how to call through the vDSO for gettimeofday, and has some info about glibc's syscall wrappers. Lots of details, and also some background info / basics for beginners.
- Linux system call tables. 64bit syscall numbers, with parameter->register mapping (derived from the kernel source code, and the standard rule for order of args).
- FreeBSD system calls: question has FreeBSD syscalls, answer has Linux and others.
- What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64: Note that 32bit int 0x80 preserves all registers (including flags) except eax, while 64bit syscall also clobbers rcx and r11 as well as putting the return value in rax.
- 16bit interrupt list: PC BIOS system calls (int 10h / int 16h / etc, AH=callnumber), DOS system calls (int 21h / AH=callnumber), and more.
Memory ordering:
- Weak vs. Strong Memory Models: what it means when people say x86 has a "strongly ordered memory model". See also the c++ info page for many good links if you're using C11/C++11 atomics.
- Memory Reordering Caught in the Act: A test case that demonstrates memory reordering in practice on a multicore x86 CPU.
- A better x86 memory model: x86-TSO (extended version) A formal definition of the x86 memory model which hopefully matches how real hardware behaves.
- Why isn't add dword [num], 1 atomic, even though it's a single instruction. Also asks about compiling num++ in C++. See also Atomicity on x86: What does it mean for a load or store to be atomic, and how is it implemented internally?
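Since a plain num++ (or a plain add dword [num], 1) is not an atomic read-modify-write, code that needs one has to ask for it explicitly. A minimal C11 sketch, assuming a hypothetical counter shared between threads; the names are illustrative, and "typically compiles to a lock-prefixed instruction on x86" is an observation about common compilers, not a guarantee.

```c
#include <stdatomic.h>

/* Hypothetical counter that several threads may increment concurrently. */
static atomic_int shared_counter;

/* Atomic read-modify-write: no increments are lost even if multiple
   threads call this at the same time. Returns the new value. */
int counter_increment(void) {
    return atomic_fetch_add(&shared_counter, 1) + 1;
}

/* Atomic load of the current value. */
int counter_read(void) {
    return atomic_load(&shared_counter);
}
```

Looking at the compiler output for counter_increment is a quick way to see what the atomic version costs compared with a non-atomic increment.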
Specific behaviour of specific implementations
- TLB and Pagewalk Coherence in x86 Processors. Many x86 microarchitectures, especially Intel's, provide stronger ordering guarantees than the ISA requires for modifying a page-table entry that's not already cached in the TLB. Win95 even depended on this. (Don't write new code that depends on this.)
- Measuring Reorder Buffer Capacity Another experimental test that demonstrates the capabilities and limits of out-of-order execution in real hardware.
- What are the exhaustion characteristics of RDRAND on Ivy Bridge? With an answer from David Johnston (Intel RNG HW designer and librdrand author).
Q&As with good links, or directly useful answers:
Using GNU C/C++ inline ASM. (Same link from the learning-resources section, but worth repeating here.)
What are the best instruction sequences to generate vector constants on the fly?
Deoptimizing a program for the pipeline in Intel Sandybridge-family CPUs. Has a long answer including some introductory computer-architecture stuff as well as details of what can stall a Haswell pipeline.
How can I run this assembly code on OS X?: OS X getting-started guide. (Symbol names are prepended with _ on OS X, unlike for Linux ELF systems.)
add/sub/LEA can be used with garbage in high bits, so LEA eax, [rdi + rsi*2 - 15] to compute a + 2*b - 15 works fine, even if a and b are only supposed to be 8 or 16 bits.
TODO: find a question about how to use a profiler to measure uops and stuff. perf comes with most Linux distros, and ocperf.py is a wrapper for it that provides more symbolic names for stuff like micro-arch-specific uop counters.
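The point above about add/sub/LEA tolerating garbage in high bits can be checked with a small C model: the low bits of a sum depend only on the low bits of the inputs. This is a sketch with a hypothetical function name; it mimics the arithmetic of the LEA example with wrapping unsigned math rather than using the instruction itself.

```c
#include <stdint.h>

/* Low 32 bits of a + 2*b - 15, computed with wrapping 64-bit arithmetic,
   in the spirit of lea eax, [rdi + rsi*2 - 15]. */
uint32_t lea_low32(uint64_t a, uint64_t b) {
    return (uint32_t)(a + 2 * b - 15);
}
```

If a and b are logically 8-bit values 20 and 5, truncating the result to 8 bits gives 15 whether or not the upper bits of the inputs were zeroed first.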
FAQs / canonical answers:
If you have a problem involving one of these issues, don't ask a new question until you've read and understood the relevant Q&A.
(TODO: find better question links for these. Ideally questions that make a good duplicate target for new dups. Also, expand this.)
My program crashes / segfaults: You need to use a debugger to find what instruction is crashing (see the bottom of this tag wiki for GDB and Visual Studio tips). Most buggy asm programs crash, so without more info this is not useful. Reasons can include clobbering registers or stack memory you shouldn't have, leaving esp pointing to the wrong place before a ret, or many many other reasons besides the following other common problems.
external assembly file in visual studio - VS mixed-source x64 project, for asm files as part of a C/C++ program. Also Assembly programming - WinAsm vs Visual Studio 2017 for a pure asm project.
Building 32bit code on a 64bit system (with the GNU toolchain). gcc example.s makes a binary that runs in 64bit mode, which will crash if the code was written for 32bit mode. Related: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
Building an executable from asm source that defines _start vs. source that defines main, with gcc / as / ld and/or NASM. With or without libc, and static vs. dynamic executable.
Wide load on narrow data: loading or modifying extra bytes, e.g. mov eax, [var] from a db 0.
ret from _start segfaults without making a Linux _exit syscall. ret doesn't work because it's not a function. What happens if there is no exit system call in an assembly program? also covers the case of falling off the end with no ret.
Execution just keeps going if there's no jump or ret, falling through to what's next: What if there is no return statement in a CALLed block of code in assembly programs and Why is no value returned if a function does not explicity use 'ret'.
Code executes condition wrong? Falling through from the if into the else body in an if/else. Nicely explains that labels aren't magic and execution falls through them.
Segmentation fault when using DB (define byte) inside a function: putting data where it's executed as code. (Assembly (x86): <label> db 'string',0 does not get executed unless there's a jump instruction for legacy BIOS bootloaders with data at the top.)
idiv / div problems: Zero edx first, or sign-extend eax into it. 32-bit div faults with #DE if the 64b/32b => 32b quotient doesn't actually fit in 32b. (On POSIX systems including Linux, this raises SIGFPE.)
8-bit operand size like div dl is the special case where dx isn't involved, just AX and AH/AL. It still faults if the quotient overflows 8 bits.
No output from printf when I pipe the output, or print something without a newline? When you use the exit system call.
Calling printf in x86_64 using GNU assembler: calling convention, stack alignment, and working example. Related NASM-syntax version: Segfault while calling C function (printf) from Assembly
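To see why leftover garbage in edx makes 32-bit div fault, here is a small C model of the condition described in the div item above. It is a sketch with hypothetical names: it only predicts whether #DE would be raised, it does not execute div.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models 32-bit div: the dividend is EDX:EAX (64 bits), and the quotient
   must fit in 32 bits. Returns true if the instruction would raise #DE. */
bool div32_would_fault(uint32_t edx, uint32_t eax, uint32_t divisor) {
    if (divisor == 0) {
        return true;  /* divide by zero */
    }
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    return dividend / divisor > UINT32_MAX;  /* quotient overflow */
}
```

With edx zeroed, 100 / 7 is fine; with a stale 7 left in edx, the same division overflows the 32-bit quotient. Equivalently, unsigned div faults whenever edx >= divisor.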
Canonical duplicate for scanf segfaulting on misaligned stack in modern Linux builds of glibc: glibc scanf Segmentation faults when called from a function that doesn't align RSP
Library functions modify registers / which registers do my functions need to save and restore? This is specified by the calling convention (part of the ABI) for the platform you're targeting. Search for those terms on this page. What registers must be preserved by an x86 function? is a decent canonical duplicate.
Mismatched push/pop: if the stack pointer isn't pointing at the return address when you ret, you crash.
How do I handle multi-digit numbers? Linux, Windows, OS X, and DOS system calls for handling user input/output give you ASCII (or UTF-8) characters, or strings of characters. (Canonical Q&A for single-digit failure to do sub al, '0'). You normally need to convert between strings and binary integers to do math on them, like the C functions atoi or sprintf(buf, "%d", number). None of the common system-call APIs for major OSes that run on x86 provide these functions for you; only as libraries.
string-to-integer (32-bit NASM, algorithm works everywhere). (multiply by 10 for place value) Also includes an int-to-string loop.
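The "multiply by 10 for place value" algorithm is short enough to write out. A minimal C sketch (the function name is hypothetical, and there is deliberately no overflow or sign handling) that is easy to translate to asm one line at a time:

```c
#include <stdint.h>

/* Converts leading ASCII decimal digits to a binary integer.
   Stops at the first non-digit (e.g. the newline from a read).
   No overflow checking: long inputs silently wrap. */
uint32_t parse_decimal(const char *s) {
    uint32_t total = 0;
    while (*s >= '0' && *s <= '9') {
        /* *s - '0' is the asm "sub al, '0'" step. */
        total = total * 10 + (uint32_t)(*s - '0');
        s++;
    }
    return total;
}
```

For the input "1234" the running total goes 1, 12, 123, 1234.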
Printing integers: 16-bit code to print 16 or 32-bit integers (in dx:ax) (1 digit at a time with MS-DOS int 21h, but could be adapted to store into a string or use a different output method.) Another example for unsigned 16b numbers in DOS that calculates digits and stores them into a string in memory.
2-digit decimal numbers (00-99), using BIOS int 10h for each digit: Displaying Time in Assembly. (Just a special case of the general algorithm, not looping.)
NASM x86-64 function to convert and print a 32-bit unsigned integer (using a single Linux write system call on a buffer). Other answers on the same question show printing one character at a time. AT&T version of the same function, also showing a 5x faster version that uses a multiplicative inverse instead of div to divide by the compile-time constant 10.
How to convert a binary integer number to a hex string? (32-bit NASM code. Scalar, SSE2, SSSE3, AVX512F, and AVX512VBMI versions.)
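The decimal int-to-string loop from the items above, sketched in C with the division by 10 done through a multiplicative inverse instead of a divide. The function names are hypothetical; the constant 0xCCCCCCCD with a shift of 35 is the usual one compilers use for unsigned 32-bit division by 10, but treat this as an illustration rather than a copy of any of the linked answers.

```c
#include <stddef.h>
#include <stdint.h>

/* n / 10 for any 32-bit unsigned n, using a multiply and a shift:
   0xCCCCCCCD is ceil(2^35 / 10). */
uint32_t div10(uint32_t n) {
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Writes the decimal digits of n into buf (at least 11 bytes) followed
   by a terminating 0 byte. Returns the number of digits written. */
size_t format_decimal(uint32_t n, char *buf) {
    char tmp[10];  /* digits come out least-significant first */
    size_t len = 0;
    do {
        uint32_t q = div10(n);
        tmp[len++] = (char)('0' + (n - q * 10));  /* remainder as ASCII */
        n = q;
    } while (n != 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] = tmp[len - 1 - i];  /* reverse into printing order */
    }
    buf[len] = '\0';
    return len;
}
```

Asm versions often avoid the reversing pass by storing digits backwards from the end of the buffer and passing a pointer to the first digit to the output call.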
Loading pointers into registers vs. loading data into registers: Make sure you understand the difference between mov reg, symbol and mov reg, [symbol] (NASM syntax), or MASM syntax: mov reg, OFFSET symbol vs. mov reg, symbol. Many beginner questions are caused by mistakes in dereferencing addresses, or not dereferencing. This is the same as pointers in C.
Invalid combination of opcode and operands error on mov [msg], [ebp+8]? You can't use two memory operands to one instruction. (Why IA32 does not allow memory to memory mov?)
Bit-shifts and rotates need the count in cl, not any other register, or as an immediate constant. shl eax, ebx is impossible, shl eax, 2 is fine, and so is shl eax, cl
Call an absolute pointer in x86 machine code or jmp to an absolute address. With examples in NASM and AT&T syntax.
Why do most x86-64 instructions zero the upper part of a 32 bit register? In fact, all instructions that write a 32bit register zero the upper 32 of the full 64bit register, so mov eax, 1234 is more efficient than mov rax, 1234, but equivalent. This is not the case for writing to 8 and 16bit registers, like al / ah / ax, so you need movzx or movsx if the upper bits might hold garbage and you need to clear them (e.g. before using as part of a memory address).
Using LEA on values that aren't addresses / pointers? It's just a shift-and-add ALU instruction that uses memory-operand syntax and machine encoding.
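The partial-register rule above (32-bit writes zero-extend, 8- and 16-bit writes merge) can be modelled in a few lines of C. This is only a mental-model sketch with hypothetical names: each function takes the old 64-bit register value and returns the new one after a write of the given width.

```c
#include <stdint.h>

/* Write to a 32-bit register such as eax: upper 32 bits become zero. */
uint64_t write_reg32(uint64_t old_value, uint32_t value) {
    (void)old_value;  /* the old contents are discarded entirely */
    return (uint64_t)value;
}

/* Write to a 16-bit register such as ax: upper 48 bits are kept. */
uint64_t write_reg16(uint64_t old_value, uint16_t value) {
    return (old_value & ~(uint64_t)0xFFFF) | value;
}

/* Write to a low 8-bit register such as al: upper 56 bits are kept. */
uint64_t write_reg8(uint64_t old_value, uint8_t value) {
    return (old_value & ~(uint64_t)0xFF) | value;
}
```

So after writing al, whatever was in the rest of rax is still there, which is why movzx is needed before using the full register as an index.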
How to tell the length of an x86 instruction? – with an overview over the x86 instruction encoding
Reversing a string? This well-commented answer uses 16-bit ms-dos system calls to read the string, but the actual loop over the string works the same for 32 or 64-bit code.
Indexing an array without scaling the index by the element width, resulting in overlapping loads or stores. Declaring and indexing an integer array of qwords in assembly (x86-64 AT&T syntax)
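A C compiler scales array indices for you; asm does not. The following sketch (hypothetical function names) does the address arithmetic in bytes, the way an addressing mode does, to show what goes wrong when the index isn't multiplied by the element width.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Correct: byte offset = index * 4, like [base + index*4] for dwords. */
uint32_t load_scaled(const uint32_t *base, size_t index) {
    uint32_t v;
    memcpy(&v, (const unsigned char *)base + index * sizeof(uint32_t), sizeof v);
    return v;
}

/* Buggy: byte offset = index, like [base + index] for dwords. The load
   straddles two adjacent elements instead of reading one of them. */
uint32_t load_unscaled(const uint32_t *base, size_t index) {
    uint32_t v;
    memcpy(&v, (const unsigned char *)base + index, sizeof v);
    return v;
}
```

With an array holding 0x11111111, 0x22222222, 0x33333333, load_scaled(arr, 1) returns the second element, while load_unscaled(arr, 1) returns a mix of bytes from the first and second.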
Boot loader works in QEMU but not on real hardware – real computers sometimes expect the MBR to have a BPB (BIOS parameter block). If the BPB is missing or wrong, the BPB area in the MBR is overwritten with “correct” values, corrupting your boot loader.
How do I do X in assembly: usually the same way you would in another programming language, like C. Figure out what needs to happen to the data before you get bogged down in writing instructions to make it happen.
How to get started / Debugging tools + guides
Find a debugger that will let you single-step through your code, and display registers while that happens. This is essential. We get many questions on here that are something like "why doesn't this code work" that could have been solved with a debugger.
On Windows, Visual Studio has a built-in debugger. See Debugging ASM with Visual Studio - Register content will not display. And see Assembly programming - WinAsm vs Visual Studio 2017 for a walk-through of setting up a Visual Studio project for a MASM 32-bit or 64-bit Hello World console application.
On Linux: A widely-available debugger is gdb. See Debugging assembly for some basic stuff about using it on Linux. Also How can one see content of stack with GDB?
There are various GDB front-ends, including GDBgui. Also guides for vanilla GDB:
With layout asm and layout reg enabled, GDB will highlight which registers changed since the last stop. Use stepi to single-step by instructions. Use x to examine memory at a given address (useful when trying to figure out why your code crashed while trying to read or write at a given address). In a binary without symbols (or even sections), you can use starti instead of run to stop before the first instruction. (On older GDB without starti, you can use b *0 as a hack to get gdb to stop on an error.) Use help x or whatever for help on any command.
GNU tools have an Intel-syntax mode that's similar to MASM, which is nice to read but is rarely used for hand-written source (NASM/YASM is nice for that if you want to stick with open-source tools but avoid AT&T syntax):
- clang or gcc -Wall -O3 -masm=intel foo.c -fverbose-asm -S -o- | less (affects inline-asm)
- GDB: set disassembly-flavor intel (can go in your ~/.gdbinit)
- objdump -drwC -Mintel
- perf report -Mintel
Another key tool for debugging is tracing system calls. e.g. on a Unix system, strace ./a.out will show you the args and return values of all the system calls your code makes. It knows how to decode the args into symbolic values like O_RDWR, so it's much more convenient (and likely to catch brain-farts or wrong values for constants) than using a debugger to look at registers before/after an int or syscall instruction. Note that it doesn't work correctly on Linux int 0x80 32-bit ABI system calls in 64-bit processes: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?.
To debug boot or kernel code, boot it in Bochs, qemu, or maybe even DOSBox, or any other virtual machine / simulator / emulator. Use the debugging facilities of the VM to get way better information than the usual "it locks up" you will experience with buggy privileged code.
Bochs is generally recommended for debugging real-mode bootloaders, especially ones that switch to protected mode; Bochs's built-in debugger understands segmentation (unlike GDB), and can parse a GDT, IDT, and page tables to make sure you got the fields right.
For DOS programs, see the x86-16 tag wiki for debuggers that run inside the guest, and thus can debug a specific DOS program maybe more easily than Bochs for the whole system.
REPL (Read Eval Print Loop) environments for typing an instruction and seeing what it does to register values. Maybe only useful for user-space, perhaps not osdev stuff.