If a register can be an operand for add, or used in an addressing mode, it's "general purpose", as opposed to registers like the FS segment register, or RIP. The GP registers are also called "integer registers", even though other kinds of registers can hold integers, too.
In computer architecture, it's common for CPUs to internally handle integer registers / instructions separately from FP/SIMD registers / instructions. For example, Intel Sandybridge-family CPUs have separate physical register files for renaming GP integer vs. FP/vector registers. These are simply called the integer vs. FP register files. (Where FP is short-hand for everything that a kernel doesn't need to save/restore to use the GP registers while leaving user-space's FPU/SIMD state untouched.) Each entry in the FP register file is 256 bits wide (to hold an AVX ymm vector), but integer register file entries only have to be 64 bits wide (see Note 1).
But when we say "integer register", we normally mean specifically a general-purpose register.
Note 1: Actually, a typical design is for integer PRF entries to have room for a FLAGS result and/or a GP register, so maybe 70 bits. Since integer instructions also write FLAGS, it makes sense to keep them together instead of allocating from a separate table of tiny registers. (The register allocation table would then just have 2 extra entries, one for CF and one for the rest of the FLAGS, the SPAZO group, to record which PRF entry each part comes from.) On CPUs that rename segment registers (Skylake does not), I guess those would go in an integer PRF entry.
As for the integer part of the architectural state of a user-space task that a kernel would save/restore on interrupts and system calls, that also includes its RFLAGS and RIP. (And the kernel usually just doesn't touch FP state.)
"General purpose" in this usage means "data or address", as opposed to an ISA like m68k where you had d0..7 data regs and a0..7 address regs, all 16 of which are integer regs. Regardless of how the register is normally used, general-purpose is about how it can be used.
Every register has some special-ness for some instructions, except some of the completely new registers added with x86-64: R8-R15. These special uses don't disqualify them as general purpose. The (low 16 bits of the) original 8 date back to 8086, and there were implicit uses of each of them even in the original 8086.
For RSP, it's special for push/pop/call/ret, so most code never uses it for anything else. (And in kernel mode, used asynchronously for interrupts, so you really can't stash it somewhere to get an extra GP register the way you can in user-space code: Is ESP as general-purpose as EAX?)
But in controlled conditions (like no signal handlers) you don't have to use RSP for a stack pointer. e.g. you can use it to read an array in a loop with pop, like in this code-golf answer. (I actually used esp in 32-bit code, but same difference: pop is faster than lodsd on Skylake, while both are 1 byte.)
Implicit uses and special-ness for each register:
See also x86 Assembly - Why is [e]bx preserved in calling conventions? for a partial list.
I'm mostly limiting this to user-space instructions, especially ones a modern compiler might actually emit from C or C++ code. I'm not trying to be exhaustive for regs that have a lot of implicit uses.
rax: one-operand [i]mul / [i]div / cdq / cdqe, string instructions (stos), cmpxchg, etc. etc. As well as special shorter encodings for many immediate instructions like 2-byte cmp al, 1 or 5-byte add eax, 12345 (no ModRM byte). See also codegolf.SE Tips for golfing in x86/x64 machine code.
There's also xchg-with-eax which is where 0x90 nop came from (before nop became a separately-documented instruction in x86-64, because xchg eax,eax zero-extends eax into RAX and thus can't use the 0x90 encoding. But xchg rax,rax can still assemble to REX.W=1 0x90.)
rcx: shift counts, rep-string counts, the slow loop instruction
rdx: rdx:rax is used by divide and widening-multiply (the one-operand forms), and cwd / cdq / cqo to set up for idiv. Also rdtsc and BMI2 mulx.
rbx: 8086 xlatb. cpuid uses all four of EAX..EDX. Pentium cmpxchg8b, x86-64 cmpxchg16b. Most 32-bit compilers will emit cmpxchg8b for std::atomic<long long>::compare_exchange_weak. (Pure load / pure store can use SSE MOVQ or x87 fild/fistp, though, if targeting Pentium or later.) 64-bit compilers will use 64-bit lock cmpxchg, not cmpxchg8b.
Some 64-bit compilers will emit cmpxchg16b for atomic<struct_16_bytes>. RBX has the fewest implicit uses of the original 8, but lock cmpxchg16b is one of the few that compilers will actually use.
rsi/rdi: string ops, including rep movsb which some compilers sometimes inline. (gcc also inlines rep cmpsb for string literals in some cases, but that's probably not optimal).
rbp: leave (only 1 uop slower than mov rsp, rbp / pop rbp. gcc actually uses it in functions with a frame pointer, when it can't just pop rbp). Also the horribly-slow enter which nobody ever uses.
rsp: stack operations: push/pop/call/ret, and leave. (And enter). And in kernel mode (not user space) asynchronous use by hardware to save interrupt context. This is why kernel code can't have a red-zone.
r11: syscall/sysret use it to save/restore user-space's RFLAGS. (Along with RCX to save/restore user-space's RIP).
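As a rough illustration of the shorter accumulator encodings mentioned for rax above, here is a minimal Python sketch that builds the machine-code bytes for add r32, imm32 and cmp r8, imm8. The helper names add_r32_imm32 and cmp_r8_imm8 are made up for this example, and it only covers the low legacy registers with no REX prefix.

```python
import struct

def add_r32_imm32(reg_num, imm):
    """Encode add r32, imm32 for eax..edi (register numbers 0..7)."""
    if not 0 <= reg_num <= 7:
        raise ValueError("this sketch only handles eax..edi")
    if reg_num == 0:
        # Accumulator short form: opcode 05 + imm32, no ModRM byte.
        return bytes([0x05]) + struct.pack("<i", imm)
    # Generic form: 81 /0 + imm32. ModRM = mod 11, reg field 0, rm = register.
    return bytes([0x81, 0xC0 | reg_num]) + struct.pack("<i", imm)

def cmp_r8_imm8(reg_num, imm):
    """Encode cmp r8, imm8 for al, cl, dl, bl (register numbers 0..3)."""
    if not 0 <= reg_num <= 3:
        raise ValueError("this sketch only handles al, cl, dl, bl")
    if reg_num == 0:
        # Accumulator short form: opcode 3C + imm8, no ModRM byte.
        return bytes([0x3C, imm & 0xFF])
    # Generic form: 80 /7 + imm8. ModRM = mod 11, reg field 7, rm = register.
    return bytes([0x80, 0xF8 | reg_num, imm & 0xFF])

if __name__ == "__main__":
    print(add_r32_imm32(0, 12345).hex(" "))  # add eax, 12345 (5 bytes)
    print(add_r32_imm32(3, 12345).hex(" "))  # add ebx, 12345 (6 bytes)
    print(cmp_r8_imm8(0, 1).hex(" "))        # cmp al, 1 (2 bytes)
    print(cmp_r8_imm8(3, 1).hex(" "))        # cmp bl, 1 (3 bytes)
```

For an immediate that fits in a signed byte, an assembler would normally pick the 3-byte 83 /0 ib form instead, so the accumulator short form of add only wins for larger immediates like 12345.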
Addressing-mode encoding special cases:
(See also rbp not allowed as SIB base? which is just about addressing modes, where I copied this part of this answer.)
rbp/r13 can't be a base register with no displacement: that encoding instead means: (in ModRM) rel32 (RIP-relative), or (in SIB) disp32 with no base register. (r13 uses the same 3 bits in ModRM/SIB, so this choice simplifies decoding by not making the instruction-length decoder look at the REX.B bit to get the 4th base-register bit). [r13] assembles to [r13 + disp8=0]. [r13+rdx] assembles to [rdx+r13] (avoiding the problem by swapping base/index when that's an option).
rsp/r12 as a base register always needs a SIB byte. (The ModR/M encoding of base=RSP is the escape code to signal a SIB byte, and again, more of the decoder would have to care about the REX prefix if r12 was handled differently).
rsp can't be an index register. This makes it possible to encode [rsp], which is more useful than [rsp + rsp]. (Intel could have designed the ModRM/SIB encodings for 32-bit addressing modes (new in 386) so SIB-with-no-index was only possible with base=ESP. That would make [eax + esp*4] possible and only exclude [esp + esp*1/2/4/8]. But that's not useful, so they simplified the hardware by making index=ESP the code for no index regardless of the base. This allows two redundant ways to encode any base or base+disp addressing mode: with or without a SIB.)
r12 can be an index register. Unlike the other cases, this doesn't affect instruction-length decoding. Also, it can't be worked around with a longer encoding like the other cases. AMD wanted AMD64's register set to be as orthogonal as possible, so it makes sense they'd spend a few extra transistors to check REX.X as part of the index / no-index decoding. For example, [rsp + r12*4] requires index=r12, so having r12 not be fully general purpose would make AMD64 a worse compiler target.
0: 41 8b 03 mov eax,DWORD PTR [r11]
3: 41 8b 04 24 mov eax,DWORD PTR [r12] # needs a SIB like RSP
7: 41 8b 45 00 mov eax,DWORD PTR [r13+0x0] # needs a disp8 like RBP
b: 41 8b 06 mov eax,DWORD PTR [r14]
e: 41 8b 07 mov eax,DWORD PTR [r15]
11: 43 8b 04 e3 mov eax,DWORD PTR [r11+r12*8] # *can* be an index
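The bytes in the listing above can be reproduced with a small encoder that applies the special cases one by one. This is only a sketch under stated assumptions: the function name encode_mov_eax_mem is hypothetical, it only handles a mov eax load with a register base, an optional index, and a disp8, and it ignores RIP-relative and disp32 forms.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def encode_mov_eax_mem(base, index=None, scale=1, disp=0):
    """Return the bytes of: mov eax, dword [base + index*scale + disp]."""
    b = REGS.index(base)
    if index == "rsp":
        raise ValueError("rsp can't be an index: that SIB code means 'no index'")
    # Index field 100 with REX.X=0 is the "no index" code.
    x = REGS.index(index) if index is not None else 4
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles disp8")

    # rbp/r13 (low bits 101) with mod=00 would mean "no base register",
    # so a base of rbp or r13 always gets at least a disp8 of 0.
    need_disp = disp != 0 or (b & 7) == 5
    mod = 1 if need_disp else 0

    # rsp/r12 (low bits 100) in ModRM.rm is the escape code for a SIB byte.
    need_sib = index is not None or (b & 7) == 4

    # REX.X extends the index, REX.B extends the base. W=0, R=0 since dest is eax.
    rex = 0x40 | ((x >> 3) << 1) | (b >> 3)
    out = bytearray()
    if rex != 0x40:
        out.append(rex)
    out.append(0x8B)                      # opcode: mov r32, r/m32
    if need_sib:
        out.append((mod << 6) | 4)        # ModRM: reg=eax, rm=100 (SIB follows)
        ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
        out.append((ss << 6) | ((x & 7) << 3) | (b & 7))
    else:
        out.append((mod << 6) | (b & 7))  # ModRM: reg=eax, rm=base
    if need_disp:
        out.append(disp & 0xFF)
    return bytes(out)

if __name__ == "__main__":
    for args in [("r11",), ("r12",), ("r13",), ("r14",), ("r15",),
                 ("r11", "r12", 8)]:
        print(encode_mov_eax_mem(*args).hex(" "), args)
```

Under the same assumptions, encode_mov_eax_mem("r13", "rdx") comes out one byte longer than encode_mov_eax_mem("rdx", "r13"), which is why an assembler swaps base and index when the scale factor is 1.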
Compilers like it when all registers can be used for anything, only constraining register allocation for a few special-case operations. This is what's meant by register orthogonality.