7

I've got to learn assembly and I'm very confused as to what the different registers do/point to.

Gabe
Amelie Windel
  • EBP: http://stackoverflow.com/questions/579262/what-is-the-purpose-of-the-ebp-frame-pointer-register , ESP and EBP: http://stackoverflow.com/questions/5474355/why-does-leave-do-mov-esp-ebp-in-x86-assembly?rq=1 – Ciro Santilli OurBigBook.com Sep 20 '15 at 15:33

3 Answers

11

On some architectures, like MIPS, all registers are created equal, and there is really no difference beyond the name of the register (and software conventions). On x86 you can mostly use any registers for general-purpose computing, but some registers are implicitly bound to the instruction set.

Lots of information about special purposes for registers can be found here.

Examples:

  • eax, accumulator: many arithmetic instructions implicitly operate on eax. There are also special shorter EAX-specific encodings for many instructions: add eax, 123456 is 1 byte shorter than add ecx, 123456, for example. (add eax, imm32 vs. add r/m32, imm32)
  • ebx, base: few implicit uses, but xlat is one that matches the "Base" naming. Still relevant: cmpxchg8b. Because it's rarely required for anything specific, some 32-bit calling-conventions / ABIs use it as a pointer to the "global offset table" in Position Independent Code (PIC).
  • edx, data: some arithmetic operations implicitly operate on the 64-bit value in edx:eax
  • ecx, counter: used for shift counts (in cl) and as the repeat count for rep movs. Also, the mostly-obsolete loop instruction implicitly decrements ecx
  • esi, source index: some string operations read a string from the memory pointed to by esi
  • edi, destination index: some string operations write a string to the memory pointed to by edi. e.g. rep movsb copies ECX bytes from [esi] to [edi].
  • ebp, base pointer: normally used to point to local variables. Used implicitly by leave.
  • esp, stack pointer: points to the top of the stack, used implicitly by push, pop, call and ret
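To make the implicit register use of the string instructions concrete, here is a minimal Python sketch of what `rep movsb` does with esi, edi and ecx. It is a model only, not real machine code: the `rep_movsb` function, the bytearray standing in for memory, and the example addresses are all assumptions made for illustration.

```python
def rep_movsb(memory, esi, edi, ecx, df=0):
    """Model of `rep movsb`: copy ECX bytes from [esi] to [edi].

    `memory` is a bytearray standing in for a flat address space.
    Returns the final (esi, edi, ecx) register values.
    """
    step = -1 if df else 1  # DF=0 walks forward, DF=1 walks backward
    while ecx != 0:
        memory[edi] = memory[esi]  # movsb: one byte from [esi] to [edi]
        esi += step
        edi += step
        ecx -= 1                   # the rep prefix counts ECX down to zero
    return esi, edi, ecx


memory = bytearray(b"hello" + b"." * 11)   # 16 bytes of pretend memory
esi, edi, ecx = rep_movsb(memory, esi=0, edi=8, ecx=5)
print(memory)         # bytearray(b'hello...hello...')
print(esi, edi, ecx)  # 5 13 0
```

Note that the instruction takes no explicit operands at all: the source, destination and count are fixed by the instruction set, which is why these three registers carry those names.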

The x86 instruction set is a complex beast, really. Many instructions have shorter forms that implicitly use one register or another. Some registers can be used to do certain addressing while others cannot.
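As a small illustration of the "shorter forms" point, the following Python sketch builds the machine-code bytes for `add eax, 123456` and `add ecx, 123456` by hand, using the two encodings mentioned in the list above (`add eax, imm32` is opcode 05; `add r/m32, imm32` is opcode 81 with a ModRM byte). The variable names are just for this example.

```python
import struct

imm32 = struct.pack("<I", 123456)  # little-endian 32-bit immediate

# add eax, imm32: opcode 05, then the immediate
add_eax = bytes([0x05]) + imm32

# add r/m32, imm32: opcode 81, a ModRM byte, then the immediate.
# ModRM = 11 000 001b: register-direct mode, /0 selects ADD, rm=001 is ecx.
add_ecx = bytes([0x81, 0xC1]) + imm32

print(add_eax.hex(" "), len(add_eax))  # 05 40 e2 01 00 5
print(add_ecx.hex(" "), len(add_ecx))  # 81 c1 40 e2 01 00 6
```

The one-byte saving comes entirely from the missing ModRM byte: the eax form does not need to say which register it operates on.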

The Intel 80386 Programmer's Reference Manual is an irreplaceable resource; it tells you basically everything there is to know about x86 assembly, except for newer extensions and performance on modern hardware.

The PC Assembly (e)book is a great resource for learning assembly.

Peter Cordes
Aleksi Torhamo
  • I see paxdiablo updated his answer to contain a "little" more information while I was writing this, but I'll leave this here for the links. :) – Aleksi Torhamo Apr 21 '11 at 02:55
  • 1
    +1 for referring to the right CPU -- and for not losing the answer in a flood of other info. – cHao Apr 21 '11 at 03:07
  • 1
    Nice summary. +1. Would do business again. – JUST MY correct OPINION Apr 21 '11 at 03:36
  • 2
    Nowadays ecx isn't used as a loop counter any more. No compiler actually emits a loop instruction, as it is known to be very slow compared to an arithmetic/branch instruction pair. It has another special meaning as bit-index register for shift instructions or bit set/test/etc. – Gunther Piez Apr 21 '11 at 19:29
  • @drhirsch Yeah, but that's where the name "comes from", so.. And thanks for the info on the bit-index -use, I didn't know (or at least remember :) – Aleksi Torhamo Apr 22 '11 at 01:05
  • Remembering ECX as the "shift count" register still works. I wouldn't even mention the LOOP instruction if teaching someone asm. DEC/JNZ does the same thing, so just show them that as a "black box" until they understand what it does, and it doesn't force them into using ECX for a counter. – Peter Cordes Nov 02 '16 at 16:31
7

The sp register is the stack pointer, used for stack operations like push and pop.

The stack is known as a LIFO structure (last-in, first-out), meaning that the last thing pushed on is the first thing popped off. It's used, among other things, to implement the ability to call functions.

The bp register is the base pointer, and is commonly used for stack frame operations.

This means that it's a fixed reference to locate local variables, passed parameters and so forth on the stack, for a given level (while sp may change during the execution of a function, bp usually does not).

If you're looking at assembly language like:

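mov eax, [ebp+8]

you're seeing the code access a value in the current stack frame. With a conventional 32-bit frame, where [ebp] holds the caller's saved ebp and [ebp+4] the return address, [ebp+8] is typically the first argument passed on the stack.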
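The frame-pointer idea can be sketched with a toy stack in Python. This is only a model under stated assumptions: a 32-bit stack that grows downward, arguments pushed right to left by the caller (as in the common cdecl convention), and the usual `push ebp` / `mov ebp, esp` prologue. The `Stack32` class and all addresses and values are made up for the example.

```python
class Stack32:
    """Toy 32-bit stack: grows downward, 4 bytes per slot."""

    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}               # address -> 32-bit value

    def push(self, value):
        self.esp -= 4               # push decrements esp first...
        self.mem[self.esp] = value  # ...then stores the value at [esp]

    def pop(self):
        value = self.mem[self.esp]  # pop reads [esp]...
        self.esp += 4               # ...then increments esp
        return value


s = Stack32()
s.push(222)            # caller pushes the second argument
s.push(111)            # caller pushes the first argument
s.push(0x401000)       # `call` pushes the return address (made-up value)
s.push(0xBFFF0000)     # callee: push ebp (made-up caller frame pointer)
ebp = s.esp            # callee: mov ebp, esp
s.push(0)              # callee makes room for a local: esp moves, ebp does not

first_arg = s.mem[ebp + 8]     # what a load from [ebp+8] reads
second_arg = s.mem[ebp + 12]
local = s.mem[ebp - 4]
print(first_arg, second_arg, local)  # 111 222 0
```

Because ebp stays fixed while esp keeps moving, every argument and local has a constant offset from ebp for the whole function, which is what makes it convenient as a frame reference.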

The si register is the source index, typically used for mass copy operations (di is its equivalent destination index). Intel had these registers along with specific instructions for quick movement of bytes in memory.

The e- variants are just the 32-bit versions of these (originally) 16-bit registers. And, as if that weren't enough, we have 64-bit r- variants as well :-)

Perhaps the simplest place to start is here. It's specific to the 8086 but the concepts haven't changed that much. The simplicity of the 8086 compared to the current crop will be a good starting point for your education. Once you've learned the basics, it will be much easier to move up to the later members of the x86 family.

Transcribed here and edited quite a bit, to make the answer self-contained.



GENERAL PURPOSE REGISTERS

The 8086 CPU has 8 general purpose registers, each with its own name:

  • AX - the accumulator register (divided into AH/AL). Probably the most commonly used register for general purpose stuff.
  • BX - the base address register (divided into BH/BL).
  • CX - the count register (divided into CH/CL). Special purpose instructions for looping and shifting.
  • DX - the data register (divided into DH/DL). Used with AX for some MUL and DIV operations, and for specifying ports in some IN and OUT operations.
  • SI - source index register. Special purpose instruction to use this as a source of mass memory transfers (DS:SI).
  • DI - destination index register. Special purpose instruction to use this as a destination of mass memory transfers (ES:DI).
  • BP - base pointer, primarily used for accessing parameters and variables on the stack.
  • SP - stack pointer, used for the basic stack operations.
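The "divided into AH/AL" notes above mean that the high and low bytes of AX, BX, CX and DX can be addressed as separate 8-bit registers. A minimal Python sketch of that relationship, with hypothetical helper names `split16` and `join16`:

```python
def split16(reg):
    """Return (high byte, low byte) of a 16-bit register value."""
    return (reg >> 8) & 0xFF, reg & 0xFF


def join16(high, low):
    """Rebuild the 16-bit register value from its two byte halves."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ax = 0x1234
ah, al = split16(ax)   # ah = 0x12, al = 0x34
al = 0xFF              # like `mov al, 0FFh`: only the low byte changes
ax = join16(ah, al)
print(hex(ax))         # 0x12ff
```

Writing to AL or AH changes the corresponding half of AX and leaves the other half alone; SI, DI, BP and SP have no such byte halves on the 8086.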

SEGMENT REGISTERS

  • CS - points at the segment containing the current instruction.
  • DS - generally points at segment where variables are defined.
  • ES - extra segment register, it's up to a coder to define its usage.
  • SS - points at the segment containing the stack.

Although it is possible to store any data in the segment registers, this is never a good idea. The segment registers have a very special purpose - pointing at accessible blocks of memory.

Segment registers work together with the general purpose registers to access any memory location. For example, to access memory at the physical address 12345h, we could set DS = 1230h and SI = 0045h. This way we can address much more memory than with a single register, which is limited to 16-bit values.

The CPU makes a calculation of the physical address by multiplying the segment register by 10h and adding the general purpose register to it (1230h * 10h + 45h = 12345h):

  12300   (1230h * 10h)
+  0045
  =====
  12345

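An address written as a segment and an offset (here 1230h:0045h) is called a logical address. The offset part on its own is the effective address, and the result of the calculation is the physical address.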
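The real-mode calculation described above is simple enough to check in a few lines of Python. This is a sketch of the 8086 rule only (segment times 16 plus offset, on a 20-bit address bus); the function name `physical_address` is just chosen for the example.

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1230, 0x0045)))  # 0x12345
print(hex(physical_address(0x1234, 0x0005)))  # 0x12345
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

The second line shows that many different segment:offset pairs name the same physical byte, and the third shows the wrap-around at 1 MiB that results from the 8086 having only 20 address lines.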

This usage is for real mode only (which is the only mode the 8086 had). Later processors changed these registers from segments to selectors, which are used to look up addresses in a table rather than having a fixed calculation performed on them.

By default the BX, SI and DI registers work with the DS segment register, and BP and SP work with the SS segment register.

SPECIAL PURPOSE REGISTERS

IP - the instruction pointer:

  • Always points to next instruction to be executed.
  • Offset address relative to CS.

The IP register always works together with the CS segment register: CS:IP is the address of the next instruction to be fetched, not of the one currently executing.

FLAGS REGISTER

Holds the current state of the processor. The CPU updates these flags automatically after arithmetic and logical operations, which lets a program inspect the kind of result produced and transfer control conditionally to other parts of the program.

Generally you cannot read or write this register with an ordinary mov; it is accessed indirectly, through conditional jumps and instructions such as pushf/popf and lahf/sahf.


  • Carry Flag CF - this flag is set to 1 when there is an unsigned overflow. For example when you add bytes 255 + 1 (result is not in range 0...255). When there is no overflow this flag is set to 0.
  • Parity Flag PF - this flag is set to 1 when there is an even number of one bits in the low 8 bits of the result, and to 0 when there is an odd number of one bits.
  • Auxiliary Flag AF - set to 1 when there is a carry or borrow out of the low nibble (4 bits). Used mainly for BCD arithmetic.
  • Zero Flag ZF - set to 1 when result is zero. For non-zero result this flag is set to 0.
  • Sign Flag SF - set to 1 when result is negative. When result is zero or positive it is set to 0. (This flag takes the value of the most significant bit.)
  • Trap Flag TF - used for debugging: when set to 1, the CPU raises a single-step interrupt after each instruction.
  • Interrupt enable Flag IF - when this flag is set to 1 CPU reacts to (maskable) interrupts from external devices.
  • Direction Flag DF - this flag is used by the string instructions: when it is set to 0 the processing is done forward (addresses increase), when it is set to 1 the processing is done backward (addresses decrease).
  • Overflow Flag OF - set to 1 when there is a signed overflow. For example, when you add bytes 100 + 50 (result is not in range -128...127).
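The arithmetic flags in the list above can be reproduced for an 8-bit addition with a short Python sketch. It is a model of the definitions given here, not of any particular instruction's full behaviour; the `add8_flags` function and its dictionary layout are assumptions for illustration.

```python
def add8_flags(a, b):
    """Add two bytes and return (8-bit result, dict of arithmetic flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                         # unsigned overflow
        "PF": int(bin(result).count("1") % 2 == 0),      # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),          # carry out of low nibble
        "ZF": int(result == 0),                          # result is zero
        "SF": result >> 7,                               # copy of the top bit
        # signed overflow: both inputs differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(255, 1))
# (0, {'CF': 1, 'PF': 1, 'AF': 1, 'ZF': 1, 'SF': 0, 'OF': 0})
print(add8_flags(100, 50))
# (150, {'CF': 0, 'PF': 1, 'AF': 0, 'ZF': 0, 'SF': 1, 'OF': 1})
```

The two calls are the examples from the list: 255 + 1 overflows as an unsigned byte (CF) but not as a signed one, while 100 + 50 fits in an unsigned byte but overflows the signed range (OF).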
paxdiablo
  • What was the rationale for choosing `e` and `r` for those prefixes? – Lightness Races in Orbit Apr 21 '11 at 01:55
  • 2
    @Tomalex: no idea. Perhaps `e` stood for "extended" and `r` stood for "really extended" :-) – paxdiablo Apr 21 '11 at 02:05
  • The 8086 doesn't have `ebp`, `esp`, or `esi`. No one mentioned the 8086. Why is this here? – cHao Apr 21 '11 at 03:06
  • 1
    @cHao, it's there as an introduction to the architecture. The simplest way to learn is with the simplest chip, since that will translate up to the latest ones relatively easily. The question itself indicates that the asker is at a basic level so I thought it prudent to introduce them at that level. – paxdiablo Apr 21 '11 at 03:09
  • 1
    Actually, the simplest way to learn x86 is on an x86. The 8086 is useless for most people these days, and the difference between the two is significant enough that there's plenty of unlearning to do when you switch to a real CPU. – cHao Apr 21 '11 at 03:15
  • 1
    @cHao, with respect, that's utter rubbish. You don't start to learn about electronics by building a full-blown music synthesiser, you start with a simple project like a battery, switch and LED. Similarly, if you want to learn carpentry, you don't start by trying to build a house - rather you'd choose something like a table or a jewellery box. And why do you think beginning students are writing programs to average ten numbers, rather than putting together double-entry accounting packages? To _learn,_ you start with the basics. – paxdiablo Apr 21 '11 at 03:19
  • 1
    @paxdiablo: I wasn't saying anything about starting out by building an OS or anything like that. But these days, it's not worth learning to write 16-bit code unless you *are* building an OS. x86 *is* the basics. It isn't any more difficult than 8086 til you get into segments and paging and such, and you don't even have to worry about that stuff right away -- the OS takes care of the ugly details for you. And you don't end up learning about 8086 segmentation, which is nothing like how things are now. On top of that, 16-bit code won't even run everywhere anymore if you're not in a VM. – cHao Apr 21 '11 at 03:36
  • 2
    I have to partially agree with cHao, 16 bit code is useless nowadays even for learning purposes - if you want to learn assembly for some real world application. The whole segmentation stuff of the 8086 is obsolete and replaced by paging. The registers are now at least 32 bit, and a lot more addressing modes are allowed. So the 8086 is in no way _simpler_ than an x64. But it still has some purpose in showing students how to _not_ design a processor, as a negative example (the x64 isn't much better to be honest, but some of the really insane things are gone. Ever heard of the A20 Gate?) – Gunther Piez Apr 22 '11 at 08:59
  • It may well be an example of how to not design a CPU _now_ but, in its time, it was brilliant, allowing high levels of compatibility with earlier chips while expanding their capabilities. The A20 gate was also nothing to do with the CPU itself, it was a PC architecture thing. But again, it was an easy way to expand capabilities. – paxdiablo Dec 02 '11 at 22:44
  • 1
    The 8086 segment registers were just an extended addressing hack. The revised segment registers for the 286 and 386 were brilliant conversions of a hack into an idea designed to support modern, large-scale computing (as the MIT guys saw it [IMHO correctly] back in the 60s), even more relevant now. See my remarks on this at http://stackoverflow.com/a/10810340/120163. – Ira Baxter Jul 13 '13 at 16:45
  • 1
    The A20 gate was some engineer at IBM saving 25 cents on the original PC design to partition the address space into "RAM" and "System Stuff", and the resulting "640K" space probably cost the software engineering world billions of dollars. He could have decided to use an 8-input AND gate, offering the original PC programmers virtually all of the 1024K physical memory the x86 could address. I think IBM's motto was "Thimk" for a number of years, but apparently people didn't thimk about what it meant. IBM doesn't use this motto anymore. Ooops. – Ira Baxter Jul 13 '13 at 16:50
1

Here's a simplified summary:

ESP is the current stack pointer, so you generally only update it to manipulate the stack. EBP is intended for stack manipulation too, for example saving the value of ESP before allocating stack space for local variables, but you can use EBP as a general purpose register as well.

ESI is the Extended Source Index register. The "string" instructions (block-memory operations, not to be confused with C strings) such as MOVS use ESI and EDI.

Memory Addressing:

x86 CPUs have special registers called "segment registers", each of which can refer to a different region of memory. For example, DS (the data segment register) may point to a segment starting at 0x1000000, and SS (the stack segment register) to one starting at 0x2000000.

When you address memory through EBP or ESP, the default segment register is SS; for ESI (and the other general purpose registers), it's DS. For example, suppose DS points to 0x1000000, SS points to 0x2000000, EBP=0x10 and ESI=0x10. Then:

  mov eax,[ebp] ; loads from address 0x2000000 + 0x10
  mov eax,[esi] ; loads from address 0x1000000 + 0x10

You can also specify a segment register to use, overriding the default:

  mov eax,ds:[ebp]
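The default-segment rule and the override can be modelled in a few lines of Python. This is a deliberately simplified sketch: it treats each segment as nothing more than a base address, uses the example values from the text, and the names `SEGMENT_BASE`, `REGS` and `linear_address` are hypothetical.

```python
SEGMENT_BASE = {"ds": 0x1000000, "ss": 0x2000000}  # example bases from the text
REGS = {"ebp": 0x10, "esi": 0x10}                  # example register values


def linear_address(base_reg, override=None):
    """Address accessed by [base_reg], optionally with a segment override."""
    # ebp- and esp-based addressing defaults to SS; other registers use DS
    default = "ss" if base_reg in ("ebp", "esp") else "ds"
    segment = override or default
    return SEGMENT_BASE[segment] + REGS[base_reg]


print(hex(linear_address("ebp")))                 # 0x2000010
print(hex(linear_address("esi")))                 # 0x1000010
print(hex(linear_address("ebp", override="ds")))  # 0x1000010
```

On most modern 32-bit operating systems all of these segments start at address 0, so the distinction rarely shows up in practice.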

In terms of addition, subtraction, logical operations, etc, there's no real difference between them.

b_yang