Differences between general purpose registers in 8086: [bx] works, [cx] doesn't?

Question

In 8086 this structure is correct:

mov bh,[bx]

but this is not correct:

mov bh,[cx]

I don't know why. I think that the general purpose registers (AX, BX, CX, DX, SP, BP, SI and DI) are registers that we can use for any purpose and the statement that BX is for base address or CX is for counter is just a convention and they don't differ at all. But it seems that I'm wrong. Can you explain the reason? And what is the exact difference between these registers? (For example why can't I save the base address in cx register?)

@zx485 Nope. That answer does not explain the actual problem, namely, the set of possible addressing modes on the 8086. — fuz, Dec 20 '18 at 10:35
Related: [Referencing the contents of a memory location. (x86 addressing modes)](https://stackoverflow.com/q/34058101) for 32 and 64-bit addressing modes. — Peter Cordes, Jun 23 '21 at 07:20

score 9 · Accepted Answer · edited Oct 24 '22 at 10:52

On the 8086 (and 16-bit addressing in x86), only addressing modes of the form
[bp|bx] + [si|di] + disp0/8/16 are available. Listing them all:

[bx]       [bx + foo]
[foo]      [bp + foo]
[si]       [si + foo]
[di]       [di + foo]
[bx + si]  [bx + si + foo]
[bx + di]  [bx + di + foo]
[bp + si]  [bp + si + foo]
[bp + di]  [bp + di + foo]

where foo is some constant value, e.g. 123 or the offset of a symbol within a segment, e.g. a literal foo to reference a foo: label somewhere.
(Fun fact: the only way to encode [bp] is actually as [bp+0], and assemblers will do this for you. Notice in the table [foo] is where [bp] would otherwise be; this reflects how x86 machine code special-cases that encoding to mean displacement with no registers.)
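
To make that special case concrete, here is a small sketch in NASM syntax. The choice of assembler, the cl destination and the address 0x1234 are assumptions for illustration. The bytes in the comments are the encodings for opcode 8A (mov r8, r/m8) followed by the ModRM byte:

```nasm
bits 16
cpu 8086

        mov     cl, [bx]        ; 8A 0F        mod=00, r/m=111: [bx], no displacement
        mov     cl, [0x1234]    ; 8A 0E 34 12  mod=00, r/m=110: 16-bit displacement only
        mov     cl, [bp]        ; 8A 4E 00     mod=01, r/m=110: [bp + disp8] with disp8 = 0
```

The slot that would mean a bare [bp] (mod=00, r/m=110) is the one taken by the displacement-only form, which is why the assembler falls back to a zero displacement.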

bp as the base implies the SS (stack) segment; other addressing modes imply the DS (data) segment. This can be overridden with a prefix if necessary.
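
A short sketch of the default segments and of an override, again assuming NASM syntax (other assemblers write the override differently, for example es:[si]):

```nasm
bits 16
cpu 8086

        mov     al, [bx + 2]    ; reads ds:bx+2 (default segment for bx is ds)
        mov     al, [bp + 2]    ; reads ss:bp+2 (default segment for bp is ss)
        mov     al, [ds:bp + 2] ; ds override prefix: reads ds:bp+2
        mov     al, [es:si]     ; es override prefix: reads es:si
```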

Note that no addressing mode involving cx exists, so [cx] is not a valid memory operand.
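
If a pointer has ended up in cx anyway, the usual fix is to copy it into a register that can address memory first. A minimal sketch in NASM syntax, assuming si is free to clobber:

```nasm
bits 16
cpu 8086

        ; mov   bh, [cx]        ; rejected: cx cannot appear in a 16-bit effective address
        mov     si, cx          ; copy the pointer into a register that can address memory
        mov     bh, [si]        ; load the byte at ds:si
```

It is usually simpler to keep pointers in bx, si or di from the start and reserve cx for counts.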

The registers ax, cx, dx, bx, sp, bp, si, and di are called general purpose registers because they are accessible as operands in all general-purpose instructions. This is in contrast to special-purpose registers like es, cs, ss, ds (segment registers), ip (the instruction pointer) or the flags register which are only accessible with special instructions made just for this purpose.

As you see, not all general purpose registers can be used as base or index registers for memory operands. This has to be kept in mind when allocating registers in your code.

In addition to this restriction, there are some instructions that implicitly operate on fixed registers. For example, the loop instruction exclusively operates on cx and a 16-bit imul r/m16 operates exclusively on dx:ax. If you want to make effective use of these instructions, it is useful to keep each general purpose register's suggested purpose in mind.

Notably, lods / stos / scas / movs / cmps implicitly use DS:SI and/or ES:DI, and they count in cx when used with a rep or repz / repnz prefix, so using those registers to loop a pointer over an array allows code-size optimizations.
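
As an illustration of those implicit operands, here is a minimal sketch of a block copy with rep movsb. It assumes NASM syntax and a DOS .COM program (so ds and es already point at the same segment); the labels src, dst and len are made up for this example:

```nasm
bits 16
cpu 8086
org 0x100                       ; assumed DOS .COM layout, where cs = ds = es = ss at entry

start:
        cld                     ; clear the direction flag so si and di count upward
        mov     si, src         ; ds:si = source pointer
        mov     di, dst         ; es:di = destination pointer
        mov     cx, len         ; cx = number of bytes to copy
        rep movsb               ; copy byte [ds:si] to [es:di], advance both, repeat cx times
        int     0x20            ; terminate the program (DOS)

src:    db      "hello"
len     equ     $ - src         ; length of the source string (5)
dst:    times len db 0          ; destination buffer of the same size
```

None of the three registers is named in the rep movsb instruction itself, which is where the code-size saving comes from.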

Note that if bp is used as a base register, the default segment register is ss instead of ds. — rcgldr, Dec 20 '18 at 14:32
Your right hand side expressions all contain `foo` but none of the left hand sides, except one do. I will correct the LHS to be what I think it should be but change it if my interpretation is wrong. — FreelanceConsultant, Sep 07 '21 at 12:40
@FreelanceConsultant The answer was correct the way it was. `[bx]` and `[bx + foo]` are two separate addressing modes. It's a two column table where the left hand side has addressing modes without displacement (except for `[bp]` which doesn't exist; the addressing mode instead encodes an absolute address) and the right hand side has addressing modes with displacement. Do not break what you don't understand. — fuz, Sep 07 '21 at 12:43
@fuz Ok I understand - it's not a table but a list across two columns. When I initially saw it I was confused as it looked like one column was some kind of equivalent of the other. — FreelanceConsultant, Sep 07 '21 at 12:58
@FreelanceConsultant It is a table of two columns. The left hand side has addressing modes without displacements, the right hand side has the corresponding addressing modes with displacements. Except for `[bp + foo]` as explained in my previous comment. — fuz, Sep 07 '21 at 12:59
@FreelanceConsultant The new changes do not make the answer any better. For one, whether you can write `[bx+si]` as `[bx][si]` depends on the assembler. The number of 17 addressing modes is very questionable, too. Overall I do not like this improvement and have reverted it. Please write an answer on your own if you think this is important. — fuz, Sep 07 '21 at 13:34

score 4 · Answer 2 · edited Jul 22 '19 at 16:34

General purpose means that these registers can be used as operands with "general purpose instructions", such as mov or add.

However all of these registers have at least one special function (list is incomplete):

ax always provides the input to and receives the result of mul / div operations
ax as the default accumulator register has some shorter encodings of various instructions
bx is one of the four registers (bx, bp, di, si), that can be used for indirect memory addressing in 16-bit addressing modes.
cx is used as counter by several instructions, for example shift counts, loop, and rep
dx contains the high order bits of the result in 16-bit to 32-bit multiplications, and the same of the input in 32-bit to 16-bit divisions
sp is affected and used by the push and pop instructions, as well as various call and ret type control transfer instructions. Also used asynchronously by hardware interrupts.
bp is affected by the enter and leave instructions. (But don't use enter, it's slow).
si and di are used by string instructions such as movsb
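
A few of these implicit roles in one short sketch (NASM syntax; the constants and the label again are arbitrary choices for illustration):

```nasm
bits 16
cpu 8086

        mov     ax, 1000
        mov     bx, 70
        mul     bx              ; dx:ax = ax * bx = 70000 (0x11170): dx = 0x0001, ax = 0x1170

        mov     cl, 3
        shl     ax, cl          ; on the 8086 a shift count other than 1 must be in cl

        mov     cx, 4
again:  inc     dx              ; loop body, runs 4 times
        loop    again           ; decrement cx, jump to again while cx != 0
```

In each case the instruction names at most one operand; ax, dx, cl and cx are fixed by the opcode.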

`ax` being the default accumulator register is described here: https://stackoverflow.com/questions/38019386/what-is-the-significance-of-operations-on-the-register-eax-having-their-own-opco — ecm, Jul 25 '19 at 10:19

score -1 · Answer 3 · answered Sep 08 '21 at 15:21

On the 8086, only the following addressing modes are available. There are 17 in total. In general, there is more than one way to write the same address. For example [a][b][c] may be a valid representation of [a + b + c].

segment:[a] means that the address [a] is relative to a segment address segment. (See below link for further details.)

# Displacement
[foo]

# Register, Indirect
[bx] = ds:[bx]
[bp] = ss:[bp]
[si] = ds:[si]
[di] = ds:[di]

# Indexed Addressing
foo[bx] = [bx + foo] = ds:[bx + foo]
foo[bp] = [bp + foo] = ss:[bp + foo]
foo[si] = [si + foo] = ds:[si + foo]
foo[di] = [di + foo] = ds:[di + foo]
# where ds:[] indicates the segment, given by the 16-bit
# segment register `ds` (or `ss`).
# The 8086 forms a 20-bit physical address as segment * 16 + offset:
# the segment register supplies the high 16 bits and the 16-bit
# offset is computed from bx, bp, si, di and the displacement. The
# segment:offset pair for a given physical address is not unique,
# because 12 of the 16 bits of each part overlap. See the Intel
# programmer's manual for more details.

# Based Indexed Addressing
[bx + si] = ds:[bx + si]
[bx + di] = ds:[bx + di]
[bp + si] = ss:[bp + si]
[bp + di] = ss:[bp + di]
# the data segment is used for addressing modes intended for use with
# data (the first two in this list)
# the stack segment is used for addressing modes intended for use with
# the stack (the last two in this list)

# Displacement + Based Indexed
foo[bx + si] = ds:[bx + si + foo]
foo[bx + di] = ds:[bx + di + foo]
foo[bp + si] = ss:[bp + si + foo]
foo[bp + di] = ss:[bp + di + foo]
# These are the same as above with an additional offset `foo`

(See: 8086 Addressing Modes)

foo is some arbitrary value. Note that no addressing mode involving cx exists, so [cx] is not a valid memory operand.

`[bp] = ss:[bp]` doesn't actually exist in machine code; it's something the assembler has to emulate as `[bp+0]`. Also, the parts about `foo[bx] = [bx + foo]` and `[a][b] = [a+b]` are specific to MASM/TASM syntax, not NASM. In NASM, having stuff outside the `[]` would be a syntax error. — Peter Cordes, Sep 08 '21 at 21:17
Also, `[foo + bx]` isn't "Indexed" in standard x86 terminology. (Although you could call it that in general computer-science terminology if `foo` is an array address instead of a small constant like 4 or something). BX is a base register in `[foo + bx]`. — Peter Cordes, Sep 08 '21 at 21:22
Technically `[si]` is an index register, but for 32/64-bit addressing modes we only call it an index when it uses the SIB byte, which 16-bit addr modes don't have. Really what matters for the CPU is having 2 registers, e.g. for unlamination on Sandybridge ([Do terms like direct/indirect addressing mode actual exists in the Intel x86 manuals](https://stackoverflow.com/q/46257018) - really x86 just allows subsets of its general case addressing, like base + displacement.) — Peter Cordes, Sep 08 '21 at 21:23
