35

So, as the question states, what is the purpose of the CS and IP registers in Intel's 8086?

I found this explanation:

Code segment (CS) is a 16-bit register containing address of 64 KB segment with processor instructions. The processor uses CS segment for all accesses to instructions referenced by instruction pointer (IP) register. CS register cannot be changed directly. The CS register is automatically updated during far jump, far call and far return instructions.

and this for IP:

Instruction Pointer (IP) is a 16-bit register.

I don't really understand what this basically means, so if someone could provide a more "vivid" explanation, that would be great :)

Ciro Santilli OurBigBook.com
idjuradj

6 Answers

39

The physical address is calculated from two parts: (i) the segment address and (ii) the offset address. The CS (code segment) register is used to address the code segment of memory, i.e. the region of memory where the code is stored. The IP (instruction pointer) register contains the offset within that code segment. Hence CS:IP is used to point to the location of the code in memory, i.e. to calculate its physical address.

kiran james
28

Since the Instruction Pointer (IP) is 16 bits wide, it can address only 64 KB of code (2^16 bytes), which wasn't much even in the 80s. So, to expand the address space, there is a second register which addresses 64 KB blocks. You could consider cs:ip together as one 32 bit register which is then capable of addressing 2^32 bytes, i.e. 4G, which is what you get on a processor which uses 32 bit addresses. The 8086 used 20 bits of address, so you could access 1M of memory.

Amin Shojaei
user1666959
  • And where is CS used? I read a little bit about segment and offset, and I can say I have an understanding of the segment/offset mechanism. – idjuradj Jul 25 '13 at 11:42
  • I expanded my question: And where is CS used? I read a little bit about segment and offset, and I can say I have an understanding of the segment/offset mechanism. But where is the Code Segment register used? As far as I know, there are the data segment, stack segment, extra segment and the mentioned code segment. And since CS is "paired" with the IP register, and uses its 4 bits for offset, are the other registers also paired with the IP register, or does each of these 4 segment registers have its own offset register? – idjuradj Jul 25 '13 at 12:02
  • Every time a new instruction is fetched by the processor (from IP) cs is used implicitly. CS points to the code segment of your program, and the physical address where the next instruction resides is assembled transparently. And similarly, every time you access a piece of data (mov ax, [1234] -- 1234 is implicitly prefixed by ds) which resides in your ds. You can't do much with CS, but when you do a long jump it is used. – user1666959 Jul 26 '13 at 03:33
  • _cs:ip together as one 32 bit register which is then capable of addressing 2^32 bytes_. This is wrong. CS:IP together, even on a 32-bit processor in real mode, is still only capable of addressing using 20 bits. (Technically speaking, on a 286 or 386+ CS:IP is capable of addressing 0 to 0x10FFEF, given that 0xFFFF:0xFFFF = (0xFFFF<<4)+0xFFFF = 0x10FFEF.) To address 4 GB of memory on a 386, the IP register was expanded to the 32-bit register EIP, which could address 4 GB. – Michael Petch Apr 09 '18 at 22:18
  • Agree with Michael Petch's comment. 16-bit CS:IP can address at most 0x10FFEF, based on their definition. The starting address CS addresses is fixed, which is its value multiplied by 0x10. – robbie fan Sep 06 '19 at 07:46
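
The limit discussed in these comments can be checked with a few lines of Python. This is only an arithmetic sketch: the function name `real_mode_address` and the `address_bits` parameter are made up for illustration. It assumes the 8086 keeps only 20 address bits, and models a later CPU in real mode (with the A20 line enabled) as keeping 21.

```python
def real_mode_address(segment, offset, address_bits=20):
    """Return segment * 16 + offset, truncated to address_bits address lines."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return linear & ((1 << address_bits) - 1)

# Largest value CS:IP can form before any truncation.
print(hex((0xFFFF << 4) + 0xFFFF))                 # 0x10ffef
# 8086: only 20 address lines, so the address wraps around to low memory.
print(hex(real_mode_address(0xFFFF, 0xFFFF)))      # 0xffef
# Assumed 21 address lines (A20 enabled on a 286 or later).
print(hex(real_mode_address(0xFFFF, 0xFFFF, 21)))  # 0x10ffef
```

Either way the result is nowhere near 2^32: the segment is only shifted left by 4 bits, not by 16.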
20

The instruction that will be executed next is that at memory address equal to:

16 * CS + IP

This allows a 20-bit address space to be used, despite registers being only 16 bits wide (it also means that most addresses can be encoded in more than one way).

The effect of CS is analogous to that of the other segment registers. E.g., DS offsets data accesses (that don't specify another segment register) by 16 * DS.
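
To see the "more than one encoding" point concretely, here is a small Python sketch that lists the segment:offset pairs reaching a given physical address. The helper names `physical` and `aliases` are invented for this illustration, and the search deliberately ignores pairs that only reach the address by wrapping past 1 MB.

```python
def physical(segment, offset):
    """16 * segment + offset, wrapped to the 8086's 20 address bits."""
    return ((segment << 4) + offset) & 0xFFFFF

def aliases(address):
    """List (segment, offset) pairs with 16 * segment + offset == address (no wrap)."""
    pairs = []
    for segment in range(0x10000):
        offset = address - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs

pairs = aliases(0x7C00)  # 0x7C00 is where a BIOS loads a boot sector
print(len(pairs))
print([(hex(s), hex(o)) for s, o in pairs[:2]])   # starts with 0:0x7c00
print([(hex(s), hex(o)) for s, o in pairs[-2:]])  # ends with 0x7c0:0
print(all(physical(s, o) == 0x7C00 for s, o in pairs))
```

The first and last pairs, 0x0000:0x7C00 and 0x07C0:0x0000, are the two forms most often seen in boot sector code.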

CS

The instructions that modify CS are:

  • ljmp (far jump)
  • lcall (far call), which pushes ip and cs to the stack, and then far jumps
  • lret (far return), which reverses the far call
  • int, which reads IP / CS from the Interrupt Vector Table
  • iret, which reverses an int
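
The far call / far return pair in the list above can be modelled with a toy Python class. This is a sketch, not an emulator: the `Cpu` class and its method names are hypothetical, the stack is a plain Python list, and `return_ip` has to be supplied by the caller because the model does not decode instructions.

```python
class Cpu:
    """Toy model: only CS, IP and a Python list standing in for the stack."""

    def __init__(self, cs, ip):
        self.cs, self.ip, self.stack = cs, ip, []

    def far_jump(self, new_cs, new_ip):
        # A far jump replaces both CS and IP at once.
        self.cs, self.ip = new_cs & 0xFFFF, new_ip & 0xFFFF

    def far_call(self, new_cs, new_ip, return_ip):
        # return_ip is the offset of the instruction following the call.
        self.stack.append(self.cs)    # CS is pushed first...
        self.stack.append(return_ip)  # ...then IP, so IP is popped first.
        self.far_jump(new_cs, new_ip)

    def far_return(self):
        self.ip = self.stack.pop()
        self.cs = self.stack.pop()


cpu = Cpu(cs=0x0000, ip=0x7C00)
cpu.far_call(0x1000, 0x0020, return_ip=0x7C05)
print(hex(cpu.cs), hex(cpu.ip))  # 0x1000 0x20
cpu.far_return()
print(hex(cpu.cs), hex(cpu.ip))  # 0x0 0x7c05
```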

CS cannot be modified by mov the way the other segment registers can. GNU GAS 2.24 will assemble the following without complaining, using the standard identifier for CS as the destination:

mov %ax, %cs

but it leads to an invalid opcode exception when executed.

To observe the effect of CS, try adding the following to a boot sector and running it in QEMU as explained here https://stackoverflow.com/a/32483545/895245

/* $1 is the new CS, $after1 the new IP. */
ljmp $1, $after1
after1:
/* Skip 16 bytes to make up for the CS == 1. */
.skip 0x10
mov %cs, %ax
/* cs == 1 */

ljmp $2, $after2
after2:
.skip 0x20
mov %cs, %ax
/* cs == 2 */

IP

IP increases automatically whenever an instruction is executed by the length of the encoding of that instruction: this is why the program moves forward!
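
As a rough illustration of that automatic increment, the Python sketch below steps IP through a straight-line run of instructions given only their encoded lengths. The function `trace_ip` is hypothetical and ignores jumps entirely; it assumes IP simply wraps at 16 bits while CS stays untouched.

```python
def trace_ip(lengths, start_ip):
    """Return the IP value at each instruction, plus the IP after the last one."""
    ip = start_ip
    trace = []
    for length in lengths:
        trace.append(ip)
        ip = (ip + length) & 0xFFFF  # IP is 16 bits wide and wraps within the segment
    return trace, ip

# mov %cs, %ax (2 bytes), nop (1 byte), hlt (1 byte), starting at offset 0x7C00.
trace, next_ip = trace_ip([2, 1, 1], 0x7C00)
print([hex(ip) for ip in trace], hex(next_ip))  # ['0x7c00', '0x7c02', '0x7c03'] 0x7c04

# Near the end of a segment, IP wraps to 0 without changing CS.
trace, next_ip = trace_ip([1, 1, 1], 0xFFFE)
print([hex(ip) for ip in trace], hex(next_ip))  # ['0xfffe', '0xffff', '0x0'] 0x1
```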

IP is modified by the same instructions that modify CS, and by the non-far versions of those instructions as well (more common case).

IP cannot be observed directly, so it is harder to play with it. Check this question for alternatives: Reading Program Counter directly

Ciro Santilli OurBigBook.com
  • In the example you provided, can `$1` and `$2` be arbitrary (valid) values? Since `$after1` and `$after2` are relative values to current IP, don't `$1` and `$2` have to be 0 for the jump to jump correctly (if the segment registers are not 0, then `16*CS+IP` won't match the label, since `$after` already accounted for the difference)? – tartaruga_casco_mole Nov 15 '20 at 12:56
  • @tartaruga_casco_mole (nice nick) I think `$after` is not relative but absolute, e.g. `EA cd` encoding from https://c9x.me/x86/html/file_module_x86_id_147.html and GNU Gas decides the relocation type correctly based on the exact instruction encoding to be used. I suggest confirming this from disassembly. – Ciro Santilli OurBigBook.com Nov 15 '20 at 13:44
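
For the absolute-versus-relative point in the reply above, the `EA cd` form can be written out by hand. The Python sketch below packs the 16-bit mode far jump as opcode 0xEA, then the 16-bit offset, then the 16-bit segment, both little-endian. The function name `encode_far_jump` and the label value 0x7C05 are made up for the example; as the comment suggests, the real bytes are best confirmed from a disassembly.

```python
import struct

def encode_far_jump(segment, offset):
    """Encode `ljmp $segment, $offset` for 16-bit mode: EA, offset (LE), segment (LE)."""
    return struct.pack("<BHH", 0xEA, offset, segment)

# Hypothetical: the label after1 resolved to the absolute offset 0x7C05.
print(encode_far_jump(1, 0x7C05).hex(" "))  # ea 05 7c 01 00
```

The offset is stored as an absolute value, not as a displacement from the current IP, which is why the `.skip` padding is needed when the new CS is non-zero.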
4

Since the 8086 processor uses 20-bit addressing, it can access 1 MB of memory, but its registers are only 16 bits wide. So, to fetch code from memory, the values in the code segment register and the instruction pointer register are combined to generate a physical address: the value of CS is shifted 4 bits to the left and then added to the value of IP.

EXAMPLE:

Value of CS: 1234h (hexadecimal)

Value of IP: 5678h

The value of CS shifted 4 bits to the left is 12340h; adding the IP value gives 179B8h, which is the physical address.
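
The same arithmetic can be reproduced in a few lines of Python, using the values from the example. The variable names are only for illustration.

```python
cs = 0x1234
ip = 0x5678

shifted = cs << 4        # CS moved 4 bits to the left: 0x12340
physical = shifted + ip  # add the offset held in IP

print(f"{shifted:05X}h + {ip:04X}h = {physical:05X}h")  # 12340h + 5678h = 179B8h
```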

Amrish Ak
1

Once you write .code in your assembly program text, that .code section is what the cs value points to. Any instruction later or earlier in the file will be addressed as cs:ip, where ip is an offset from cs.

Of course, you have to bear in mind that the assembler will first convert the text into machine code instructions.

zeimer
0

IP register - IP is the Instruction Pointer. Its function is the same as that of the PC (program counter) in other microprocessors: it points to the next instruction to be fetched by the BIU (bus interface unit) and fed into the EU (execution unit).