14

I am reading about memory addressing. I read about segment offset and then about descriptor offset. I know how to calculate the exact addresses in real mode. All this is OK, but I am unable to understand what exactly an offset is. Everywhere I read:

In real mode, the registers are only 16 bits, so you can only address up to 64k. In order to allow addressing of more memory, addresses are calculated from segment * 16 + offset.

Here I can understand the first line. We have 16 bits, so we can address up to 2^16 = 64k.

But what about the second line? What does the segment represent? Why do we multiply it by 16? Why do we add the offset? I just can't understand what this offset is. Can anybody explain it to me or give me a link, please?

Peter Cordes
narayanpatra
  • A similar Q&A: [What are Segments and how can they be addressed in 8086 mode?](https://stackoverflow.com/questions/42861524/what-are-segments-and-how-can-they-be-addressed-in-8086-mode) – Peter Cordes Mar 23 '18 at 08:33
  • I don't think real mode has *descriptor offset*s. It doesn't even have descriptors; only protected mode has the GDT/LDT (Global/Local Descriptor Table), and the IDT (Interrupt Descriptor Table) instead of real mode's IVT (Interrupt Vector Table). The only context where google finds "descriptor offset" is with Unix file descriptors, the position you can set with `lseek`. Totally unrelated to x86 seg:off addressing. – Peter Cordes Nov 06 '20 at 02:13

6 Answers

21

When Intel was building the 8086, there was a valid case for having more than 64KB in a machine, but there was no way it'd ever use a 32-bit address space. Back then, even a megabyte was a whole lot of memory. (Remember the infamous quote "640K ought to be enough for anybody"? It's essentially a mistranslation of the fact that back then, 1MB was freaking huge.) The word "gigabyte" wouldn't be in common use for another 15-20 years, and it wouldn't be referring to RAM for like another 5-10 years after that.

So instead of implementing an address space so huge that it'd "never" be fully utilized, what they did was implement 20-bit addresses. They still used 16-bit words for addresses, because after all, this is a 16-bit processor. The upper word was the "segment" and the lower word was the "offset". The two parts overlapped considerably, though -- a "segment" is a 64KB chunk of memory that starts at (segment) * 16, and the "offset" can point anywhere within that chunk. In order to calculate the actual address, you multiply the segment part of the address by 16 (or shift it left by 4 bits...same thing), and then add the offset. When you're done, you have a 20-bit address.

 19           4  0
  +--+--+--+--+
  |  segment  |
  +--+--+--+--+--+
     |   offset  |
     +--+--+--+--+

For example, if the segment were 0x8000, and the offset were 0x0100, the actual address comes out to ((0x8000 << 4) + 0x0100) == 0x80100.

   8  0  0  0
      0  1  0  0
  ---------------
   8  0  1  0  0
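The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the calculation, not of any real API; the function name `physical_address` is made up for this example.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset the way real mode does."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shifting left by 4 bits is the same as multiplying by 16.
    return (segment << 4) + offset


print(hex(physical_address(0x8000, 0x0100)))  # 0x80100
```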

The math is rarely that neat, though -- 0x80100 can be represented by literally thousands of different segment:offset combinations (4096, if my math is right).
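To double-check that count, here is a small brute-force sketch in Python. The helper name `aliases` is hypothetical; it simply tries every 16-bit segment and keeps those for which the remaining offset still fits in 16 bits.

```python
def aliases(physical):
    """Return every (segment, offset) pair with segment * 16 + offset == physical."""
    pairs = []
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


pairs = aliases(0x80100)
print(len(pairs))                                   # 4096
print("%04X:%04X" % pairs[0], "%04X:%04X" % pairs[-1])  # 7011:FFF0 8010:0000
```

Addresses near the bottom of memory have fewer aliases, since the segment cannot go below zero; `aliases(0)` returns just the single pair `(0, 0)`.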

cHao
  • 1
    I know all this math, but why multiply by 16? Why add the offset? – narayanpatra Nov 07 '10 at 20:42
  • 4
    Because that's how Intel decided to turn two 16-bit numbers into a 20-bit number. "Offset" is the most useful part of the address (as the "segment" is too coarse-grained to actually point at stuff with), so you *have* to add it *somewhere*... – cHao Nov 07 '10 at 21:23
  • 1
    @cHao: Actually, 16 bytes would be a good granularity for the start of objects. If programming languages had included an "object pointer" type which could only point to the start of things received from a memory-allocation request, a lot of 4-byte pointers could have been replaced with 2-byte pointers, something that would not have been possible using 32-bit linear addressing. Unfortunately, such a usage pattern really only works in assembly language. – supercat Feb 21 '14 at 00:28
  • 2
    It's rather a pain even then; since the 8086 only gives you two segment registers to really play with, you'd end up spending a not-insignificant amount of code just swapping segment registers into position for each operation. Same reason you don't want to use the "huge" model for everything -- it can easily be more trouble than it's worth. – cHao Feb 24 '14 at 02:08
  • 7
    @cHao: The reason for the segment:offset architecture is actually a pragmatic one. The 8086 was designed to augment and replace the 8008, 8080, and 8085 processors. These were all 8 bit processors. The segment:offset addressing allowed assembly language written for these older 8 bit processors to be carried over to the new 16 bit processor with little or no modification. See: http://en.wikipedia.org/wiki/Intel_8086 – WayneJ Jul 25 '14 at 16:26
13

In x86 real mode the physical address is 20 bits long and is therefore calculated as:

PhysicalAddress = Segment * 16 + Offset

Check also: Real-Mode Memory Management

destructioneer
GJ.
1

I want to add an answer here just because I've been scouring the internet trying to understand this too. The other answers were leaving out a key piece of information that I did get from the link presented in one of the answers. However, I almost totally missed it. Reading through the linked page, I still wasn't understanding how this was working.

My problem was probably that I only really understood how the Commodore 64 (6502 processor) laid out memory. It uses a similar notation to address memory. It has 64k of total memory and uses 8-bit values of PAGE:OFFSET to access it. Each page is 256 bytes long (an 8-bit number) and the offset points to one of the values in that page. Pages are laid out back-to-back in memory, so page 2 starts where page 1 ends. I went into the 386 assuming the same layout. That is not how it works.

Real mode uses a similar style, just with different wording: SEGMENT:OFFSET. A segment is 64k in size. However, the segments themselves are not laid out back-to-back like the Commodore's pages. Their starting addresses are spaced only 16 bytes apart, so neighbouring segments overlap. The offset still works the same way, indicating how many bytes from the start of the page/segment.
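The difference between the two layouts can be sketched in Python. The helper names `page_start` and `segment_start` are invented for this illustration.

```python
def page_start(page):
    # 6502-style pages: 256 bytes each, laid out back-to-back.
    return page * 256


def segment_start(segment):
    # Real-mode segments: a new 64k segment begins every 16 bytes.
    return segment * 16


# Pages never overlap: page 2 begins exactly where page 1 ends.
print(page_start(1) + 256 == page_start(2))            # True

# Segments overlap: segment 1 starts only 16 bytes into segment 0 ...
print(segment_start(1) - segment_start(0))             # 16

# ... so 0000:0010 and 0001:0000 name the same byte.
print(segment_start(0) + 0x0010 == segment_start(1) + 0x0000)  # True
```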

I hope this explanation helps anyone else who finds this question; writing it has helped me.

Thraka
1

I can see that the question and answers are some years old, but they contain a wrong statement: that only 16-bit registers exist in real mode.

In real mode the registers are not only 16 bits wide; there are 8-bit registers too. Each of these 8-bit registers is the lower or upper half of a 16-bit register.

Starting real mode on an 80386 or later, we also get 32-bit registers and two new instruction prefixes: one to override the default operand size and one to override the default address size of a single instruction inside a code segment.

These instruction prefixes can be combined to override both the operand size and the address size of one instruction. In real mode the default operand size and address size are 16 bits. With these two prefixes we can use a 32-bit operand or register, for example to compute a 32-bit value in a 32-bit register, or to move a 32-bit value to or from a memory location. We can also use any 32-bit register (possibly in a base+index*scale+displacement combination) as an address register, but the resulting effective address must not exceed the 64 KB segment limit.

(On the OSDev wiki page, the table for the "Operand-size and address-size override prefix" says that the "0x66 operand prefix" and the "0x67 address prefix" are N/A (not available) in real mode and virtual 8086 mode. http://wiki.osdev.org/X86-64_Instruction_Encoding
But this is wrong, because the Intel manual contains this statement: "These prefixes can be used in real-address mode as well as in protected mode and virtual-8086 mode".)

Starting with the Pentium MMX we get eight 64-bit MMX registers.
Starting with the Pentium III we get eight 128-bit XMM registers.
..

If I am not wrong, the 256-bit YMM registers, the 512-bit ZMM registers, and the 64-bit general-purpose registers of x64 cannot be used in real mode.

Dirk

1

Minimal example

With:

  • offset = msg
  • segment = ds
mov $0, %ax
mov %ax, %ds
mov %ds:msg, %al
/* %al contains 1 */

mov $1, %ax
mov %ax, %ds
mov %ds:msg, %al
/* %al contains 2: 1 * 16 bytes forward. */

msg:
.byte 1
.fill 15
.byte 2
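For readers without an emulator at hand, the effect of the two loads can be modelled in Python. This is a rough simulation, not the real thing: `MEMORY`, `MSG`, and `load_byte` are names made up here, and the address chosen for `msg` is an arbitrary assumption.

```python
MEMORY = bytearray(1 << 20)   # 1 MiB of simulated real-mode memory

MSG = 0x7C40                  # hypothetical offset of the `msg` label
MEMORY[MSG] = 1               # .byte 1
# .fill 15 leaves 15 zero bytes at MSG+1 .. MSG+15
MEMORY[MSG + 16] = 2          # .byte 2


def load_byte(ds, offset):
    """Model `mov %ds:offset, %al`: read the byte at ds * 16 + offset."""
    # The mask assumes a machine with only 20 address lines.
    return MEMORY[((ds << 4) + offset) & 0xFFFFF]


print(load_byte(0, MSG))  # 1
print(load_byte(1, MSG))  # 2
```

The offset is the same in both loads; only `ds` changes, and raising it by 1 moves the access 16 bytes forward.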

So if you want to access memory above 64k:

mov $0xF000, %ax
mov %ax, %ds

Note that this allows for addresses larger than 20 bits wide if you use something like:

0x10 * 0xFFFF + 0xFFFF == 0x10FFEF

On earlier processors which had only 20 address wires, it was simply truncated, but later on things got complicated with the A20 line (21st address wire): https://en.wikipedia.org/wiki/A20_line
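The truncation can be sketched as follows, assuming a simplified model in which a disabled A20 line just forces bit 20 of the address to zero. The function and its `a20_enabled` parameter are invented for this illustration.

```python
def physical_address(segment, offset, a20_enabled):
    """Real-mode address, optionally truncated to 20 bits."""
    linear = (segment << 4) + offset   # can be as large as 0x10FFEF (21 bits)
    if a20_enabled:
        return linear                  # 21st address wire is live
    return linear & 0xFFFFF            # only 20 wires: bit 20 is dropped


print(hex(physical_address(0xFFFF, 0xFFFF, a20_enabled=False)))  # 0xffef
print(hex(physical_address(0xFFFF, 0xFFFF, a20_enabled=True)))   # 0x10ffef
```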

The example is on a GitHub repo with the required boilerplate to run it.

Ciro Santilli OurBigBook.com
  • I have downvoted this because it doesn't actually answer the questions posed ' what the segment represent? why we multiply it with 16? why we add offset. I just can't understand what this off set is? Can anybody explain me or give me link for this please?' – Michael Petch Nov 07 '15 at 10:54
  • I have an issue with the accepted answer since the answer only reiterated what the OP stated (the actual equation that wasn't understood), but they did provide a link that does explain the calculation with a reasonable diagram. I would have preferred the accepted answer actually tried to reproduce some of the information at the link that would allow someone to understand the calculation without going off site. – Michael Petch Nov 07 '15 at 11:03
  • 1
    @MichaelPetch no worries. I think the real problem was that the OP did not understand how `ds` works: if they did, the application becomes clear. Other answers already discuss application, so I tried to provide the example to make things precise. – Ciro Santilli OurBigBook.com Nov 07 '15 at 11:05
-1

A 16-bit register can only address up to 0xFFFF (65,536 bytes, 64KB). When that wasn't enough, Intel added segment registers.

Any logical design would have simply combined two 16-bit registers to make a 32-bit address space, (e.g. 0xFFFF : 0xFFFF = 0xFFFFFFFF), but nooooo... Intel had to get all weird on us.

Historically, the 8086's address bus only had 20 address lines, and thus could only transmit 20-bit addresses. To "rectify" this, Intel devised a scheme in which segment registers only extend your address by 4 bits (16 bits + 4 = 20, in theory).

To achieve this, the segment register is left-shifted from its original value by 4 bits, then added to the offset in your address register (e.g. [es:bx] = ( es << 4 ) + bx). Note: Left shifting 4 bits is equivalent to multiplying by 16.

That's it. Here are some illustrative examples:

;; everything's hexadecimal

[ 0:1 ] = 1

[ F:1 ] = F1

[ F:0 ] = F0

[ F:FF] = 1EF ; [F becomes F0, + FF = 1EF]

[ F000 : FFFF ] = FFFFF (max 20-bit number)

[ FFFF : FFFF ] = 10FFEF (oh shit, 21-bit number!)

So, you can still form an address wider than 20 bits. What happens? With only 20 address lines, the address "wraps around", like modular arithmetic (as a natural consequence of the hardware). So, 0x10FFEF becomes 0xFFEF.

And there you have it! Intel hired some dumb engineers, and we have to live with it.

James M. Lay
  • Hindsight is 20/20. But there are a bunch of good reasons behind Intel's decision. For one, 32-bit addresses wouldn't be useful for another couple of decades. But for another, it made fewer impositions on the software of the day. You only paid the ugly tax if your code actually used segments. – cHao Mar 24 '17 at 18:40