0

If the offset operator is supposed to return the distance of a variable from the beginning of its enclosing segment, then why is the returned offset always some huge number? In the following example, the variable num is first in the .data segment. Shouldn't it be at offset 0? On my machine I get the offset 00007FF7C90A4000. I don't understand.

.data 
num byte 123
.code   
mov rsi, offset num
Ross Ridge
Oleksa

2 Answers

3

This looks like 64-bit code. 64-bit code usually runs under a 64-bit OS, such as Windows or Linux. Those systems load programs at arbitrary starting addresses, not necessarily at logical 0. So the value that you get is the offset of the variable within the data section plus the starting address of the data section in the process's memory. The former might be zero, but the latter can be anything.

Note that the offset that your program sees is the virtual address, not a physical one. It does correspond to some real memory, but the underlying physical address is only known to the OS kernel.

In fact, modern operating systems deliberately load your program at a random, unpredictable starting address (address space layout randomization, ASLR) to thwart a certain class of exploits.
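
To make the arithmetic concrete, here is a minimal sketch in Python that splits the address from the question into a load address and a distance from the start of the loaded image. The load address `0x00007FF7C90A0000` is an assumption (the real one is chosen by the OS at load time and can differ on every run), and the variable names are only for illustration.

```python
# Address that `mov rsi, offset num` produced in the question.
observed = 0x00007FF7C90A4000

# ASSUMPTION: the OS happened to load the image at this address.
# The real value is picked at load time and can differ on every run.
image_base = 0x00007FF7C90A0000

# Distance of `num` from the start of the loaded image.
rva = observed - image_base

print(hex(rva))  # 0x4000
```

Under that assumption, `num` sits 0x4000 bytes into the image, which would be plausible for the first byte of a `.data` section placed a few pages after the headers and code. Only that small number is decided when the program is built; the large part is added at load time.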

Seva Alekseyev
  • "plus the starting address of the data section in the process memory" this slightly contradicts the definitions of the offset operator that I've read in some books. They all talk about the distance from the beginning of the data segment. Nothing about adding the address of the segment itself. – Oleksa May 18 '20 at 20:40
  • See https://stackoverflow.com/questions/39482404/offset-operator-in-assembly-language-for-x86-processors?rq=1 – Seva Alekseyev May 18 '20 at 20:41
  • @OleksiyPlotnyts'kyy: `offset`'s name comes from the "offset" part of a `seg:off` address, not offset within a section. In x86-64, everything is relative to a segment base of 0. (Unless you use FS or GS segments; they still "work" like they did in 32-bit mode, so OSes can use them for thread-local storage where the same offset gives a different linear virtual address in different threads.) – Peter Cordes May 18 '20 at 20:49
  • For the record, `.data` in your example is a **section**, not a **segment**. Not the same thing. You can't use segments in 64-bit code. – Seva Alekseyev May 18 '20 at 20:51
2

The OFFSET operator does return the offset from the start of the segment, but 64-bit (and 32-bit) code uses a flat memory model where there's only one segment, and that segment covers the entire linear address space starting at address 0. The .data and .code directives don't actually create two different segments; they create two sections within the single flat segment.

The reason the offset in your particular case is so high is that Windows will load a program at a random linear address if it can. This is done as a security measure to make it harder for various buffer overflow attacks to work. Since the single flat segment has a base of 0, and Windows apparently decided to load your program at address 0x00007FF7C90A0000 or thereabouts, the offset of the start of the .data section ends up being very far from the start of the segment at address 0.
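
The flat model can be sketched in a few lines of Python. This is only a model of the address calculation, not a full description of what the CPU does: `linear_address` is a hypothetical helper, and the non-zero segment base used for comparison is made up.

```python
def linear_address(segment_base: int, offset: int) -> int:
    """Model of seg:off translation: linear address = segment base + offset."""
    return segment_base + offset

# Value returned by `offset num` in the question.
offset_num = 0x00007FF7C90A4000

# Flat model: the segment base is 0, so the offset IS the linear address.
flat = linear_address(0, offset_num)
print(hex(flat))  # 0x7ff7c90a4000

# HYPOTHETICAL non-zero base, only to show what an offset of 0 would require:
# the segment itself would have to start exactly at the variable.
based = linear_address(0x00007FF7C90A4000, 0)
print(flat == based)  # True
```

The offset could only be 0 if the segment began at `num` itself. With the base fixed at 0, the whole address has to be carried by the offset.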

Ross Ridge