
First of all, sorry for the basic questions; I'm new to assembly language/MASM. I have been very confused by the use of OFFSET, square brackets, and dereferencing.

This is my understanding:

  1. Variables/data labels are memory addresses. Square brackets imply a dereference, so [var] retrieves the content at address var. MASM instructions automatically dereference memory operands, so both of the following copy the content of var into eax:

    MOV eax, var
    MOV eax, [var]
    
  2. To move the address of var to a register, one would need to do

    MOV reg, OFFSET var
    

    But now it looks like var and [var] are not equivalent anymore:

    var  DWORD  10h
    mov esi, OFFSET var
    mov eax, [esi]     ; eax = 10h
    mov eax, esi       ; eax = address of var
    

    This is where my confusion starts. Given that dereferencing is always implied, when are square brackets necessary? When are they optional?

  3. In addition, the following would initialize var2 with the address of var1:

    var1 byte 10h,20h,30h,40h
    var2 dword var1            ; var2 = address of var1
    ;var2 dword OFFSET var1    ; equivalent alternative (var2 can only be defined once)
    

Now, when does var1 refer to the address? When does it refer to the content?
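The comments below, chiefly Ross Ridge's, answer this with a short rule. That rule can be condensed into a decision table. The C sketch below is a hypothetical toy model for illustration only; it is not part of MASM or any library. The enum names and the `classify` function are made up. It covers only the cases discussed here (a register, a data label such as `var`, `OFFSET var`, and a numeric constant, each with or without brackets), and it follows the rule exactly as stated in that comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of how MASM treats an operand, following the
   rule stated in the comments. It only classifies; it is not an assembler. */
typedef enum { IN_INSTRUCTION, IN_DATA_DEFINITION } Context;
typedef enum { OP_REGISTER, OP_DATA_LABEL, OP_OFFSET_LABEL, OP_CONSTANT } OperandKind;
typedef enum { M_REGISTER, M_MEMORY, M_CONSTANT } Meaning;

Meaning classify(Context ctx, OperandKind kind, bool bracketed) {
    if (ctx == IN_DATA_DEFINITION) {
        /* `var2 dword var1`, `[var1]` or `OFFSET var1`: an initializer must be
           a constant, so a label always means its address. */
        assert(kind != OP_REGISTER); /* registers cannot appear in initializers */
        return M_CONSTANT;
    }
    switch (kind) {
    case OP_REGISTER:
        /* Brackets only change the meaning here: esi vs [esi]. */
        return bracketed ? M_MEMORY : M_REGISTER;
    case OP_DATA_LABEL:
        /* var and [var] both read the contents; the brackets are ignored. */
        return M_MEMORY;
    case OP_OFFSET_LABEL:
    case OP_CONSTANT:
        /* OFFSET var (the address) or a numeric constant is an immediate,
           with or without brackets. */
        return M_CONSTANT;
    }
    assert(!"unknown operand kind");
    return M_CONSTANT;
}
```

For example, `classify(IN_INSTRUCTION, OP_DATA_LABEL, false)` and `classify(IN_INSTRUCTION, OP_DATA_LABEL, true)` both give `M_MEMORY` (points 1 and 2). `classify(IN_DATA_DEFINITION, OP_DATA_LABEL, false)` gives `M_CONSTANT`, the address (point 3).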

Ross Ridge
L. Vu
  • One good answer by Ross Ridge is in his [Stack Overflow answer](http://stackoverflow.com/a/25130189/3857942). It might be a good starting point; it doesn't answer everything you ask, but it should get you going. – Michael Petch Apr 28 '16 at 04:56
  • In instructions, brackets are only necessary when using registers; otherwise they're ignored, and the type of the symbol determines whether it's a memory operand or an immediate operand. In a data definition statement like `var2 DWORD var1`, the operand has to be a constant, not a memory reference, so it's always interpreted as a constant, with or without brackets or the OFFSET operator. – Ross Ridge Apr 28 '16 at 05:20
  • In instructions, you can remove any ambiguity by always using either `[symbol]` or `offset symbol`, never just `symbol`. Getting used to that style will make it easier to read/write NASM syntax, too. (Although if you have to read other people's MASM code, you will need to learn to watch out for bare symbols). – Peter Cordes Apr 28 '16 at 05:45
  • I agree with @PeterCordes: always use square brackets in your own code. It's less ambiguous, and it's the reason NASM settled on this syntax. However, just using the symbol in NASM will give you the position-dependent address (and NASM will refuse to generate position-independent code, PIC for short). – Leandros Apr 28 '16 at 07:23
  • Don't use this "MASM automatically dereferences" feature; it causes more trouble than it's worth. Not every assembler does it, so your code isn't portable, and you get used to a (IMO) bad habit. +1 to PeterCordes's comment. – Tommylee2k Apr 28 '16 at 08:36
  • @Leandros: Using `mov r32, imm32` will always be position-dependent. Unless you're claiming that MASM has a mode that rewrites `mov r64, offset symbol` to `lea r64, [rel symbol]`, IDK what your point is about NASM. I wish `offset` was a keyword in NASM, so you could always write `offset symbol`, too, without having to define `offset` as a macro that expands to the empty string for NASM/YASM. It would be nice esp. for guiding newbies. – Peter Cordes Apr 28 '16 at 12:40
  • @PeterCordes I always thought `offset symbol` is internally rewritten to `[rel symbol]`. But I haven't done much in MASM recently, and might be totally wrong or mix it up with NASM. – Leandros Apr 28 '16 at 12:48
  • @Leandros: That's impossible. It's not a question of syntax, it's a question of what's encodable in machine code. Immediate operands are always absolute. There's no room for a bit that marks an imm32 as RIP-relative, although that would be an interesting idea if you were redesigning x86-64 instruction encoding from scratch. **`lea` is the only PIC way to get an address into a register**. There's no PIC way to `add rsi, offset symbol` in one instruction, for example. (`[RIP + disp32 + RSI]` is not encodable; only RIP+disp32 is.) – Peter Cordes Apr 28 '16 at 12:57
  • @PeterCordes Should've looked at the instruction. Yes, I was quite off. Thanks for the explanation, though. – Leandros Apr 28 '16 at 13:04
  • Thanks everyone. That makes sense. So to sum up: 1) Memory operands are always automatically dereferenced. To avoid ambiguity though, one should always use []. 2) [] are only meaningful for registers: reg gives the register's content, and [reg] retrieves the content at the address held by reg. 3) @PeterCordes: Could you elaborate a bit on this point: "var2 dword var1 assembles to the address of var1. This is the only sane behaviour, because var1 could be extern, making its contents unavailable at assemble time." – L. Vu Apr 28 '16 at 16:38
  • @L.Vu: Imagine you have `extern my_c_global` / `var2 dword my_c_global`. The assembler doesn't have access to the value of `my_c_global`, because it's in a `.o` from compiler output. It is possible to make reference to the address of external symbols, though: the linker can resolve this. – Peter Cordes Apr 28 '16 at 16:47
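The same distinction exists in C, which may help if you already know it. The following is a C analogy, assuming a single file stands in for two separately compiled object files. `my_c_global` and `var2` mirror the names in the comments; `read_my_c_global` is a hypothetical helper. The sketch shows that a static initializer may hold the address of an extern object, but not its value.

```c
#include <assert.h>

/* C analogy, assuming one file stands in for two object files.
   This declaration only says that my_c_global exists somewhere. */
extern int my_c_global;

/* Allowed: a static initializer may hold the *address* of an extern object.
   When my_c_global is defined in another object file, the compiler emits a
   relocation and the linker fills it in, much like MASM's
   `var2 dword my_c_global`. */
int *var2 = &my_c_global;

/* Not allowed at file scope: `int copy = my_c_global;` is rejected,
   because the value is not a constant available at compile time. */

/* Hypothetical helper: at run time the contents are read through the
   resolved address. */
int read_my_c_global(void) {
    assert(var2 == &my_c_global);
    return *var2;
}

/* Normally this definition lives in another translation unit
   (e.g. the compiler-generated .o mentioned in the comment). */
int my_c_global = 42;
```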

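Peter Cordes's point about `lea` being the only position-independent way to get an address into a register can be checked with a little arithmetic. The sketch below is a simplified C model, not an assembler, built on hypothetical values. It assumes a module whose code at offset 0x1000 makes a RIP-relative reference to data at offset 0x3000. The reference uses a 7-byte `lea rsi, [rip + disp32]`, so the next instruction starts at offset 0x1007. The module is then loaded at two made-up base addresses. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of x86-64 RIP-relative addressing (illustration only).
   Effective address = address of the NEXT instruction + sign-extended disp32. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp32) {
    /* Converting a negative int64_t to uint64_t is well defined (modular),
       so negative displacements work too. */
    return next_insn_addr + (uint64_t)(int64_t)disp32;
}

/* What the assembler/linker computes when it encodes the displacement. */
int32_t encode_disp32(uint64_t next_insn_addr, uint64_t target) {
    /* Assumes both addresses are below 2^63, so the subtraction cannot overflow. */
    int64_t disp = (int64_t)target - (int64_t)next_insn_addr;
    assert(disp >= INT32_MIN && disp <= INT32_MAX); /* must fit a signed 32-bit field */
    return (int32_t)disp;
}
```

The encoded displacement is the same at both load addresses, so the instruction bytes need no fixup. An absolute address encoded as an immediate, as with `mov r32, imm32`, would change with the base and need a relocation. It might also not fit in 32 bits at all.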