I'm learning assembly with the GNU Assembler in Intel syntax, and there is one thing that I don't really understand. Take this code, for example:

.intel_syntax noprefix
.data
string: .asciz "hello world"

.text
.global entry
.type entry, @function

entry:
    mov byte ptr[string + 4], 'a'
    mov eax, offset flat:string
    ret

I get the idea of using offset flat:, since we are writing relocatable code. But why don't we also specify offset flat:string on this line: mov byte ptr[string + 4], 'a', as we do here: mov eax, offset flat:string?

I'm really confused. If mov byte ptr[string + 4], 'a' works to get the address of the string label + 4, then why isn't mov eax, string the same?

Edit :

To clarify, after calling entry, I use printf to print what's in EAX, as follows:

#include <stdio.h>

extern char *entry(void);

int main(int argc, char*argv[])
{
    printf("%s", entry());

}
Liwinux
    Thank you for those links, so if I understood correctly, GAS knows that I want to use the address and not the value therefore there is no need to write offset flat ? Or is it because [symbol] is always an effective-address, never an immediate, in GAS ? – Liwinux Mar 19 '22 at 18:24

1 Answer

You always need OFFSET when you want a symbol address as an immediate, like AT&T syntax $string instead of string. You never need it any other time.


Basically it comes down to the fact that in GAS Intel syntax (like AT&T movb $'a', string+4), string is a memory operand even without [], so it needs extra syntax to ask for the address instead of memory at that address.

When using string as part of [string + 4], you're not asking for the offset, you're addressing memory at that label/symbol address. Using it as part of an addressing mode.
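As a rough illustration, here is what the two instructions from the question would look like in C. This is only a sketch of the analogy: the name entry_c is made up for this example, and note that C follows the NASM-style convention where the bare array name is the address, so C's plain string corresponds to GAS's offset flat:string, while C's string[4] corresponds to GAS's [string + 4].

```c
/* Writable array, playing the role of `string: .asciz "hello world"` in .data. */
static char string[] = "hello world";

/* Hypothetical C counterpart of the asm `entry` function from the question. */
char *entry_c(void) {
    /* mov byte ptr [string + 4], 'a'
       The symbol is used inside an addressing mode: we store to the
       memory at that address, we never ask for the address as a value. */
    string[4] = 'a';

    /* mov eax, offset flat:string
       Here we want the address itself, as an immediate value. */
    return string;
}
```

Calling printf("%s", entry_c()) should behave like the question's driver program: byte 4 of the buffer is overwritten, and the returned pointer is the address of the buffer.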

If you'd rather use a better-designed syntax where mov eax, string+4 does give you the address (without dereferencing it), use NASM.

Otherwise, see "Confusing brackets in MASM32". (GAS's Intel syntax is like MASM in most ways, except that mov eax, [12] is a load from that absolute address, not MASM's insanity of having that be equivalent to mov eax, 12.)

Somewhat related: "Distinguishing memory from constant in GNU as .intel_syntax" covers how GAS parses constants, but that's more about .equ foo, 4 / foo = 4 appearing before vs. after the instruction referencing it, if you use mov eax, foo instead of something unambiguous like mov eax, [foo] or mov eax, OFFSET foo.


Peter Cordes
  • I'm not sure if I understood correctly, but I get the idea that `string` is a memory operand, so it's like asking for the value at that symbol. However, I don't get the part where you say that `[string + 4]` is like addressing memory at that symbol. If `string` is a memory operand, then doing `[string + 4]` would just take the ASCII char + 4? I'm sorry, I'm still confused about offset flat... – Liwinux Mar 19 '22 at 19:54
  • I saw the following [here](https://stackoverflow.com/questions/39355188/distinguishing-memory-from-constant-in-gnu-as-intel-syntax/39360636#39360636) : `[symbol] is always an effective-address, never an immediate, in GAS and NASM/YASM.` That's maybe why we don't need to write offset flat ? – Liwinux Mar 19 '22 at 20:01
  • @Liwinux: asm syntax exists to describe machine code. x86 doesn't have double-indirect addressing, so there's nothing like `[ [string] + 4]` because the machine can't do that in one instruction. `string+4` and `[string+4]` are the same thing in GAS/MASM style Intel syntax because that's just a design choice that `string` *without* OFFSET still implies dereferencing it. – Peter Cordes Mar 19 '22 at 20:02
  • Now I get it, I was so confused about offset flat in the first place. I think I'm not so advanced in GAS to fully understand your answer above. – Liwinux Mar 19 '22 at 20:22
  • @Liwinux: It's kind of hard to explain the way your question seems to be asking, because it's mostly just arbitrary choices of syntax design. The fact that the possibilities in machine code are quite limited means that we can have different ways of writing the same thing, although IMO that's poor design. NASM doesn't have this problem, memory operands always use `[]`, immediates never do, and a bare symbol name is always the address. So putting it inside `[]` is like dereferencing that address, a lot like how C works with a `char arr[]` where `arr` is the address, `*arr` is the mem there. – Peter Cordes Mar 19 '22 at 20:36
  • That's exactly how I saw things: putting something inside `[]` would do the same as C does. That's where the confusion was initially. The fact that just writing `string` is still a memory operand even without the `[]` was just too much for my brain to process, I guess. Anyway, I think I now have a better understanding of the "weird" Intel syntax, and I can't thank you enough for that! I'm still going to stick with GAS, as I always want to know how things really work and, most importantly, why they work this way. – Liwinux Mar 19 '22 at 20:59
  • @Liwinux: GAS isn't more low-level than NASM, they're both good tools for generating x86 machine code (which is how things *really* work) from a human-readable text version. NASM generally has more helpful error messages. Neither of them have any "magic" that does anything for you, unlike MASM where declaring data can magically imply an operand-size when you use it. But sure, if you want to stick with GAS, that's fine, it's not a bad assembler. (As long as you have a recent enough version that it warns about ambiguous operand-size on `add [rdi], 123` instead of just silently picking dword) – Peter Cordes Mar 19 '22 at 21:05
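On the operand-size point in the last comment: a small C sketch of why it matters which size the assembler picks for something like add [rdi], 123. The helper names add_byte and add_dword are invented for this example, and the asm in the comments is only the rough counterpart of each function, not compiler output.

```c
#include <stdint.h>
#include <string.h>

/* Roughly `add byte ptr [rdi], 123`: only one byte of memory changes,
   and any carry out of that byte is discarded. */
static void add_byte(unsigned char *p) {
    *p = (unsigned char)(*p + 123);
}

/* Roughly `add dword ptr [rdi], 123`: four bytes are read, treated as
   one 32-bit integer, incremented, and written back. memcpy is used so
   the access is well-defined in C regardless of alignment. */
static void add_dword(unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    v += 123;
    memcpy(p, &v, sizeof v);
}
```

Given the same starting bytes, the byte-sized version leaves the neighbouring bytes alone, while the dword-sized version can carry into them. That is why writing byte ptr or dword ptr explicitly is safer than relying on an assembler default.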