Memory addressing with GNU Assembler Intel Syntax

Question

I read a page containing a good list of differences between Intel and AT&T syntax for GAS, but it did not cover the case of specifying an address with a displacement only.

Here I've assembled four lines with AT&T syntax:

                         .text
0000 48C7C008000000      mov    $8, %rax
0007 488B042508000000    mov    (8), %rax
000f 4889F0              mov    %rsi, %rax
0012 488B06              mov    (%rsi), %rax

The first two lines are different, as expected. The first moves the immediate value 8 into rax and the second moves the contents of address 8 into rax. But with Intel syntax I get the following odd behavior:

                         .text
                         .intel_syntax
0000 48C7C008000000      mov    %rax, 8
0007 48C7C008000000      mov    %rax, [8]
000e 4889F0              mov    %rax, %rsi
0011 488B06              mov    %rax, [%rsi]

Here the first and second lines assembled to the same machine code! At first I thought the square brackets were wrong, so I added the third and fourth lines to the test; the square brackets do work for memory addressing, at least when registers are involved.
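The observation that two differently written lines produced identical bytes can also be checked mechanically. This is a rough sketch, assuming the `offset hex source` listing layout shown above and only the few REX.W `mov` encodings that appear in it: opcode C7 with ModRM mod 11 is a register load from an immediate, and 89/8B is register-to-register when mod is 11 and a memory access otherwise. It flags lines written with square brackets whose bytes are not a memory access. `operand_kind` and `suspicious_lines` are hypothetical helper names, not part of binutils.

```python
def operand_kind(code: bytes) -> str:
    """Classify a REX.W-prefixed MOV by its opcode and the ModRM mod field."""
    if len(code) < 3:
        return "unknown"
    opcode, mod = code[1], code[2] >> 6
    if opcode == 0xC7 and mod == 0b11:
        return "immediate"          # C7 /0: mov r/m64, imm32 into a register
    if opcode in (0x89, 0x8B):
        return "register" if mod == 0b11 else "memory"
    return "unknown"                # anything else is outside this sketch


def suspicious_lines(listing: str) -> list[str]:
    """Return bracketed source lines that were not assembled as a memory access."""
    flagged = []
    for line in listing.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue                # directives such as .text have no bytes
        _offset, hex_code, source = parts
        try:
            code = bytes.fromhex(hex_code)
        except ValueError:
            continue
        if "[" in source and operand_kind(code) != "memory":
            flagged.append(" ".join(source.split()))  # normalize whitespace
    return flagged


listing = """\
                         .text
                         .intel_syntax
0000 48C7C008000000      mov    %rax, 8
0007 48C7C008000000      mov    %rax, [8]
000e 4889F0              mov    %rax, %rsi
0011 488B06              mov    %rax, [%rsi]
"""
print(suspicious_lines(listing))  # ['mov %rax, [8]']
```

With a fixed assembler, the second line would carry the bytes `488B042508000000` and nothing would be flagged.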

All of the documentation I have read shows memory addressing examples with at least a base or index register, and sometimes a scale and displacement, but never a displacement only.

I do have experience with the Intel syntax using the NASM assembler, which does distinguish mov rax, 8 and mov rax, [8].
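For reference, the two encodings that `mov rax, 8` and `mov rax, [8]` should produce (the bytes at offsets 0000 and 0007 of the AT&T listing above) can be built by hand. The sketch below assumes 64-bit mode and follows the standard x86-64 encoding rules: the immediate load is `REX.W C7 /0` followed by an imm32, while the absolute load is `REX.W 8B /r` with a SIB byte whose base field 101 (with mod 00) means "disp32, no base register". The SIB byte is needed because in 64-bit mode the shorter ModRM form mod=00, rm=101 means RIP-relative, not absolute. The helper names `mov_rax_imm32` and `mov_rax_abs32` are hypothetical.

```python
import struct

REX_W = 0x48   # REX prefix with W=1: 64-bit operand size
RAX = 0        # register number of rax in ModRM/SIB fields


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def sib(scale: int, index: int, base: int) -> int:
    """Pack the SIB byte: scale (2 bits), index (3 bits), base (3 bits)."""
    return (scale << 6) | (index << 3) | base


def mov_rax_imm32(value: int) -> bytes:
    # MOV r/m64, imm32 (C7 /0): reg field 0 is the opcode extension,
    # mod=11 selects the register itself, so this is rax <- value.
    return bytes([REX_W, 0xC7, modrm(0b11, 0, RAX)]) + struct.pack("<i", value)


def mov_rax_abs32(address: int) -> bytes:
    # MOV r64, r/m64 (8B /r). mod=00, rm=100 says "a SIB byte follows";
    # SIB index=100 means "no index", base=101 with mod=00 means "disp32, no base".
    # The plain mod=00, rm=101 form is RIP-relative in 64-bit mode, hence the SIB.
    return (bytes([REX_W, 0x8B, modrm(0b00, RAX, 0b100), sib(0b00, 0b100, 0b101)])
            + struct.pack("<i", address))


print(mov_rax_imm32(8).hex(" "))  # 48 c7 c0 08 00 00 00
print(mov_rax_abs32(8).hex(" "))  # 48 8b 04 25 08 00 00 00
```

Both results match the bytes GAS emits for `mov $8, %rax` and `mov (8), %rax`; the second is also what NASM produces for `mov rax, [8]` in 64-bit mode without `default rel`.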

Is this a bug in GAS? Or if not, how do I specify the equivalent of NASM's mov rax, [8]?

I realize it is probably uncommon to specify a displacement-only address but I would like to get a complete understanding of all of the memory addressing forms with this syntax.

When I assemble your example with Intel syntax, I *do* get the expected behavior. The generated machine code is exactly the same as what you show except for the second line, which is `48 8b 04 25 08 00 00 00` for me. — mtvec, Apr 19 '12 at 07:50
If in the above _Intel Syntax_ example to GAS you got the shown output, then yes, your version of GAS created incorrect code for `mov %rax, [8]`: it emitted the same code as `mov %rax, 8` (which initializes `rax` with the _constant_ 8, same as AT&T `mov $8, %rax`). All the other addressing operations, as shown, use the correct opcodes. — FrankH., Apr 19 '12 at 11:16

score 7 · Accepted Answer · answered Apr 19 '12 at 23:11


There was indeed such a bug in gas -- see http://sourceware.org/bugzilla/show_bug.cgi?id=10637 .

It appears to be fixed in (or perhaps before) binutils 2.21.51.


Matthew Slattery


Wow, I did not find that on a search. Indeed `gcc -v` gives "GNU assembler version 2.20.1 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.20.1-system.20100303" so I will update to 2.21.x and give it a shot. – Ray Toal Apr 20 '12 at 00:12
Verified on the new binutils. Thanks @Matthew. – Ray Toal Apr 20 '12 at 04:38

score 2 · Answer 2 · answered Apr 19 '12 at 10:33

You're seeing a very special corner case of AT&T syntax here. Ordinarily, for address operands, you have:

<op> [ src, ] displacement(base,index,scale) [, tgt ]

Each component of an AT&T address operand is optional, so you can write mov (%rax, %rbx), ... or mov 0(%rax, %rbx, 1), ... or any other such combination.

Inside the parentheses, the only number you can ordinarily have is the scale factor (if present).

But the assembler also accepts (and creates identical code for):

mov <absolute>, ...
mov (<absolute>), ...

This works only if the operand inside the parentheses is a simple number (an absolute address); otherwise the assembler complains that what you gave isn't a valid scale factor. This equivalence is a special case in AT&T syntax; I'm not sure why it was (or is) allowed.

The $ prefix in AT&T syntax, though, always specifies a constant (an immediate), never an address operand; it is equivalent to a naked number in Intel syntax.

The following illustrates the equivalences:

$ cat t.s
        mov     (8), %rax
        mov     $8, %rax
        mov     8, %rax
.intel_syntax
        mov %rax, [ 8 ]
        mov %rax, 8
        mov %rax, %ds:8

$ objdump -w -d t.o

t.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
   8:   48 c7 c0 08 00 00 00            mov    $0x8,%rax
   f:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
  17:   48 8b 04 25 08 00 00 00         mov    0x8,%rax
  1f:   48 c7 c0 08 00 00 00            mov    $0x8,%rax
  26:   48 8b 04 25 08 00 00 00         mov    0x8,%rax

$ objdump -w -M intel -d t.o

t.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
   8:   48 c7 c0 08 00 00 00            mov    rax,0x8
   f:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
  17:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
  1f:   48 c7 c0 08 00 00 00            mov    rax,0x8
  26:   48 8b 04 25 08 00 00 00         mov    rax,ds:0x8
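The equivalences illustrated by `t.s` can be summarized as a small lookup. The function below is a hypothetical sketch, not GAS's actual operand parser: it handles only numeric literals, registers, `[...]` and segment-prefixed operands, assumes `.intel_syntax` in prefix mode (registers written `%rax`, as in the examples here), and reflects the behaviour of a binutils version that includes the fix. Bare symbols in Intel syntax are not covered and come back as "unknown".

```python
def classify_operand(text: str, syntax: str) -> str:
    """Hypothetical helper: how GAS treats a numeric source operand.

    syntax is "att" or "intel" (GAS .intel_syntax with % register prefixes).
    """
    t = text.replace(" ", "")
    if syntax == "att":
        if t.startswith("$"):
            return "immediate"      # $ always marks a constant
        if t.startswith("%") and ":" not in t:
            return "register"
        return "memory"             # 8, (8), %ds:8, 8(%rbx) are all addresses
    if syntax == "intel":
        if "[" in t or ":" in t:
            return "memory"         # [8] or %ds:8 is an absolute address
        if t.startswith("%"):
            return "register"
        try:
            int(t, 0)               # accept decimal or 0x-prefixed numbers
        except ValueError:
            return "unknown"        # bare symbols are outside this sketch
        return "immediate"          # a naked number is a constant
    raise ValueError(f"unknown syntax: {syntax!r}")


cases = [("(8)", "att"), ("$8", "att"), ("8", "att"),
         ("[ 8 ]", "intel"), ("8", "intel"), ("%ds:8", "intel")]
for text, syntax in cases:
    print(f"{syntax:5} {text:7} -> {classify_operand(text, syntax)}")
```

For the six source operands in `t.s` it reports memory, immediate, memory for the AT&T lines and memory, immediate, memory for the Intel lines, matching the objdump output.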

The question is about the odd behaviour of the Intel syntax; but... I don't think the AT&T syntax here is a special case -- I suspect that it's just evaluating `(8)` as an expression, because it doesn't match anything that *is* special syntax. e.g. `mov 8, %rax`, `mov (8), %rax`, `mov 4+4, %rax` and `mov (4+4), %rax` all do the same thing. — Matthew Slattery, Apr 19 '12 at 23:14
