I mean something that I write in NASM like this:

mov dword [0xA0BF17C], ' : )'

I have tried things like these in the GNU assembler:

movd " : )", 0xB8000

movd $" : )", 0xB8000

movd ' : )', 0xB8000

movd " : )", $0xB8000

But they all caused this error:

Error: unbalanced parenthesis in operand 1.
Peter Cordes
yomol777
  • I don't think this is possible like this in the GNU assembler. Have you checked the manual? – fuz Jun 13 '20 at 10:24
  • It wasn't written how to do it there, but that doesn't mean you can't do it. – yomol777 Jun 13 '20 at 12:19
  • So basically, the GNU assembler does not support using string literals as integer constants. One solution is to manually look up the ASCII codes of these characters and form an integer literal. – fuz Jun 13 '20 at 12:40

1 Answer

GAS only supports single-character literals as numbers. A single multi-byte UTF-8 character is OK, but multiple separate characters are not. You could do movb $' ', 0xB8000, but you don't want to use 4 instructions for 4 bytes.

You have two real options: shift single-character literals together into a number, or write it out in hex. (Either way, take into account that x86 is little-endian.)

# NASM   mov eax, "abcd"
movl  $'a' + ('b'<<8) + ('c'<<16) + ('d'<<24),  0xB8000
movl  $0x64636261,  0xB8000         # or manual ASCII -> hex, little-endian

The shift/add trick works with any arbitrary bytes; you could maybe even make a #define CPP macro to do it (taking 4 args).

With an EAX destination instead of memory (to simplify the machine code), disassembled back into GAS Intel syntax (objdump -drwC -Mintel), we can see they both assembled identically (with as --32):

   0:   b8 61 62 63 64          mov    eax,0x64636261
   5:   b8 61 62 63 64          mov    eax,0x64636261

Or with your memory destination. Again, 32-bit mode since this would #GP fault in real mode from exceeding the 64k DS segment limit with that 0xb8000 offset.
Also notice that the immediate bytes in the machine code are in the same order in which they will be stored as data at the memory destination. (And they match source order if you were using NASM mov dst, "abcd".)

a:   c7 05 00 80 0b 00 61 62 63 64   mov    DWORD PTR ds:0xb8000,0x64636261

Unlike NASM, GAS doesn't support multi-character character literals as numeric constants. It so doesn't support them that they even confuse GAS's parser (see Footnote 1)! GAS was mostly designed for assembling compiler output, and compilers don't need this.

GAS only supports (double) quoted strings of multiple characters as args to .ascii / .asciz / .string8/16/32, not to .byte (unlike NASM db) or as an immediate operand for an instruction.

If it were supported, the x86 AT&T syntax would be movl $' : )', 0xB8000.
Not movd, and an immediate operand always needs a $.

See "When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?" for NASM vs. MASM vs. GAS with multi-character literals. Only NASM works intuitively.


Double quotes don't work either: mov $"foo", %eax assembles, but it assembles the same as mov $foo, %eax, putting the address of the symbol foo into a register. See "relocation R_X86_64_8 against undefined symbol `ELF' can not be used when making a PIE object" for an example of that.


Footnote 1: Hence errors like "unbalanced parenthesis" instead of something sensible like "character literal contains multiple characters".

mov $'abcd', %eax

is another example of totally confusing the parser. It sees the b as a backward symbol reference for local labels, like jmp 1b to reference a 1: label in the backwards direction. But the label number it's looking for here is 97, the ASCII value of 'a'. This is totally bonkers.

foo.s: Assembler messages:
foo.s:4: Error: backward ref to unknown label "97:"
foo.s:4: Error: junk `cd44%eax' after expression
foo.s:4: Error: number of operands mismatch for `mov'

All of this was tested with as --version = GNU assembler (GNU Binutils) 2.34.

Peter Cordes
  • Related: [How are dw and dd different from db directives for strings?](https://stackoverflow.com/q/38860174) for NASM, [When using the MOV mnemonic to load/copy a string to a memory register in MASM, are the characters stored in reverse order?](https://stackoverflow.com/a/57436181) for MASM – Peter Cordes Dec 04 '22 at 18:11