
I would like to start learning assembly. The first question I ask myself is this: when I sit in front of a computer, how do I find out which assembly syntax I have to use?
I have read many terms on the internet, such as "ABI", "computer architecture", "processor" and "compiler", but ultimately I didn't understand what exactly determines the syntax of the assembly language I have to use.

For instance, I have an M1 Mac and I installed a Linux virtual machine on it. I checked my architecture, which is AArch64, so I wrote a very simple assembly program:

.global _start
.section .text

_start:
        mov x8, #0x5d
        mov x0, #0x41

.section .data

For example, this works (built with gcc) on my Linux virtual machine but not on the Mac directly (also built with gcc), because apparently I have to replace .section .data with .data, or .section .text with .text. So here I have the same AArch64 architecture and the same compiler, yet the assembly syntax is different... weird.

In short, I would like to know exactly what information I have to look for on a computer (ABI? Architecture? Something else?) in order to know for sure which assembly language syntax to use.

Peter Cordes
roi_saumon
  • Two things: First and foremost the processor itself. This decides the assembly language itself to use. x86 (including x86-64) is very different from ARM, for example. The second thing is the *assembler directives*, which specify things for the assembler itself but are not part of the processor assembly language. For example the `.global` and `.section` parts in the code you show are assembler directives. These depend on the assembler you use. – Some programmer dude Sep 09 '22 at 08:08
  • The GNU Assembler (GNU as) documentation describes the directives and machine-dependent features https://sourceware.org/binutils/docs/as/ – Sebastian Sep 09 '22 at 08:26
  • @Someprogrammerdude, thank you. If both times we compiled for the same architecture with the same compiler, gcc (so I guess with the same assembler directives described in the document linked by @Sebastian), I don't understand why the directives differ between Mac and Linux. – roi_saumon Sep 09 '22 at 08:42
  • Normally on macOS the `gcc` command is an alias for `clang`. I don't know what assembler Clang uses and how it's different from the GNU assembler for the same platform. – Some programmer dude Sep 09 '22 at 08:54
  • @Sebastian, I couldn't find an equivalent of the document you linked to for clang. Do you know if there is an equivalent? – roi_saumon Sep 09 '22 at 09:49
  • @Someprogrammerdude, I am also confused because Wikipedia says that clang is a compiler front-end, so I guess we cannot compile assembly with clang, right? – roi_saumon Sep 09 '22 at 10:13
  • https://stackoverflow.com/questions/69974380/how-to-compile-arm-assembly-on-an-m1-macbook – Hans Passant Sep 09 '22 at 10:14
  • @Someprogrammerdude: Different x86 assemblers use different syntax for the same machine code. For example, AT&T `movl $1234, (%rdi)` is the same instruction as NASM `mov dword [rdi], 1234`. Or GAS `.intel_syntax noprefix` `mov dword ptr [rdi], 1234` (also MASM). So no, the ISA doesn't *uniquely* determine the asm text syntax for instructions. Only the machine code. It does narrow down the asm choices to a handful. Or for quite a few ISAs, down to one, but you mentioned x86 where that's not the case. Even ARM has some syntax variations, like whether `ldreqb` or `ldrbeq` is right. – Peter Cordes Sep 10 '22 at 06:50
  • @Someprogrammerdude: `clang` uses its own built-in assembler. It's a compiler *and* an assembler. Fun fact: `clang -target arm64 -c foo.s` works on my x86-64 desktop without fork/exec of any other processes. But that's targeting AArch64 GNU/Linux, not MacOS. `clang -target arm64-macos -c foo.s` rejects `.section` directives and makes a Mach-O 64-bit arm64 object file, so I guess it's behaving like Apple's fork of clang running on MacOS targeting MachO64. – Peter Cordes Sep 11 '22 at 19:36

1 Answer


In this case, MacOS uses different section names than other platforms, because its Mach-O object-file format differs from ELF on Linux or COFF/PE on Windows.

When targeting MachO64, `.text` is an alias for `.section __TEXT,__text,regular,pure_instructions`, whereas when targeting an ELF object file it is equivalent to `.section .text`. At least that's what I see in the asm output from `clang -target arm64-macos -S hello.c` on my x86-64 Linux desktop for a simple program.
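To make the difference concrete, here is a minimal Python sketch of that mapping. The table and the function name `text_section_directive` are hypothetical, made up for illustration; the directive strings are the ones quoted above, and only the two object formats from this question are covered.

```python
# Hypothetical lookup table (not part of any toolchain): the long-form
# directive that selects the code section for each object-file format.
TEXT_SECTION_DIRECTIVE = {
    "ELF": ".section .text",
    "Mach-O": ".section __TEXT,__text,regular,pure_instructions",
}


def text_section_directive(object_format: str, portable: bool = False) -> str:
    """Return a directive that switches to the code section.

    With portable=True, return the short `.text` form, which the
    assemblers for both formats accept.
    """
    if portable:
        return ".text"
    return TEXT_SECTION_DIRECTIVE[object_format]


if __name__ == "__main__":
    for fmt in ("ELF", "Mach-O"):
        print(f"{fmt:7} {text_section_directive(fmt)}")
```

The practical takeaway is the `portable=True` branch: writing plain `.text` and `.data` avoids spelling out format-specific section names at all.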

AFAIK, only Clang can target Mach-O 64-bit object files; not sure if GNU Binutils ever got that support. On a Mac, `gcc` is normally Apple's version of clang (with some differences from the mainline clang I have on my Linux desktop). Running it doesn't involve the GNU assembler. On a Mac, `as` is also clang; it's a compiler and an assembler.


In general, the target OS (or object-file format) and the assembler matter, not just the CPU architecture. Since x86 was mentioned in comments, that's a good example of an ISA where many different asm source syntaxes exist, and also different OSes with incompatible ABIs and object file formats.
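As a rough way to collect those facts on a given machine, the following Python sketch reports the ISA and OS that the standard `platform` module sees, plus the object format that OS natively uses. The mapping table and the function name `describe_target` are assumptions made for this illustration, and the result describes the host the script runs on, not a cross-compilation target.

```python
import platform

# Assumed mapping from the OS name reported by platform.system() to the
# native object-file format of that OS (only the three mentioned here).
OBJECT_FORMAT_BY_OS = {
    "Linux": "ELF",
    "Darwin": "Mach-O",   # platform.system() reports macOS as "Darwin"
    "Windows": "PE/COFF",
}


def describe_target(machine: str, system: str) -> dict:
    """Summarize the facts that narrow down which asm syntax applies."""
    return {
        "isa": machine,  # decides the instruction set
        "os": system,    # decides the ABI / system-call convention
        "object_format": OBJECT_FORMAT_BY_OS.get(system, "unknown"),
    }


if __name__ == "__main__":
    print(describe_target(platform.machine(), platform.system()))
```

This does not tell you which assembler you are running; whether the `gcc` or `as` command is GNU or clang still has to be checked separately.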

Different x86 assemblers use different syntax for the same machine code. For example, AT&T `movl $1234, (%rdi)` is the same instruction as NASM `mov dword [rdi], 1234`, or GAS `.intel_syntax noprefix` `mov dword ptr [rdi], 1234` (also MASM).
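The three spellings can be put side by side with a small Python sketch. The function `store_imm32` and its syntax labels are hypothetical names for this illustration; it only formats text for this one instruction form (store a 32-bit immediate through a base register) and is not a general translator.

```python
def store_imm32(base_reg: str, imm: int, syntax: str) -> str:
    """Render 'store a 32-bit immediate to [base_reg]' in one source syntax."""
    if syntax == "att":
        # AT&T (GAS default): source operand first, $ and % sigils,
        # operand size carried by the 'l' mnemonic suffix.
        return f"movl ${imm}, (%{base_reg})"
    if syntax == "nasm":
        # NASM: destination first, size given by the 'dword' keyword.
        return f"mov dword [{base_reg}], {imm}"
    if syntax == "gas-intel":
        # GAS with .intel_syntax noprefix (also MASM): 'dword ptr'.
        return f"mov dword ptr [{base_reg}], {imm}"
    raise ValueError(f"unknown syntax: {syntax}")


if __name__ == "__main__":
    for syntax in ("att", "nasm", "gas-intel"):
        print(f"{syntax:10} {store_imm32('rdi', 1234, syntax)}")
```

All three lines describe the same instruction; only the source text differs.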

So no, the ISA doesn't uniquely determine the asm text syntax for instructions, only the machine code. It does narrow down the asm choices to a handful.

Quite a few ISAs have only one syntax for the instructions themselves, although 32-bit ARM has some syntax variations, like whether `ldreqb` or `ldrbeq` is right (the predicate as an infix before the load-size suffix, or as a suffix after it). And there's Keil's ARMASM vs. GNU assembler syntax.
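The two 32-bit ARM spellings differ only in where the condition goes, which a short Python sketch can show. The function name `conditional_load_byte` is hypothetical. The mapping of the two orders to the older "divided" syntax and the newer "unified" syntax (UAL), which GAS selects with `.syntax divided` and `.syntax unified`, is my understanding rather than something stated above.

```python
def conditional_load_byte(cond: str, unified: bool) -> str:
    """Build the mnemonic for a conditional byte load on 32-bit ARM."""
    base, size = "ldr", "b"
    if unified:
        # Unified syntax: condition is a suffix after the size.
        return base + size + cond
    # Divided (older) syntax: condition is an infix before the size.
    return base + cond + size


if __name__ == "__main__":
    print(conditional_load_byte("eq", unified=False))  # ldreqb
    print(conditional_load_byte("eq", unified=True))   # ldrbeq
```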

Peter Cordes