Architecture-specific inline assembly

Question

I want to hand-write some inline assembly for part of a function but only have that assembly used when compiling for the architecture it's written for, falling back to a generic C implementation on other architectures.

What's the best way to have the compiler use the inline assembly when it's on a matching architecture? Is there a syntax for the asm block that specifies that it's only for a particular architecture, or should I use the autoconf target triplet in the configure script to define a preprocessor symbol? For example

configure.ac:

case $host in x86_64-*) AC_DEFINE([AMD64]) ;; esac

.c source file:

void f() {
#ifdef AMD64
  asm (/* ... */);
#else
  /* C code */
#endif
}

I suppose for larger or standalone functions the selection of assembly or C could similarly be done with an AM_CONDITIONAL to select a different source file (.c or platform-specific .s).

Are there other options? Is this idiomatic?

Edit: The question is more about whether there are alternatives such as

asm "i386" ( ... )

or

asm "aarch64" ( ... )

or some alternative that doesn't involve preprocessor ifdefs.

Edit 2: I was looking for the Function Multiversioning feature of GCC where multiple alternative implementations can be provided depending on the specific architecture, and the best version is automatically selected by the linker at runtime. I'll put that in an answer if I'm permitted to reopen the question.

Edit 3: The question applies both to architecture families like x86/amd64/arm64 and to instruction set architectures (ISAs) like x86-SSE2, amd64-AVX, and so on.

The compiler defines a bunch of macros that tell you what architecture you are on. No need for a manual test. — fuz, Mar 28 '18 at 06:47
There's no syntax like `asm "i386" ( ... )` that expands on i386 and is a no-op on other targets, in GNU C or any other compiler I've heard of. Using the macros the compiler already defines *is* idiomatic, and what everyone does. — Peter Cordes, Mar 28 '18 at 17:27
This still looks like a duplicate of [How do I identify x86 vs. x86\_64 at compile time in gcc?](https://stackoverflow.com/q/30139983). Function multiversioning (https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html) is for runtime dispatching for different features within an architecture instead of compile-time selection of code for different architectures. I could reopen, but any answer based on that seems like it would be answering something very different from what the question asks. — Peter Cordes, Nov 15 '18 at 19:34
terminology: x86-64 + AVX is not considered a different ISA than baseline x86-64. It's an optional extension to x86-64 (and to IA32). Your argument that Function Multiversioning is relevant or useful here finally makes sense now. But it doesn't accomplish the goal of avoiding the preprocessor for architecture selection, does it? ARM intrinsics or asm instructions will cause a compile or assemble error when building for x86-64, right, regardless of a `target()` attribute? — Peter Cordes, Nov 16 '18 at 20:26
The question you've marked it duplicate of says nothing about inline assembly. A common use of inline assembly is for processor-specific features like SSE/MMX/AVX instructions, for which function multi-versioning appears to be the best solution. — T Percival, Nov 16 '18 at 20:27
It seems the best answer then is "both". Preprocessor conditions for compile-time differentiation depending on incompatible target architectures, and function multi-versioning for link-time selection of the best version depending on optional processor features. — T Percival, Nov 16 '18 at 20:28
Runtime dispatching is one solution. It's not the most efficient solution if compile-time selection is possible (making binaries that only need to run on the build host, with `-march=native`). — Peter Cordes, Nov 16 '18 at 20:28
Anyway, yes now you've modified the question so it's no longer a duplicate, I'll reopen. It already had 3 reopen votes before that. And yes, preprocessor for architecture selection, and runtime dispatching for ISA-extensions within that architecture is a valid approach for making a single set of binaries that are good everywhere. — Peter Cordes, Nov 16 '18 at 20:30
I'm aware of -march=native, but that's not especially useful when building a program to run on an entire architecture. The goal is: source-compatibility with all architectures that a C compiler can target, *and* the opportunity to use handwritten assembly when the CPU supports it, within a distributable binary. — T Percival, Nov 16 '18 at 20:31

T Percival · Answer 1 · 2018-11-16T20:57:24.393

For compile-time architecture selection, such as the distinction between amd64 and arm64, #ifdef on compiler-defined preprocessor macros is the common approach. The macros the compiler predefines for the current target can be listed with cpp -dM /dev/null.

For example:

#ifdef __x86_64__
// impl.
#else
// default impl
#endif
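To make the pattern concrete, here is a minimal sketch that assumes GCC or Clang targeting x86-64 with the default AT&T assembler syntax. The function names add_u32 and add_u32_selfcheck are made up for illustration. The asm path is compiled only when both __x86_64__ and __GNUC__ are defined, and every other target gets the plain C fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration: returns a + b (mod 2^32). */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* GNU extended asm, AT&T syntax: a += b.
       "+r" marks a as read-write; "cc" tells the compiler the flags change. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#else
    /* Portable fallback for every other target or compiler. */
    return a + b;
#endif
}

/* Small self-check: whichever path was compiled must match C arithmetic. */
void add_u32_selfcheck(void)
{
    assert(add_u32(2u, 3u) == 5u);
    assert(add_u32(UINT32_MAX, 1u) == 0u);  /* wraps around */
}
```

Testing __GNUC__ as well as the architecture macro keeps the GNU-style asm away from compilers that do not understand that syntax.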

For load-time selection of an optimized implementation when the CPU supports it, GCC's Function Multiversioning provides a way to ship multiple implementations of a function. The dynamic linker picks which one to use when the program starts, based on the available CPU features.

__attribute__ ((target("default")))
void f() {
}

__attribute__ ((target("sse4.2")))
void f() {
}
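The dispatch that multiversioning automates can also be written by hand, which may help when the file is built as plain C: as far as I can tell, GCC's manual documents the repeated-target-attribute form under its C++ extensions. The following is only a sketch, assuming GCC or Clang on x86-64. The names popcount64, popcount_popcnt, popcount_generic and popcount64_selfcheck are hypothetical, while __builtin_cpu_supports is the compiler builtin that queries CPU features at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clears the lowest set bit until none remain. */
static unsigned popcount_generic(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Hand-written path: only safe to call on CPUs that have POPCNT. */
static unsigned popcount_popcnt(uint64_t x)
{
    uint64_t r;
    __asm__ ("popcntq %1, %0" : "=r" (r) : "r" (x) : "cc");
    return (unsigned)r;
}
#endif

/* Compile-time check for the architecture, run-time check for the feature. */
unsigned popcount64(uint64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("popcnt"))
        return popcount_popcnt(x);
#endif
    return popcount_generic(x);
}

/* Small self-check: the dispatched version must agree with the fallback. */
void popcount64_selfcheck(void)
{
    assert(popcount64(0) == 0);
    assert(popcount64(0xF0F0u) == popcount_generic(0xF0F0u));
    assert(popcount64(UINT64_MAX) == 64);
}
```

Unlike multiversioning, this checks the CPU feature on every call. Caching the choice in a function pointer would avoid that, at the cost of an indirect call.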
