Retaining Compatibility To Assembly With inline Functions

Question

I'm writing some header files which are to be included by both C code and assembly. The assembly code is run through the C preprocessor for this purpose.

The problem is that I have plenty of inline functions in those header files. The assembler cannot use functions that are not symbols in an object file (as with static inline functions), so I cannot use those. I've read these two invaluable posts and have grasped by now how to use extern and static in conjunction with inline, but I am unsure how to make inline functions accessible to both C code and assembly.

My current approach is to write inline functions in a header file (with >= GNU99, -O3 inlines the function; anything else calls an external definition of that function, which I need to define explicitly) and to write the external definitions in an implementation file. The C code includes the header file (with the inline functions) and is compiled with -O3, thus using the inlined versions. The assembly code uses the external definitions.

Questions:

The assembly code can only call the functions, inlining is currently impossible. Can assembly code, by any means, make use of inlining? I mean as in an .S file, not inline assembly.
extern inline would be about as good as my current method, but it boils down to just one definition (the external definition is emitted automatically), so it cannot be divided into header and source file, which is crucial to make it accessible to both C code (header) and assembly (source).
Is there any better method to achieve what I've been trying to do?

I don't use GAS by hand (I'm using it only in the process of building C/C++ code), but at least with Tasm, Masm and Nasm, you can write macros - these are always inserted inline wherever they're used. — enhzflep, Mar 29 '16 at 11:52
@enhzflep Yes, the `.macro` and `.endm` directives can be used to define a macro, but that would require me to write everything twice. Also, this wouldn't work out with C code I want to call in the assembly code because it has to be compiled beforehand. BTW, macros in GNU as seem not to be syntax-checked for real. — cadaniluk, Mar 29 '16 at 11:55
Yup, seems like you get to introduce a rock and a hard place to one another, since you're between them. An interesting question, that's for sure. — enhzflep, Mar 29 '16 at 11:59

Peter Cordes · Accepted Answer · 2016-03-29T14:43:25.747

The overhead of a call, which forces you to assume most registers are clobbered, is pretty high. For high performance you need to manually inline your functions into asm so you can fully optimize everything.

Getting the compiler to emit a stand-alone definition and calling it should only be considered for code that's not performance-critical. You didn't say what you're writing in asm, or why, but I'm assuming that it is performance critical. Otherwise you'd just write it in C (with inline asm for any special instructions, I guess?).

If you don't want to manually inline, and you want to use these small inline C functions inside a loop, you'll probably get better performance from writing the whole thing in C. That would let the compiler optimize across a lot more code.

The register-arg calling conventions used for x86-64 are nice, but there are a lot of registers that are call-clobbered, so calls in the middle of computing stuff stop you from keeping as much data live in registers.

Can assembly code, by any means, make use of inlining? I mean as in an .S file, not inline assembly.

No, there's no syntax for the reverse of inline-asm. If there was, it would be something like: you tell the compiler what registers the inputs are in, what registers you want outputs in, and which registers it's allowed to clobber.

Common-subexpression-elimination and other significant optimizations between the hand-written asm and the compiler output wouldn't be possible without a compiler that really understood the hand-written asm, or treated it as source code and then emitted an optimized version of the whole thing.

Optimal inlining of compiler output into asm will typically require adjustments to the asm, which is why there aren't any programs to do it.

Is there any better method to achieve what I've been trying to do?

Now that you've explained in comments what your goals are: make small wrappers in C for the special instructions you want to use, instead of the other way around.

#include <stdint.h>
struct __attribute__((packed)) lgdt_arg {
    uint16_t limit;
    void * base;    // FIXME: always 64bit in long mode, including the x32 ABI where pointers and uintptr_t are 32bit.
                    // In 16bit mode, base is 24bit (not 32), so I guess be careful with that too
                    // you could just make this a uint64_t, since x86 is little-endian.
                    //  The trailing bytes don't matter since the instruction just uses a pointer to the struct.
};

// static inline: a plain C99 "inline" definition provides no out-of-line copy,
// so any call the compiler doesn't inline (e.g. at -O0) would fail to link.
static inline void lgdt (const struct lgdt_arg *p) {
    asm volatile ("lgdt %0" : : "m"(*p) : "memory");
}

// Or this kind of construct sometimes gets used to make doubly sure compile-time reordering doesn't happen:
static inline void lgdt_v2 (struct lgdt_arg *p) {
    asm volatile ("lgdt %0" : "+m"(*(volatile struct lgdt_arg *)p) :: "memory");
}
// that puts the asm statement into the dependency chain of things affecting the contents of the pointed-to struct, so the compiler is forced to order it correctly.


void set_gdt(unsigned size, char *table) {
  // size is stored as-is in limit, so callers should pass the GDT limit (table size in bytes minus 1).
  struct lgdt_arg tmp = { size, table };
  lgdt (&tmp);
}

set_gdt compiles to (gcc 5.3 -O3 on godbolt):

    movw    %di, -24(%rsp)
    movq    %rsi, -22(%rsp)
    lgdt -24(%rsp)
    ret

I've never written code involving lgdt. It's probably a good idea to use a "memory" clobber like I did, to make sure no loads/stores are reordered across it at compile time. That ensures the GDT it points to is fully initialized before LGDT runs. (Same for LIDT.) Compilers might notice that base gives the inline asm a reference to the GDT and make sure its contents are in sync, but I'm not sure. There should be little to no downside to just using a "memory" clobber here.
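Since LIDT works the same way, here is a minimal sketch of a matching wrapper, modeled on the lgdt() code above. The names lidt_arg, lidt and make_lidt_arg are hypothetical, and storing the base as uintptr_t assumes a non-x32 ABI in 32- or 64-bit mode (see the FIXME about base width). LIDT is privileged, so this only compiles into a user-space program; actually calling lidt() there would fault.

```c
#include <stdint.h>

/* Hypothetical LIDT pseudo-descriptor, same layout idea as lgdt_arg.
   Assumes a non-x32 ABI, where uintptr_t matches the base width. */
struct __attribute__((packed)) lidt_arg {
    uint16_t  limit;  /* IDT size in bytes, minus 1 */
    uintptr_t base;   /* linear address of the IDT */
};

static inline void lidt(const struct lidt_arg *p)
{
    /* "m"(*p): the asm reads the pseudo-descriptor from memory.
       "memory": keep stores to the IDT itself from moving past this. */
    __asm__ volatile ("lidt %0" : : "m"(*p) : "memory");
}

/* Hypothetical helper: build the pseudo-descriptor for an IDT of n_bytes. */
static inline struct lidt_arg make_lidt_arg(const void *idt, unsigned n_bytes)
{
    struct lidt_arg a = { (uint16_t)(n_bytes - 1), (uintptr_t)idt };
    return a;
}
```

Doing the "minus 1" in one helper keeps that detail out of every caller.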

Linux (the kernel) uses this sort of wrapper around an instruction or two all over the place, writing as little code as possible in asm. Look there for inspiration if you want.
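LGDT and LIDT can't be exercised from user space, so here is a runnable illustration of the same pattern with an unprivileged instruction: a hypothetical bswap32() wrapping BSWAP in a static inline function, with a portable fallback for non-x86 targets. BSWAP is only a stand-in; in real code GCC/Clang's __builtin_bswap32 would be preferable, since the compiler fully understands it.

```c
#include <stdint.h>

/* Hypothetical wrapper around one special instruction. */
static inline uint32_t bswap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* "+r": x is both input and output, in any general-purpose register.
       No volatile and no clobbers: the asm is a pure function of x,
       so the compiler may CSE it or drop it if the result is unused. */
    __asm__ ("bswap %0" : "+r"(x));
    return x;
#else
    /* Portable fallback for other compilers/targets. */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

Unlike lgdt(), this asm statement has no side effects, which is why it omits volatile and the "memory" clobber: that leaves the optimizer free to treat it like ordinary arithmetic.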

re: your comments: yes you'll want to write your boot sector in asm, and maybe some other 16bit code since gcc's -m16 code is silly (still basically 32bit code).

No, there's no way to inline C compiler output into asm other than manually. That's normal and expected, for the same reason there aren't programs that optimize assembly. (i.e. read asm source, optimize, write different asm source).

Think about what such a program would have to do: it would have to understand the hand-written asm to be able to know what it could change without breaking the hand-written asm. Asm as a source language doesn't give an optimizer much to work with.

"[...] writing the whole thing in C." So abandon assembly compatibility altogether? I could resort to inline assembly/extended assembly when assembly is needed but that seems just so unclean. And seriously, why would GCC add something like [`__ASSEMBLER__`](https://gcc.gnu.org/onlinedocs/cpp/Standard-Predefined-Macros.html#Standard-Predefined-Macros), which my project strives to support, without support for assembly-compatible `inline` functions? Not that I need to write `.S` files but it would be really nice. — cadaniluk, Mar 29 '16 at 13:37
Also, the main point of assembly in my project is access to special x86 instructions (LGDT, MONITOR, ARPL, etc.). Part of it is a bootloader, which I could write in C with `-m16` but that would consume way too much space with all those `0x66`/`0x67` prefixes and all. The inlining is for some two-liner routines I call very often and the [Intel Optimization Reference](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf) encourages inlining such functions. — cadaniluk, Mar 29 '16 at 13:40
@cad: If you're comfortable with GNU C inline asm, use that to wrap the special instructions, or give up on high performance. You can and should put that inline asm inside inline C functions to keep the code clean. As I explained in my answer, automated inlining into asm would essentially require decompiling / recompiling the asm to properly optimize. It's just not possible without treating the asm like LLVM IR or something, and not actually emitting the instructions as written. — Peter Cordes, Mar 29 '16 at 13:52
`__ASSEMBLER__` exists so you can write header files that leave out C prototypes and other stuff that isn't valid asm syntax when you include them from asm, but that still `#define` constants for use in both asm and C. — Peter Cordes, Mar 29 '16 at 13:54
@cad: updated my answer to address your comments in more detail. — Peter Cordes, Mar 29 '16 at 14:13
@cad: updated again to make the code actually compile, and correct a mistake in type naming in the code. — Peter Cordes, Mar 29 '16 at 14:44
LGDT and LIDT don't actually alter the data in the memory operand. You sure it has to be `"+m"`? I think `"m"` as an input operand should suffice. — Michael Petch, Aug 13 '17 at 05:21
@MichaelPetch: I wrote that as a belt-and-suspenders way of making sure gcc definitely had the value in memory. That makes it part of the dependency chain involving the variable, so there's no way it can reorder it with anything that even *reads* the value later. It's probably not necessary, and does lead to a reload of the value from memory if you do read it again later. I seem to recall reading that it was sometimes possible to have this kind of problem, but I don't remember any details. — Peter Cordes, Aug 13 '17 at 05:57
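Following the comment above about `__ASSEMBLER__`, a minimal sketch of such a shared header. The file name cr0_bits.h and the helper cr0_paging_enabled() are hypothetical; the two CR0 bit positions (PE = bit 0, PG = bit 31) are the architectural ones. Constants are kept as plain literals, since C-only syntax such as casts would not assemble.

```c
/* cr0_bits.h (hypothetical): included from both .c and .S files. */
#ifndef CR0_BITS_H
#define CR0_BITS_H

/* Plain #define constants are valid in both C and GNU as. */
#define CR0_PE 0x00000001  /* protected mode enable */
#define CR0_PG 0x80000000  /* paging enable */

#ifndef __ASSEMBLER__
/* C-only part: includes, types, prototypes, inline functions.
   gcc defines __ASSEMBLER__ when preprocessing a .S file, so the
   assembler never sees any of this. */
#include <stdint.h>

static inline int cr0_paging_enabled(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}
#endif /* !__ASSEMBLER__ */

/* A 32-bit .S file built with "gcc -c boot.S" could then do:
 *     #include "cr0_bits.h"
 *     movl %cr0, %eax
 *     orl  $CR0_PE, %eax
 *     movl %eax, %cr0
 */
#endif /* CR0_BITS_H */
```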

score 2 · Answer 2 · answered Mar 29 '16 at 21:24

The answer you linked to explains how C99 inline functions work but doesn't explain why the definition is that quirky. The relevant standard paragraph is ISO 9899:2011 §6.7.4 ¶6–7 (ISO 9899:1999 ibid.):

6 A function declared with an inline function specifier is an inline function. Making a function an inline function suggests that calls to the function be as fast as possible.¹³⁸⁾ The extent to which such suggestions are effective is implementation-defined. ¹³⁹⁾

7 Any function with internal linkage can be an inline function. For a function with external linkage, the following restrictions apply: If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.¹⁴⁰⁾

138) By using, for example, an alternative to the usual function call mechanism, such as "inline substitution". Inline substitution is not textual substitution, nor does it create a new function. Therefore, for example, the expansion of a macro used within the body of the function uses the definition it had at the point the function body appears, and not where the function is called; and identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a single address, regardless of the number of inline definitions that occur in addition to the external definition.

139) For example, an implementation might never perform inline substitution, or might only perform inline substitutions to calls in the scope of an inline declaration.

140) Since an inline definition is distinct from the corresponding external definition and from any other corresponding inline definitions in other translation units, all corresponding objects with static storage duration are also distinct in each of the definitions.

How does the definition of inline come into play? Well, if only inline declarations (without extern or static) of a function exist in a translation unit, no code for the function is emitted. But if a single declaration without inline, or with extern, exists, then code for the function is emitted, even though it is defined as an inline function. This design lets you designate the module that contains the machine code for an inline function without having to duplicate the implementation:

In your header file, place inline definitions:

fast_things.h

/* TODO: add assembly implementation */
inline int fast_add(int a, int b)
{
    return (a + b);
}

inline int fast_mul(int a, int b)
{
    return (a * b);
}

This header can be included in every translation unit and provides inline definitions for fast_add and fast_mul. To generate the machine code for these two, add this file:

fast_things.c

#include "fast_things.h"
extern inline int fast_add(int, int);
extern inline int fast_mul(int, int);

You can avoid typing all of this out using some macro magic. Change fast_things.h like this:

#ifndef EXTERN_INLINE
#define EXTERN_INLINE_UNDEFINED
#define EXTERN_INLINE inline
#endif
EXTERN_INLINE int fast_add(int a, int b)
{
    return (a + b);
}

EXTERN_INLINE int fast_mul(int a, int b)
{
    return (a * b);
}
#ifdef EXTERN_INLINE_UNDEFINED
#undef EXTERN_INLINE
#undef EXTERN_INLINE_UNDEFINED
#endif

Then fast_things.c simply becomes:

#define EXTERN_INLINE extern inline
#include "fast_things.h"

Since code is emitted for the inline functions, you can call them from assembly just fine. You cannot however inline them in assembly as the assembler doesn't speak C.
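To see the mechanism in one place, here is a single-file sketch that merges fast_things.h and fast_things.c (only fast_add is shown). It assumes C99-or-later inline semantics, which is the default in current GCC and Clang; under -std=gnu89 or -fgnu89-inline, extern inline means something different. The assembly in the trailing comment is an illustrative x86-64 System V caller, not part of the original answer.

```c
/* Stand-in for fast_things.h: an inline definition (C99 semantics). */
inline int fast_add(int a, int b)
{
    return a + b;
}

/* Stand-in for fast_things.c: this one extern declaration turns the
   definition above into an external definition, so the object file
   gets a global fast_add symbol that a .S file can call. */
extern inline int fast_add(int a, int b);

/* An x86-64 System V caller in a .S file might look like:
 *     movl $2, %edi      # first int argument
 *     movl $3, %esi      # second int argument
 *     call fast_add      # result comes back in %eax
 */
```

Taking the function's address is a quick way to check that the external definition really exists: without the extern declaration, an address-taken call like the one in the test would fail to link.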

There are also static inline functions which might be more suitable for your purpose (i.e. tiny helper functions) when you can make reasonably sure that they are always inlined.
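One way to make "reasonably sure" is the GNU C always_inline attribute (GCC and Clang), which forces inlining even at -O0. A minimal sketch; fast_mul_si is a hypothetical name chosen so it doesn't clash with the fast_things.h functions.

```c
/* Internal linkage plus always_inline: every call is expanded in place,
   even without optimization. The flip side, as in the question, is that
   no global symbol is ever emitted, so .S files cannot call it. */
static inline __attribute__((always_inline)) int fast_mul_si(int a, int b)
{
    return a * b;
}
```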

The GNU assembler supports macros in its custom macro language. One possibility is to write a custom preprocessor that takes your inline assembly and emits both gcc-style inline assembly for C and gas macros. This should be possible with sed, m4, or awk (in descending order of difficulty). It might also be possible to abuse the C preprocessor's stringify (#) operator for this; if you can give me a concrete example, I could try to throw something together.
