The overhead of a call forcing you to assume most registers are clobbered is pretty high. For high performance you need to manually inline your functions into asm so you can fully optimize everything.
Getting the compiler to emit a stand-alone definition and calling it should only be considered for code that's not performance-critical. You didn't say what you're writing in asm, or why, but I'm assuming that it is performance critical. Otherwise you'd just write it in C (with inline asm for any special instructions, I guess?).
If you don't want to manually inline, and you want to use these small inline C functions inside a loop, you'll probably get better performance from writing the whole thing in C. That would let the compiler optimize across a lot more code.
The register-arg calling conventions used for x86-64 are nice, but there are a lot of registers that are call-clobbered, so calls in the middle of computing stuff stop you from keeping as much data live in registers.
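As a rough sketch of the "write the whole thing in C" option: when both the loop and the small helper are C, the compiler can inline the helper and keep the loop state in registers across iterations, with no call in the loop body. The names mix and hash_words are hypothetical, made up for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper standing in for one of the small inline C functions. */
static inline uint32_t mix(uint32_t acc, uint32_t x) {
    return (acc ^ x) * 2654435761u;
}

/* The loop is C as well, so an optimizing compiler is free to inline mix()
   and keep acc, i and p in registers for the whole loop. */
uint32_t hash_words(const uint32_t *p, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = mix(acc, p[i]);
    return acc;
}
```

Comparing the -O2 output of this with a version where mix is an out-of-line function in another file should show the extra register save/restore traffic around the call.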
Can assembly code, by any means, make use of inlining? I mean as in an .S file, not inline assembly.
No, there's no syntax for the reverse of inline-asm. If there was, it would be something like: you tell the compiler what registers the inputs are in, what registers you want outputs in, and which registers it's allowed to clobber.
Common-subexpression-elimination and other significant optimizations between the hand-written asm and the compiler output wouldn't be possible without a compiler that really understood the hand-written asm, or treated it as source code and then emitted an optimized version of the whole thing.
Optimal inlining of compiler output into asm will typically require adjustments to the asm, which is why there aren't any programs to do it.
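For comparison, the forward direction (asm inside C) already works by describing inputs, outputs and clobbers to the compiler. This is a minimal sketch assuming GCC or clang with GNU C extended asm and the default AT&T syntax; add_asm is a hypothetical name, and the non-x86 branch is only there so the sketch builds on other targets.

```c
#include <stdint.h>

/* Hypothetical example: add two integers with a single x86 instruction. */
static inline uint32_t add_asm(uint32_t a, uint32_t b) {
#if defined(__x86_64__) || defined(__i386__)
    /* "+r"(a): a is both read and written, in any general-purpose register.
       "r"(b):  b is an input, in any general-purpose register.
       "cc":    the instruction modifies the flags.
       The compiler picks the registers and schedules around the statement. */
    __asm__ ("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;  /* portable fallback, not part of the technique */
#endif
}
```

A hypothetical "inline C into asm" syntax would need the same three pieces of information, just supplied by the asm author instead of the compiler.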
Is there any better method to achieve what I've been trying to do?
Now that you've explained in comments what your goals are: make small wrappers in C for the special instructions you want to use, instead of the other way around.
    #include <stdint.h>

    struct __attribute__((packed)) lgdt_arg {
        uint16_t limit;
        void *base;   // FIXME: always 64bit in long mode, including the x32 ABI where pointers and uintptr_t are 32bit.
                      // In 16bit mode, base is 24bit (not 32), so be careful with that too.
                      // You could just make this a uint64_t, since x86 is little-endian:
                      // the trailing bytes don't matter since the instruction just uses a pointer to the struct.
    };

    // static inline: plain C99 `inline` emits no stand-alone definition, which can fail to link at -O0.
    static inline void lgdt (const struct lgdt_arg *p) {
        asm volatile ("lgdt %0" : : "m"(*p) : "memory");
    }

    // Or this kind of construct sometimes gets used to make doubly sure compile-time reordering doesn't happen:
    static inline void lgdt_v2 (struct lgdt_arg *p) {
        asm volatile ("lgdt %0" : "+m"(*(volatile struct lgdt_arg *)p) :: "memory");
    }
    // That puts the asm statement into the dependency chain of things affecting the contents of the
    // pointed-to struct, so the compiler is forced to order it correctly.

    void set_gdt(unsigned size, char *table) {
        struct lgdt_arg tmp = { size, table };
        lgdt (&tmp);
    }
set_gdt compiles to (gcc 5.3 -O3 on godbolt):

    movw    %di, -24(%rsp)
    movq    %rsi, -22(%rsp)
    lgdt    -24(%rsp)
    ret
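The two store offsets in that output (-24 and -22) follow from the packed layout: limit sits at offset 0 and base at offset 2. A small sketch of checking that layout at compile time, assuming GCC or clang and C11; lgdt_arg64 is a hypothetical variant of the struct that uses a fixed-width uint64_t base, as the FIXME comment suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant of lgdt_arg for long mode. */
struct __attribute__((packed)) lgdt_arg64 {
    uint16_t limit;  /* size of the table in bytes, minus 1 */
    uint64_t base;   /* linear address of the table */
};

/* Without the packed attribute, base would be aligned to 8 bytes
   and the struct would grow to 16 bytes. */
_Static_assert(offsetof(struct lgdt_arg64, base) == 2, "base must follow limit directly");
_Static_assert(sizeof(struct lgdt_arg64) == 10, "long-mode operand is 2 + 8 bytes");
```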
I've never written code involving lgdt. It's probably a good idea to use a "memory" clobber like I did to make sure any loads/stores aren't reordered across it at compile time. That makes sure the GDT it points to is fully initialized before running LGDT. (Same for LIDT.) Compilers might notice that base gives the inline asm a reference to the GDT, and make sure its contents are in sync, but I'm not sure. There should be little to no downside to just using a "memory" clobber here.
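To show what the "memory" clobber does on its own, here is a sketch that can run in user space: an empty asm statement with only that clobber, standing where lgdt would go. It assumes GCC or clang; table, compiler_barrier and fill_then_publish are hypothetical names, and the table contents are placeholder values, not a recommended GDT.

```c
#include <stdint.h>

uint64_t table[3];  /* hypothetical stand-in for a descriptor table */

/* Emits no instructions. The "memory" clobber tells the compiler the
   statement may read or write any memory, so stores before it can't be
   moved after it at compile time, and values aren't kept cached in
   registers across it. It is not a CPU-level fence. */
static inline void compiler_barrier(void) {
    __asm__ volatile ("" ::: "memory");
}

uint64_t fill_then_publish(void) {
    table[0] = 0;
    table[1] = 0x00af9a000000ffffULL;  /* placeholder entry */
    table[2] = 0x00cf92000000ffffULL;  /* placeholder entry */
    compiler_barrier();                /* lgdt would go here in real code */
    return table[1];                   /* reloaded from memory after the barrier */
}
```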
Linux (the kernel) uses this sort of wrapper around an instruction or two all over the place, writing as little code as possible in asm. Look there for inspiration if you want.
re: your comments: yes you'll want to write your boot sector in asm, and maybe some other 16bit code since gcc's -m16 code is silly (still basically 32bit code).
No, there's no way to inline C compiler output into asm other than manually. That's normal and expected, for the same reason there aren't programs that optimize assembly. (i.e. read asm source, optimize, write different asm source).
Think about what such a program would have to do: it would have to understand the hand-written asm to be able to know what it could change without breaking the hand-written asm. Asm as a source language doesn't give an optimizer much to work with.