When kernel developers want to write assembly, do they write in a high-level language and convert it using a compiler, or do they write assembly directly?

Question

I read that some parts of Linux, for example, are written in assembly, and I guess they are written that way for faster execution.

But do modern kernel developers actually write assembly directly when it is needed, or do they write in a high-level language, convert it to assembly with a compiler, and use the converted assembly code instead?

Which is the better way? Isn't converting high-level code to assembly much more efficient, considering that compilers also optimise the code? And which parts of my kernel do I really need to write in assembly?

I'm pretty sure no part of the Linux kernel has been written by modifying the assembly output of a compiler. This is generally only done by inexperienced assembly programmers: either novices writing their first assembly code or, very rarely, professional programmers tasked with improving the performance of existing code written in a high-level language. These days, given how good compilers have become, in the latter case they would fail at the task most of the time. — Ross Ridge, Aug 16 '18 at 13:54

score 6 · Accepted Answer · answered Aug 16 '18 at 07:23

I read that some parts of Linux, for example, are written in assembly, and I guess they are written that way for faster execution.

Generally not. They write [tiny] parts of the kernel in assembly because the C language doesn't support some operations (for example, switching to protected mode on x86 CPUs requires writing to a register that the C language is not aware of).
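As a rough illustration of "an operation C cannot express", here is a minimal sketch using GNU-style inline asm. It uses the unprivileged CPUID instruction as a stand-in so it can run in user space; a kernel would use the same mechanism for privileged instructions such as a mov to a control register. The function name `cpu_vendor` is made up for this example, and the asm path assumes x86-64 with GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills `out` with the 12-character CPU
 * vendor string when built for x86-64 with a GNU-compatible compiler;
 * returns 0 (and leaves `out` empty) on any other target. */
int cpu_vendor(char out[13])
{
    memset(out, 0, 13);
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t eax = 0;   /* CPUID leaf 0: highest leaf + vendor string */
    uint32_t ecx = 0;   /* sub-leaf, unused for leaf 0 */
    uint32_t ebx, edx;

    /* No C operator maps to CPUID, so the instruction is written out
     * directly and the register constraints tell the compiler which
     * registers it reads and writes. */
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));

    /* The vendor string is returned in EBX, EDX, ECX, in that order. */
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    return 1;
#else
    return 0;
#endif
}
```

Everything around the single instruction is still ordinary C, which is the usual shape of such wrappers.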

Then again, C is well suited to things like a kernel. (It's sort of a "low-level" language, although the longer I program, the more confusing these categories become for me. At the moment I believe one of the highest-abstraction programming languages is actually C++, though many will not agree with me, and yet you can easily get quite low-level in C++ if the need arises.) So most things can be written directly in C; only very tiny parts that affect specific features of the target machine have to be finished with asm code.

Consider, for example, something like a memory manager. Most of it (tracking free/allocated pages, virtual memory maps for different processes, etc.) is ordinary numbers in ordinary data structures and can easily be handled in C. But setting up the final virtual memory layout for a particular process may require different instructions depending on the target machine and its MMU design, so there may be a small piece of assembly that applies what was calculated in C.
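A toy sketch of that split, assuming a 64-frame machine: the bookkeeping is plain C, and only one hook would contain a machine-specific instruction. All names here (`frame_alloc`, `frame_free`, `arch_load_page_table`, and the `HYPOTHETICAL_KERNEL_BUILD` macro) are invented for illustration and are not Linux's; the asm line is compiled out in a normal user-space build because it is privileged.

```c
#include <stdint.h>

#define NUM_FRAMES 64u

/* Portable bookkeeping: bit i set means physical frame i is allocated. */
static uint64_t frame_bitmap;

/* Returns the lowest free frame number, or -1 if none is left. */
int frame_alloc(void)
{
    for (unsigned i = 0; i < NUM_FRAMES; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (!(frame_bitmap & bit)) {
            frame_bitmap |= bit;
            return (int)i;
        }
    }
    return -1;
}

/* Marks a frame as free again; out-of-range values are ignored. */
void frame_free(int frame)
{
    if (frame >= 0 && (unsigned)frame < NUM_FRAMES)
        frame_bitmap &= ~(UINT64_C(1) << frame);
}

/* Software copy of the active page-table root, so the hook can be
 * observed in user space. */
static uintptr_t current_page_table_root;

/* The only machine-specific step: tell the MMU which page table to use.
 * On x86-64 that is a privileged write to CR3, so it is only compiled
 * when the (hypothetical) kernel-build macro is defined. */
void arch_load_page_table(uintptr_t root_phys)
{
#if defined(HYPOTHETICAL_KERNEL_BUILD) && defined(__x86_64__)
    __asm__ volatile("mov %0, %%cr3" : : "r"(root_phys) : "memory");
#endif
    current_page_table_root = root_phys;
}

uintptr_t arch_current_page_table(void)
{
    return current_page_table_root;
}
```

Porting this to another architecture would mean rewriting only `arch_load_page_table`; the allocator stays as it is.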

Peter Cordes · Answer 2 · 2018-08-16T11:58:33.810

Very few parts of Linux are written in asm for performance. See @Ped7g's answer for more about why kernels use inline asm for an occasional privileged instruction (like mov to/from control registers), or whole files of hand-written asm for entry points (like interrupt and system-call handler entry points that dispatch to a C function).

In Linux, perhaps only the RAID5 XOR parity (using SSE2 or AVX on x86) and RAID6 error correction are written in asm for performance.

Those were presumably written directly in asm, because manually vectorizing in C with intrinsics isn't easier. The looping is still done with C in those Linux functions, IIRC.

(And that code uses very bad style, with multiple separate asm("") statements that use the XMM or YMM registers. This happens to work, especially in kernel code where the compiler will never generate code that uses XMM registers, but using a single asm block, or vector output/input operands, would be safer. See Linux's lib/raid6/sse2.c for an example. There's also asm/xor.h, which has some generic block-xor functions with the looping done in asm too, presumably used by other parts of the kernel.) The RAID code is one of the few places the kernel uses SIMD vector registers, because saving/restoring the FPU state is expensive.
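For readers unfamiliar with what the XOR-parity code computes, here is a minimal portable sketch of the idea in plain C. It is not the kernel's implementation (which uses SSE2/AVX asm); the names `xor_block` and `compute_parity` and the contiguous block layout are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] ^= src[i] for `words` 64-bit words. `restrict` promises the
 * compiler that the buffers do not overlap, which helps auto-vectorization. */
void xor_block(size_t words, uint64_t *restrict dst,
               const uint64_t *restrict src)
{
    for (size_t i = 0; i < words; i++)
        dst[i] ^= src[i];
}

/* `data` holds `nblocks` blocks of `words` words each, back to back.
 * The parity block is the XOR of all data blocks. */
void compute_parity(size_t words, size_t nblocks,
                    const uint64_t *data, uint64_t *parity)
{
    for (size_t i = 0; i < words; i++)
        parity[i] = 0;
    for (size_t b = 0; b < nblocks; b++)
        xor_block(words, parity, data + b * words);
}
```

Because XOR is its own inverse, a lost block can be rebuilt by XORing the parity block with every surviving data block, using the same `xor_block` routine.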

Linux probably uses inline asm for performance for the x86 CRC32 instruction if available; several things use the CRC32C polynomial which x86 accelerates.
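To make the CRC32C point concrete, here is a small sketch, not taken from the kernel: a bit-at-a-time software CRC32C, plus an optional path that uses the x86 CRC32 instruction through the `_mm_crc32_u8` intrinsic when the file is compiled with SSE4.2 enabled. The function names are made up; `0x82F63B78` is the bit-reflected Castagnoli polynomial.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC32C (Castagnoli), reflected form. */
uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#if defined(__SSE4_2__)
#include <nmmintrin.h>

/* Same checksum, one byte per CRC32 instruction. Only compiled when the
 * build target is declared to support SSE4.2 (e.g. -msse4.2). */
uint32_t crc32c_hw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}
#endif

/* Picks the hardware path when it was compiled in, else the C fallback. */
uint32_t crc32c(const void *data, size_t len)
{
#if defined(__SSE4_2__)
    return crc32c_hw(data, len);
#else
    return crc32c_sw(data, len);
#endif
}
```

The standard check value for CRC32C over the ASCII string "123456789" is 0xE3069283, which is a convenient way to confirm that both paths agree. A real implementation would process 8 bytes per instruction and choose the path at run time rather than at compile time.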

For the more general case of your question, using compiler-generated asm as a starting point for optimization is often a good idea.

But if the compiler already emits good asm, you don't need to do anything and can just use that C. That's even better than inline asm because it can optimize with constant-propagation and so on. Or maybe you can tweak the C source to help the compiler do a more efficient job.
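A small example of "just use that C": the rotate idiom below is plain C, but GCC and Clang typically recognize it and emit a single rotate instruction where the target has one. Unlike an inline-asm rotate, it can also be folded away entirely when the inputs are compile-time constants. The function name `rotl32` is just a conventional choice for this sketch.

```c
#include <stdint.h>

/* Rotate x left by n bits. Masking keeps both shift counts in 0..31, so
 * there is no undefined behaviour even for n == 0 or n >= 32. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> (-n & 31));
}

/* With constant arguments the compiler can evaluate the whole call at
 * compile time; an asm("rol ...") statement would block that. */
uint32_t rotated_constant(void)
{
    return rotl32(1u, 13);
}
```

Checking the compiler's asm output for a function like this is a quick way to see whether any hand-written asm is needed at all.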

But if you can't get the compiler to make an optimal loop, then sure you can take its asm and optimize it by hand. As long as you benchmark against the original, you can't lose to the compiler. (Except in cases where your asm defeats optimizations when inlining makes something a compile-time constant.)
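"Benchmark against the original" could look roughly like this: keep the straightforward C version as a reference, check that the tuned version returns the same result, and time both. In this sketch `sum_candidate` is a manually unrolled C function standing in for a hand-optimized asm routine, and all names are hypothetical. The timing with `clock()` is deliberately crude; a serious benchmark needs warm-up runs, repeated measurements, and care that the compiler does not hoist the call out of the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Reference: the obvious loop, left for the compiler to optimize. */
uint64_t sum_reference(const uint32_t *a, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Candidate: four independent accumulators, standing in for a
 * hand-tuned version. */
uint64_t sum_candidate(const uint32_t *a, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}

/* Calls f `reps` times, stores the last result in *out, and returns the
 * CPU time used in seconds. */
double time_sum(sum_fn f, const uint32_t *a, size_t n, int reps,
                uint64_t *out)
{
    volatile uint64_t sink = 0;
    clock_t t0 = clock();

    for (int r = 0; r < reps; r++)
        sink = f(a, n);
    clock_t t1 = clock();

    *out = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

If the candidate is not measurably faster on the machines you care about, the reference version is the one to keep.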

For more details about helping vs. beating the compiler, see C++ code for testing the Collatz conjecture faster than hand-written assembly - why?.

You'd only consider using a hand-written asm loop for very critical portions of a piece of software, especially in a portable code-base like Linux, because you need a different implementation for every platform.

Another reason is that what's optimal on Skylake isn't what was optimal on P5 Pentium 20 years ago, and might not be optimal on some future x86 20 years from now. Sticking to portable C lets tuning options like -march=skylake do their job and make asm that's tuned for the specific microarchitecture you're compiling for. (Or lets updates to compilers' default tuning take effect over the years.)

Not to mention that most kernel developers aren't asm tuning experts who can easily write near-optimal asm by hand. It's not something that people do often. If you like doing that, work on gcc or clang to make them generate more optimal code from C.
