Yes, D, Rust, Delphi, and quite a few other ahead-of-time-compiled languages have some form of inline asm.
Java doesn't, nor do most other languages that are normally JIT-compiled from a portable binary (like Java's .class bytecode, or C#'s CIL). See the related Q&A "Code injecting/assembly inlining in Java?".
Memory-safe languages like Rust only allow inline asm in unsafe{} blocks because assembly language can mess up the program state in arbitrary ways if it's buggy, even more broadly than C undefined behaviour. Languages like Java that are intended to sandbox the guest program don't allow unsafe code at all.
Very high level languages like Python don't even have simple object-representations for numbers, e.g. an integer variable isn't just a 32-bit object, it has type info, and (in Python specifically) can be arbitrary length for large values. So even if a Python implementation did have inline-asm facilities, it would be a challenge to let you do anything to Python objects, except maybe for NumPy arrays which are laid out like C arrays.
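To make the "not just a 32-bit object" point concrete, here is a rough sketch in C of what a dynamically-typed, arbitrary-length integer object might look like. The struct boxed_int, its fields, the MAX_DIGITS cap, and boxed_int_add_u32 are all hypothetical names invented for illustration; this is not CPython's real layout, only the same general idea of a type tag plus a variable number of digits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object layout; NOT the real CPython struct. */
enum obj_type { OBJ_INT, OBJ_STR };

#define MAX_DIGITS 4  /* fixed cap so the sketch needs no heap allocation */

struct boxed_int {
    enum obj_type type;           /* run-time type tag */
    size_t ndigits;               /* number of base-2^32 digits in use */
    uint32_t digits[MAX_DIGITS];  /* least-significant digit first */
};

/* Add a small value in place, propagating carries across digits.
   Returns 0 on success, or -1 if the result needs more than MAX_DIGITS
   digits (in which case the stored value has wrapped around). */
static int boxed_int_add_u32(struct boxed_int *v, uint32_t addend) {
    uint64_t carry = addend;
    for (size_t i = 0; i < v->ndigits && carry != 0; i++) {
        uint64_t sum = (uint64_t)v->digits[i] + carry;
        v->digits[i] = (uint32_t)sum;  /* keep the low 32 bits */
        carry = sum >> 32;             /* pass the overflow upward */
    }
    if (carry != 0) {
        if (v->ndigits == MAX_DIGITS)
            return -1;
        v->digits[v->ndigits++] = (uint32_t)carry;  /* the number grows */
    }
    return 0;
}
```

Even a plain "add 1" has to check a length field and may grow the object, so a single add instruction on a register wouldn't be a meaningful operation on such a value.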
It's possible to call native machine-code functions (e.g. libraries compiled from C, or hand-written asm) from most high-level languages - that's usually important for writing some kinds of applications. For example, in Java there's JNI (Java Native Interface). Even node.js JavaScript can call native functions. "Marshalling" args into a form that makes sense to pass to a C function can be expensive, depending on the high-level language and whether you want to let the C / asm function modify an array or just return a value.
Different forms of inline asm in different languages
Often they're not MSVC's inefficient form like you're using (which forces a store/reload for inputs and outputs). Better designs, like Rust's (which is modeled on GNU C inline asm), can use registers. For example, GNU C asm("lzcnt %1, %0" : "=r"(leading_zero_count) : "rm"(input)); lets the compiler pick an output register, and pick a register or a memory addressing mode for the input.
(But it's even better to use intrinsics like _lzcnt_u32 or __builtin_clz for operations the compiler knows about, and only use inline asm for instructions the compiler doesn't have intrinsics for, or if you want to micro-optimize a loop in a certain way. https://gcc.gnu.org/wiki/DontUseInlineAsm)
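As a minimal sketch of the two approaches side by side, assuming GCC or Clang: the function names clz_builtin and clz_asm are made up for this example. The asm path is only compiled in when the build targets x86 with LZCNT enabled (e.g. -mlzcnt), since on a CPU without LZCNT the same encoding runs as a different instruction; otherwise it falls back to the builtin.

```c
#include <stdint.h>

/* Preferred: a builtin the compiler understands and can optimize around.
   __builtin_clz(0) is undefined, so zero is handled explicitly. */
static inline uint32_t clz_builtin(uint32_t input) {
    return input ? (uint32_t)__builtin_clz(input) : 32u;
}

/* GNU C inline asm version: the compiler picks the output register ("=r")
   and a register or memory operand for the input ("rm"). */
static inline uint32_t clz_asm(uint32_t input) {
#if defined(__LZCNT__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t leading_zero_count;
    __asm__("lzcnt %1, %0" : "=r"(leading_zero_count) : "rm"(input));
    return leading_zero_count;  /* lzcnt itself returns 32 for input 0 */
#else
    return clz_builtin(input);  /* target without LZCNT: no asm path */
#endif
}
```

Both functions should agree for every input; the builtin version has the advantage that the compiler can constant-fold it and pick the best instruction for whatever target it is compiling for.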
Some (like Delphi) have inputs via a "calling convention" similar to a function call, with args in registers, not quite free mixing of asm and high-level code. So it's more like an asm block with fixed inputs, and one output in a specific register (plus side-effects) which the compiler can inline like it would a function.
For syntax like you show to work, either
- You have to manually save/restore every register you use inside the asm block (really bad for performance unless you're wrapping a big loop - apparently Borland Turbo C++ was like this)
- Or the compiler has to understand every single instruction to know what registers it might write (MSVC is like this). The design notes / discussion for Rust's inline asm mention this requirement for D or MSVC compilers to implement what's effectively a DSL (Domain Specific Language), and how much extra work that is, especially for portability to new ISAs.
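GNU C-style inline asm avoids both of those options: the compiler never parses the instructions, and instead the programmer describes which registers are read and written via operand constraints and a clobber list. A small sketch of that, assuming GCC or Clang on x86 (mul_hi is a hypothetical helper name, and a portable fallback is used on other targets):

```c
#include <stdint.h>

/* High 32 bits of the 64-bit product a * b (hypothetical helper). */
static inline uint32_t mul_hi(uint32_t a, uint32_t b) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t hi;
    /* One-operand MUL implicitly reads EAX and writes EDX:EAX.
       "+a" tells the compiler EAX is read and then overwritten,
       "=d" tells it EDX is written, and "cc" that flags are modified.
       The template string itself is opaque to the compiler. */
    __asm__("mull %2"
            : "=d"(hi), "+a"(a)
            : "rm"(b)
            : "cc");
    return hi;
#else
    /* Portable fallback for non-x86 targets. */
    return (uint32_t)(((uint64_t)a * b) >> 32);
#endif
}
```

If the constraints are wrong (for example, forgetting that EAX is overwritten), the compiler has no way to notice, which is the price of not making it understand every instruction.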
Note that MSVC's specific implementation of inline asm was so brittle and clunky that it doesn't work safely in functions with register args, which meant not supporting it at all for x86-64, or for ARM/AArch64, where the standard calling conventions use register args. Instead, they provide intrinsics for basically every instruction, including privileged ones like invlpg, making it possible to write a kernel (such as Windows) in Visual C++. (Other compilers would expect you to use asm() for such things.) Windows almost certainly has a few parts written in separate .asm files, like interrupt and system-call entry points, and maybe a context-switch function that has to load a new stack pointer, but with good intrinsics support you don't need inline asm, if you trust your compiler to make good-enough asm on its own.