
I recently read a document titled "Embedded Systems/Mixed C and Assembly Programming".

It deals with how C and C++ let the programmer use assembly code via a technique called inline assembly, which looks something like this:

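#include <stdio.h>

int main(void) {

   int a = 3, b = 3, c;

   /* ax/bx are 16-bit registers: this assumes a 16-bit compiler
      such as Borland Turbo C, where int is 16 bits */
   asm {
      mov ax,a
      mov bx,b
      add ax,bx
      mov c,ax
   }

   printf("%d", c);
   return 0;
}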

And I was wondering if a similar interaction was possible in other high-level languages like Java, Python and others, or if this was only possible with C and C++.

Leon Horka
  • Yes, D, Rust, Delphi, and quite a few other ahead-of-time-compiled languages have some form of inline asm. Often it's not MSVC's inefficient form like you're using (which [forces a store/reload for inputs and outputs](https://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-asm-and-asm/35959859#35959859)). Better designs, like Rust's (modeled on GNU C inline asm), can use registers. – Peter Cordes May 18 '21 at 19:04
  • This list is long, really long, since inline assembly has been a thing since assembly was a thing. – tadman May 18 '21 at 19:07
  • Forth and some types of LISP. – NomadMaker May 18 '21 at 19:09
  • @tadman: I thought the first assembler predated the first high-level language. You don't really have "inline asm" in an assembly source file; it's just more code. – Peter Cordes May 18 '21 at 19:47
  • @PeterCordes The first assembler was at the time considered a "high-level language". It's all relative to what you consider low-level, and at that time it meant "machine code". – tadman May 18 '21 at 19:57
  • @tadman: So you're arguing that `db 0x90` to emit arbitrary bytes (e.g. manually encode an x86 `nop`) is "inline machine code", the asm-source equivalent of inline asm in high-level languages? Sure, if you want, but nobody calls it "inline asm". – Peter Cordes May 18 '21 at 19:59
  • @PeterCordes Our modern conception of "inline assembly" is really a product of how C did it, which is a product of assembly notation having been vaguely standardized earlier. – tadman May 18 '21 at 20:01
  • I would expect FORTRAN and BASIC to interact with assembly languages, especially FORTRAN. You should take a look at Ada and Python. – Thomas Matthews May 18 '21 at 21:24

2 Answers


Yes, D, Rust, Delphi, and quite a few other ahead-of-time-compiled languages have some form of inline asm.

Java doesn't, nor do most other languages that are normally JIT-compiled from a portable binary (like Java's .class bytecode, or C#'s CIL). See the related question "Code injecting/assembly inlining in Java?".

Memory-safe languages like Rust only allow inline asm in unsafe{} blocks because assembly language can mess up the program state in arbitrary ways if it's buggy, even more broadly than C undefined behaviour. Languages like Java intended to sandbox the guest program don't allow unsafe code at all.

Very high level languages like Python don't even have simple object-representations for numbers, e.g. an integer variable isn't just a 32-bit object, it has type info, and (in Python specifically) can be arbitrary length for large values. So even if a Python implementation did have inline-asm facilities, it would be a challenge to let you do anything to Python objects, except maybe for NumPy arrays which are laid out like C arrays.

It's possible to call native machine-code functions (e.g. libraries compiled from C, or hand-written asm) from most high-level languages - that's usually important for writing some kinds of applications. For example, in Java there's JNI (Java Native Interface). Even node.js JavaScript can call native functions. "Marshalling" args into a form that makes sense to pass to a C function can be expensive, depending on the high-level language and whether you want to let the C / asm function modify an array or just return a value.
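As a rough sketch of the native side of such a call, here is a small C function that takes an array as a pointer plus a length, which is the kind of form a high-level language usually has to marshal its data into. The name `sum_ints` is hypothetical, and building it as a shared library (for example with `cc -shared -fPIC`) and loading it through an FFI such as JNI or Python's ctypes is an assumption, not something the answer spells out.

```c
#include <stddef.h>

/* Hypothetical native function meant to be called from a high-level
   language. The caller passes a C-style array as pointer + length,
   so the runtime must first marshal its own array type into that form. */
long sum_ints(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];   /* read-only access: nothing is copied back */
    return total;
}
```

Because the function only reads the array and returns a single value, the caller's runtime doesn't have to copy anything back afterwards, which is the cheaper of the two marshalling cases mentioned above.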


Different forms of inline asm in different languages

Often it's not MSVC's inefficient form like you're using (which forces a store/reload for inputs and outputs). Better designs, like Rust's (modeled on GNU C inline asm), can use registers. For example, GNU C asm("lzcnt %1, %0" : "=r"(leading_zero_count) : "rm"(input)); lets the compiler pick an output register, and pick a register or a memory addressing mode for the input.
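To make the constraint-based style concrete, here is a minimal sketch that redoes the question's addition with GNU C extended asm. It assumes GCC or Clang targeting x86 with the default AT&T syntax and falls back to plain C elsewhere; the function name `add_rm` is made up for illustration.

```c
/* Adds two ints. On x86 with a GNU-compatible compiler this uses
   extended inline asm; on anything else it falls back to plain C. */
int add_rm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int sum = a;
    /* "+r"(sum): read-write operand in a register the compiler picks.
       "rm"(b):   input that may be a register or a memory operand.
       "cc":      the add instruction modifies the flags.
       AT&T operand order: addl src, dst  means  dst += src. */
    __asm__("addl %1, %0" : "+r"(sum) : "rm"(b) : "cc");
    return sum;
#else
    return a + b;   /* portable fallback */
#endif
}
```

Unlike the `asm { ... }` block in the question, nothing here names a specific register, so the compiler is free to keep `a` and `b` wherever they already live.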

(But it's even better to use intrinsics like _lzcnt_u32 or __builtin_clz for operations the compiler knows about, and to reserve inline asm for instructions the compiler doesn't have intrinsics for, or for when you want to micro-optimize a loop in a certain way. https://gcc.gnu.org/wiki/DontUseInlineAsm)
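A minimal sketch of the intrinsic route, assuming a GNU-compatible compiler where `unsigned int` is 32 bits. The wrapper name `leading_zeros32` is hypothetical, and the zero check is there because `__builtin_clz(0)` has an undefined result.

```c
#include <stdint.h>

/* Count leading zero bits in a 32-bit value; returns 32 for x == 0. */
unsigned leading_zeros32(uint32_t x)
{
    if (x == 0)
        return 32u;                 /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return (unsigned)__builtin_clz(x);
#else
    /* Portable fallback: shift left until the top bit is set. */
    unsigned n = 0;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

The compiler understands what the builtin computes, so it can pick the best instruction for the target and optimize around it, which it can't do with an opaque asm statement.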

Some (like Delphi) have inputs via a "calling convention" similar to a function call, with args in registers, not quite free mixing of asm and high-level code. So it's more like an asm block with fixed inputs, and one output in a specific register (plus side-effects) which the compiler can inline like it would a function.


For syntax like you show to work, either:

  • You have to manually save/restore every register you use inside the asm block (really bad for performance unless you're wrapping a big loop - apparently Borland Turbo C++ was like this)
  • Or the compiler has to understand every single instruction to know what registers it might write (MSVC is like this). The design notes / discussion for Rust's inline asm mention this requirement for D or MSVC compilers to implement what's effectively a DSL (Domain Specific Language), and how much extra work that is, especially for portability to new ISAs.
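For contrast, GNU C-style inline asm avoids both requirements by having the programmer declare which registers the asm body overwrites. The sketch below mirrors the question's code with a hard-coded `eax` and lists it as a clobber; it assumes GCC or Clang on x86 with AT&T syntax, and `add_via_eax` is a made-up name.

```c
/* Same addition as in the question, but with a fixed scratch register
   that is declared to the compiler instead of saved/restored by hand. */
int add_via_eax(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int sum;
    __asm__("movl %1, %%eax\n\t"    /* eax = a            */
            "addl %2, %%eax\n\t"    /* eax += b           */
            "movl %%eax, %0"        /* sum = eax          */
            : "=r"(sum)             /* output: any register */
            : "r"(a), "r"(b)        /* inputs: any registers */
            : "eax", "cc");         /* clobbers: eax and the flags */
    return sum;
#else
    return a + b;   /* portable fallback */
#endif
}
```

The compiler never parses the instructions in the template; it only trusts the operand and clobber lists, so a wrong or missing clobber silently produces broken code.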

Note that MSVC's specific implementation of inline asm was so brittle and clunky that it doesn't work safely in functions with register args, which meant not supporting it at all for x86-64, or for ARM/AArch64, where the standard calling convention uses register args. Instead, Microsoft provides intrinsics for basically every instruction, including privileged ones like invlpg, making it possible to write a kernel (such as Windows) in Visual C++. (Other compilers would expect you to use asm() for such things.) Windows almost certainly has a few parts written in separate .asm files, like interrupt and system-call entry points, and maybe a context-switch function that has to load a new stack pointer, but with good intrinsics support you don't need asm, if you trust your compiler to make good-enough asm on its own.

Sep Roland
Peter Cordes

You can inline assembly in HolyC.

  • It's not really surprising that a derivative of C will support C-style inline asm. https://harrisontotty.github.io/p/a-lang-design-analysis-of-holyc is a quick summary of some of its design features. It is a different language from C, but not by much. – Peter Cordes May 18 '21 at 21:09