5

Why can't I use local variables from main in basic inline asm? It is only allowed in extended asm, but why?

(I know local variables are on the stack after the return address (and therefore cannot be used once the function returns), but that should not be a reason not to use them.)

An example of basic asm:

#include <stdio.h>

int a = 10; //global a
int b = 20; //global b
int result;

int main() {
    asm ( "pusha\n\t"
          "movl a, %eax\n\t"
          "movl b, %ebx\n\t"
          "imull %ebx, %eax\n\t"
          "movl %eax, result\n\t"
          "popa");

    printf("the answer is %d\n", result);
    return 0;
}

An example of extended asm:

int main (void) {
    int data1 = 10;  //local var - could be used in extended
    int data2 = 20;
    int result;

    asm ( "imull %%edx, %%ecx\n\t"
          "movl %%ecx, %%eax" 
          : "=a"(result)
          : "d"(data1), "c"(data2));

    printf("The result is %d\n",result);
    return 0;
}

Compiled with: gcc -m32 somefile.c

platform: uname -a: Linux 5.0.0-32-generic #34-Ubuntu SMP Wed Oct 2 02:06:48 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

ndim
Herdsman
  • 4
    Could you please give a concrete example of the thing you want to do and can't? It would also help to know exactly which compiler you are using, since they all have slightly different extensions for inline assembly. – zwol Feb 14 '20 at 14:12
  • 3
    If you want to access local variables, use extended inline assembly. The reason they cannot be used in basic assembly is that they do not have a fixed storage location and thus there is no way to access them without asking the compiler where it put them. – fuz Feb 14 '20 at 14:12
  • 1
    Because they're not symbols. – S.S. Anne Feb 14 '20 at 14:14
  • @Herdsman What you need to put into your question is what zwol said: tell us what compiler you use and what you are trying to achieve that requires you to use local variables in basic assembly. – fuz Feb 14 '20 at 14:21
  • @fuz "don't have fixed storage, have to ask compiler": then why don't the local vars do so (ask the compiler for their location)? The compiler is the only one that has power over translation and compilation, so isn't this the right job for the compiler? It should "tell them" where they reside in memory (address), and be capable of giving their location on the stack (where they are, as locals) – Herdsman Feb 14 '20 at 18:50
  • @Herdsman It can do so. In fact, that's exactly what extended asm statements were designed to do. The `%0`, `%1` and so on place holders are replaced with the actual locations of the variables once the compiler has figured out where to place them. – fuz Feb 14 '20 at 22:52

4 Answers

6

You can use local variables in extended assembly, but you need to tell the extended assembly construct about them. Consider:

#include <stdio.h>


int main (void)
{
    int data1 = 10;
    int data2 = 20;
    int result;

    __asm__(
        "   movl    %[mydata1], %[myresult]\n"
        "   imull   %[mydata2], %[myresult]\n"
        : [myresult] "=&r" (result)
        : [mydata1] "r" (data1), [mydata2] "r" (data2));

    printf("The result is %d\n",result);

    return 0;
}

In this [myresult] "=&r" (result) says to select a register (r) that will be used as an output (=) value for the lvalue result, and that register will be referred to in the assembly as %[myresult] and must be different from the input registers (&). (You can use the same text in both places, result instead of myresult; I just made it different for illustration.)

Similarly [mydata1] "r" (data1) says to put the value of expression data1 into a register, and it will be referred to in the assembly as %[mydata1].

I modified the code in the assembly so that it only modifies the output register. Your original code modifies %ecx but does not tell the compiler it is doing that. You could have told the compiler that by putting "ecx" after a third :, which is where the list of “clobbered” registers goes. However, since my code lets the compiler assign a register, I would not have a specific register to list in the clobber list. There may be a way to tell the compiler that one of the input registers will be modified but is not needed for output, but I do not know of one. (Documentation is here.) For this task, a better solution is to tell the compiler to use the same register for one of the inputs as the output:

    __asm__(
        "   imull   %[mydata1], %[myresult]\n"
        : [myresult] "=r" (result)
        : [mydata1] "r" (data1), [mydata2] "0" (data2));

In this, the 0 with data2 says to make it the same as operand 0. The operands are numbered in the order they appear, starting with 0 for the first output operand and continuing into the input operands. So, when the assembly code starts, %[myresult] will refer to some register that the value of data2 has been placed in, and the compiler will expect the new value of result to be in that register when the assembly is done.

When doing this, you have to match the constraint with how a thing will be used in assembly. For the r constraint, the compiler supplies some text that can be used in assembly language where a general processor register is accepted. Others include m for a memory reference, and i for an immediate operand.
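To make the constraint letters concrete, here is a minimal sketch that uses "m" and "i" instead of "r". It assumes GCC or Clang targeting x86 and falls back to plain C elsewhere; the function name add_seven_in_memory is made up for illustration.

```c
/* Hypothetical helper: adds a constant to a local that is forced to live in memory. */
static int add_seven_in_memory(int start)
{
    int value = start;   /* a local; the compiler decides where it lives */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %[inc], %[val]"
            : [val] "+m" (value)   /* "m": substituted as a memory reference, read and written */
            : [inc] "i" (7)        /* "i": substituted as an immediate, i.e. $7 in AT&T syntax */
            : "cc");               /* addl changes the flags */
#else
    value += 7;                    /* plain C fallback for other targets */
#endif
    return value;
}
```

With these constraints the compiler should emit something like addl $7, <stack slot>; changing "+m" to "+r" would let it keep value in a register instead.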

Eric Postpischil
  • 1
    Good as a demo of early-clobber, but forcing a `mov` is sub-optimal in cases where one of the inputs isn't needed later. Using a matching constraint like `"0"(data1)` to force that input into the same register as it chose for the output would let the compiler choose to emit its own `mov` or not before the imul. – Peter Cordes Feb 14 '20 at 15:58
  • Now you don't need the early-clobber: `imul` reads both its inputs before writing its output. (It had better; x86 can't use a separate destination even if we wanted to.) IIRC, there's also a way to declare two inputs as commutative with each other, which would let the compiler choose which one to destroy. But adding more complication is probably not good for the purposes of this answer! – Peter Cordes Feb 14 '20 at 16:17
4

There is little distinction between "Basic asm" and "Extended asm"; "basic asm" is just a special case where the __asm__ statement has no lists of outputs, inputs, or clobbers. The compiler does not do % substitution in the assembly string for Basic asm. If you want inputs or outputs you have to specify them, and then it's what people call "extended asm".
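As a small illustration of the % handling, the sketch below emits the same instruction once as Basic asm and once as Extended asm with empty operand lists. It assumes GCC or Clang on x86; percent_demo is a hypothetical name, and the return value only counts how many asm statements were compiled in.

```c
/* Hypothetical demo: the same instruction written as Basic and as Extended asm. */
static int percent_demo(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* Basic asm: no operand lists, the string is copied verbatim, so a single % is used. */
    __asm__("testl %eax, %eax");

    /* Extended asm: the colons make % special, so a literal % must be written as %%. */
    __asm__ volatile("testl %%eax, %%eax" : : : "cc");

    return 2;   /* two asm statements were emitted */
#else
    return 0;   /* no inline asm on this target */
#endif
}
```

testl only reads EAX and sets flags, so neither statement changes any register the compiler relies on.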

In practice, it may be possible to access external (or even file-scope static) objects from "basic asm". This is because these objects will (respectively may) have symbol names at the assembly level. However, to perform such access you need to be careful of whether it is position-independent (if your code will be linked into libraries or PIE executables) and meets other ABI constraints that might be imposed at linking time, and there are various considerations for compatibility with link-time optimization and other transformations the compiler may perform. In short, it's a bad idea because you can't tell the compiler that a basic asm statement modified memory. There's no way to make it safe.

A "memory" clobber (Extended asm) can make it safe to access static-storage variables by name from the asm template.
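A rough sketch of what that looks like, under several assumptions: x86-64 Linux (no leading underscore on symbol names), non-static globals defined in the same executable, and RIP-relative addressing so it also links as PIE. The names g_a, g_b, g_result and multiply_globals_by_name are made up for this example. Passing the variables as operands is still the more robust approach.

```c
int g_a = 10;      /* hypothetical globals, named only for this sketch */
int g_b = 20;
int g_result;

static int multiply_globals_by_name(void)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
    __asm__ volatile(
        "movl  g_a(%%rip), %%eax\n\t"      /* load g_a by its symbol name */
        "imull g_b(%%rip), %%eax\n\t"
        "movl  %%eax, g_result(%%rip)"     /* store by symbol name */
        : /* no outputs */
        : /* no inputs */
        : "eax", "cc", "memory");          /* "memory": globals are read and written behind the compiler's back */
#else
    g_result = g_a * g_b;                  /* plain C fallback for other targets */
#endif
    return g_result;
}
```

The "memory" clobber makes the compiler store any pending values of g_a and g_b before the asm and reload g_result afterwards.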

The use-case for basic asm is things that modify the machine state only, like asm("cli") in a kernel to disable interrupts, without reading or writing any C variables. (Even then, you'd often use a "memory" clobber to make sure the compiler had finished earlier memory operations before changing machine state.)

Local (automatic storage, not static ones) variables fundamentally never have symbol names, because they don't exist in a single instance; there's one object per live instance of the block they're declared in, at runtime. As such, the only possible way to access them is via input/output constraints.
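The "one object per live instance" point can be seen in plain C, without any asm. This is only a sketch; record_local_addresses, all_addresses_distinct, seen and MAX_DEPTH are names invented for the example.

```c
#include <stdint.h>

#define MAX_DEPTH 4
static uintptr_t seen[MAX_DEPTH];   /* address of `local` in each live call */

/* Each recursive call has its own `local`, so no single symbol could name it. */
static int record_local_addresses(int depth)
{
    volatile int local = depth;
    seen[depth] = (uintptr_t)&local;
    if (depth + 1 < MAX_DEPTH)
        record_local_addresses(depth + 1);
    return local;                   /* read after the call keeps `local` live across it */
}

/* Returns 1 if every recorded address is different from all the others. */
static int all_addresses_distinct(void)
{
    for (int i = 0; i < MAX_DEPTH; i++)
        for (int j = i + 1; j < MAX_DEPTH; j++)
            if (seen[i] == seen[j])
                return 0;
    return 1;
}
```

After calling record_local_addresses(0), the four instances of local were alive at the same time, so they must have had four different addresses.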

Users coming from MSVC-land may find this surprising since MSVC's inline assembly scheme papers over the issue by transforming local variable references in their version of inline asm into stack-pointer-relative accesses, among other things. The version of inline asm it offers, however, is not compatible with an optimizing compiler, and little to no optimization can happen in functions using that type of inline asm. GCC and the larger compiler world that grew alongside C out of Unix do not do anything similar.

Peter Cordes
R.. GitHub STOP HELPING ICE
  • 3
    There is. In basic assembly you don't have to prefix `%`s with `%`s. In extended assembly you do. I didn't downvote. – S.S. Anne Feb 14 '20 at 14:14
  • Also, basic asm statements have implicit `memory` clobbers and are always considered `volatile` as far as I'm concerned. – fuz Feb 14 '20 at 14:15
  • Note that you can read and write external variables using basic assembly by just referencing the corresponding symbols. OPs question has its merit and your answer does not really address the reason for the apparent limitation. – fuz Feb 14 '20 at 14:18
  • @fuz: Yes, I've been updating it. – R.. GitHub STOP HELPING ICE Feb 14 '20 at 14:23
  • Also note that `%` substitution does happen in extended assembly statements even if you have no input and output operands. `asm("...")` does not behave the same way as `asm volatile("...":::"memory")`. – fuz Feb 14 '20 at 14:24
  • 1
    @fuz: Implicit `"memory"` clobber? Yes implicitly `volatile` (like Extended asm with no outputs), but https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html says the compiler *doesn't* know about changes you make to memory. And https://godbolt.org/z/X6fbnw proves there's no memory clobber in `asm("")`. So really **the *only* difference from Extended asm with empty constraint lists is not doing `%` substitution on the template**. (And thus not requiring `%%` for a literal `%`). I assume Basic asm still has an implicit `"cc"` clobber on i386 / x86-64, in case that ever matters. – Peter Cordes Feb 14 '20 at 14:51
  • @PeterCordes: Thanks, reverted that change. Also it sounds like accessing global data from basic asm is then invalid (due to missing memory clobber). – R.. GitHub STOP HELPING ICE Feb 14 '20 at 16:56
  • Correct. AFAIK, the only valid use-cases are things like `asm("cli")` to change something about the machine state (disable interrupts), not interact with C variables. Or to write a body for a `__attribute__((naked))` function. – Peter Cordes Feb 14 '20 at 17:00
  • 1
    @PeterCordes: That's *still* not correct because it's not ordered with respect to the memory operations you want to protect by masking interrupts. Basically, "basic asm" is just "always a bug". – R.. GitHub STOP HELPING ICE Feb 14 '20 at 17:45
  • @PeterCordes: Your edits are things that really should have been written and integrated collaboratively via comments rather than as a non-author edit. I took some time to review them and I don't think any of it is something I'd want to revert, but changes that substantially change what the author is saying should happen after they've been able to review them, not before. – R.. GitHub STOP HELPING ICE Feb 14 '20 at 17:48
  • Part of the change was just something it looked like you missed based on what we were already discussing (that basic asm doesn't clobber memory). I usually figure it's better to make what I think is an improvement, especially when the author is active so they can tweak or roll back if they want. (It would be nice if there was a way to send the author a proposed edit, but there isn't. The downside of having imperfect text in an answer for a short time is not a big one, IMO). But anyway, next time if I'm looking at one of your answers, I'll comment first if I remember. – Peter Cordes Feb 14 '20 at 17:56
  • re: `asm("cli")` yes, good point about needing a memory clobber for some (most?) use cases. asm volatile is ordered wrt. `volatile` memory accesses, I think, so it could be ok for that. Or for simply disabling interrupts before calling a non-inline function. – Peter Cordes Feb 14 '20 at 17:58
  • @PeterCordes: Ah, the classic "external function calls as a compiler barrier" hack... ;-) – R.. GitHub STOP HELPING ICE Feb 14 '20 at 18:00
  • Yup :P I had in mind a hand-written asm function for some reason. Maybe a better example would be `asm("lfence")` as a Spectre barrier before a branch. It doesn't strictly need a memory clobber to still work, I think. Or maybe asm for `verw` (with some operand) for MDS mitigation where it just has to happen somewhere during a block. – Peter Cordes Feb 14 '20 at 18:09
  • Update, non-empty Basic asm statements have an implicit `"memory"` clobber in GCC7 and later: https://godbolt.org/z/oc3jvz, see https://gcc.gnu.org/legacy-ml/gcc-patches/2016-05/msg00419.html / https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24414#c20 for this mostly pointless change. No reason to ever use Basic Asm in new code, except as the body of a naked function, or at global scope. (Or for source compat with compilers that don't implement Extended asm syntax, and you really don't want to #if.) Apparently before this change, Basic Asm also didn't implicitly clobber `"cc"` on x86! – Peter Cordes Dec 27 '20 at 04:29
3

You can't safely use globals in Basic Asm statements either; it happens to work with optimization disabled but it's not safe and you're abusing the syntax.

There's very little reason to ever use Basic Asm. Even for machine-state control like asm("cli") to disable interrupts, you'd often want a "memory" clobber to order it wrt. loads / stores to globals. In fact, GCC's https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended page recommends never using Basic Asm because it differs between compilers, and GCC might change to treating it as clobbering everything instead of nothing (because of existing buggy code that makes wrong assumptions). This would make a Basic Asm statement that uses push/pop even more inefficient if the compiler is also generating stores and reloads around it.
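Since cli cannot be run from user space, here is a sketch of just the "memory" clobber part: an empty Extended asm statement used as a compiler-level barrier. It assumes a GNU-compatible compiler; publish, shared_data and shared_ready are hypothetical names.

```c
static int shared_data;
static int shared_ready;

/* Hypothetical example: keep the compiler from reordering two stores. */
static void publish(int value)
{
    shared_data = value;
#if defined(__GNUC__)
    /* Empty template: no instruction is emitted, but the "memory" clobber
       stops the compiler from moving loads and stores across this point. */
    __asm__ volatile("" : : : "memory");
#endif
    shared_ready = 1;
}
```

This only constrains the compiler; it is not a CPU memory fence. A real asm("cli" ::: "memory") would combine the same clobber with an actual instruction.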

Basically the only use-case for Basic Asm is writing the body of an __attribute__((naked)) function, where data inputs/outputs / interaction with other code follows the ABI's calling convention, instead of whatever custom convention the constraints / clobbers describe for a truly inline block of code.


The design of GNU C inline asm is that it's text that you inject into the compiler's normal asm output (which is then fed to the assembler, as). Extended asm makes the string a template that it can substitute operands into. And the constraints describe how the asm fits into the data-flow of the program logic, as well as registers it clobbers.

Instead of parsing the string, there is syntax that you need to use to describe exactly what it does. Parsing the template for var names would only solve part of the language-design problem that operands need to solve, and would make the compiler's code more complicated. (It would have to know more about every instruction to know whether memory, register, or immediate was allowed, and stuff like that. Normally its machine-description files only need to know how to go from logical operation to asm, not the other direction.)

Your Basic asm block is broken because you modify C variables without telling the compiler about it. This could break with optimization enabled (maybe only with more complex surrounding code, but happening to work is not the same thing as actually safe). This is why merely testing GNU C inline asm code is not even close to sufficient for it to be future-proof against new compilers and changes in surrounding code. There is no implicit "memory" clobber. (Basic asm is the same as Extended asm except for not doing % substitution on the string literal. So you don't need %% to get a literal % in the asm output. It's implicitly volatile like Extended asm with no outputs.)

Also note that if you were targeting i386 MacOS, you'd need _result in your asm. result only happens to work because the asm symbol name exactly matches the C variable name. Using Extended asm constraints would make it portable between GNU/Linux (no leading underscore) vs. other platforms that do use a leading _.
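A sketch of the portable alternative: pass the globals as operands so that no symbol name appears in the template at all. It assumes GCC or Clang on x86; g_x, g_y, g_product and multiply_globals_via_operands are names made up for this example.

```c
int g_x = 10;       /* hypothetical globals for this sketch */
int g_y = 20;
int g_product;

static void multiply_globals_via_operands(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("imull %[rhs], %[acc]"
            : [acc] "=r" (g_product)   /* the compiler stores the register to g_product afterwards */
            : "0" (g_x),               /* g_x starts out in the same register as operand 0 */
              [rhs] "rm" (g_y)         /* register or memory, whichever the compiler prefers */
            : "cc");
#else
    g_product = g_x * g_y;             /* plain C fallback for other targets */
#endif
}
```

Because the compiler substitutes the operands itself, the same source should work whether or not the platform prefixes C symbols with an underscore, and no "memory" clobber is needed.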

Your Extended asm is broken because you modify an input ("c") (without telling the compiler that register is also an output, e.g. an output operand using the same register). It's also inefficient: if a mov is the first or last instruction of your template, you're almost always doing it wrong and should have used better constraints.

Instead, you can do:

    asm ("imull %%edx, %%ecx\n\t"
          : "=c"(result)
          : "d"(data1), "c"(data2));

Or better, use "+r"(data2) and "r"(data1) operands to give the compiler free choice when doing register allocation instead of potentially forcing the compiler to emit unnecessary mov instructions. (See @Eric's answer using named operands and "=r" and a matching "0" constraint; that's equivalent to "+r" but lets you use different C names for the input and output.)
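A minimal sketch of the "+r" form, assuming GCC or Clang on x86; multiply_in_place is a hypothetical name. Note that "+" replaces "=" rather than being combined with it.

```c
/* Hypothetical helper: data2 is both read and written by the asm. */
static int multiply_in_place(int data1, int data2)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("imull %[src], %[acc]"
            : [acc] "+r" (data2)   /* "+r": read-write operand in any general register */
            : [src] "r" (data1)    /* pure input, not modified by the template */
            : "cc");
#else
    data2 *= data1;                /* plain C fallback for other targets */
#endif
    return data2;
}
```

For example, multiply_in_place(10, 20) yields 200, and the compiler is free to pick whichever registers suit the surrounding code.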

Look at the asm output of the compiler to see how code-gen happened around your asm statement, if you want to make sure it was efficient.


Since local vars don't have a symbol / label in the asm text (instead they live in registers or at some offset from the stack or frame pointer, i.e. automatic storage), it can't work to use symbol names for them in asm.

Even for global vars, you want the compiler to be able to optimize around your inline asm as much as possible, so you want to give the compiler the option of using a copy of a global var that's already in a register, instead of getting the value in memory in sync with a store just so your asm can reload that.

Having the compiler try to parse your asm and figure out which C local var names are inputs and outputs would have been possible. (But would be a complication.)

But if you want it to be efficient, you need to figure out when x in the asm can be a register like EAX, instead of doing something braindead like always storing x into memory before the asm statement, and then replacing x with 8(%rsp) or whatever. If you want to give the asm statement control over where inputs can be, you need constraints in some form. Doing it on a per-operand basis makes total sense, and means the inline-asm handling doesn't have to know that bts can take an immediate or register source but not memory, and other machine-specific details like that. (Remember, GCC is a portable compiler; baking a huge amount of per-machine info into the inline-asm parser would be bad.)
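The bts case can be sketched like this, assuming GCC or Clang on x86; set_bit is a made-up name. The "Ir" constraint tells the compiler that the source may be a small immediate (x86 "I" means a constant 0 to 31) or a register, but never memory.

```c
/* Hypothetical helper: set one bit of a 32-bit word with bts. */
static unsigned set_bit(unsigned word, unsigned bit)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("btsl %[bit], %[word]"
            : [word] "+r" (word)   /* destination kept in a register */
            : [bit] "Ir" (bit)     /* immediate 0..31 or a register; memory is not offered */
            : "cc");               /* bts writes the carry flag */
#else
    word |= 1u << (bit & 31);      /* plain C fallback; bts with a register destination also masks the index */
#endif
    return word;
}
```

The knowledge about which operand forms bts accepts lives in the constraint string written by the programmer, not in the compiler's inline-asm handling.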

(MSVC forces all C vars in _asm{} blocks to be memory. It's impossible to use to efficiently wrap a single instruction because the input has to bounce through memory, even if you wrap it in a function so you can use the officially-supported hack of leaving a value in EAX and falling off the end of a non-void function; see "What is the difference between 'asm', '__asm' and '__asm__'?". And in practice MSVC's implementation was apparently pretty brittle and hard to maintain, so much so that they removed it for x86-64, and it was documented as not supported in functions with register args even in 32-bit mode! That's not the fault of the syntax design, though, just the actual implementation.)

Clang does support -fasm-blocks for _asm { ... } MSVC-style syntax where it parses the asm and you use C var names. It probably forces inputs and outputs into memory but I haven't checked.


Also note that GCC's inline asm syntax with constraints is designed around the same system of constraints that GCC-internals machine-description files use to describe the ISA to the compiler. (The .md files in the GCC source that tell the compiler about an instruction to add numbers that takes inputs in "r" registers, and has the text string for the mnemonic. Notice the "r" and "m" in some examples in https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html).

The design model of asm in GNU C is that it's a black box for the optimizer; you must fully describe the effects of the code (to the optimizer) using constraints. If you clobber a register, you have to tell the compiler. If you have an input operand that you want to destroy, you need to use a dummy output operand with a matching constraint, or a "+r" operand to update the corresponding C variable's value.
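As an illustration of declaring a register clobber, here is a deliberately inefficient sketch that hard-codes ECX as scratch space. It assumes GCC or Clang on x86; mul_with_scratch is a hypothetical name.

```c
/* Hypothetical helper: uses ECX as a scratch register and says so. */
static int mul_with_scratch(int lhs, int rhs)
{
    int product;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("movl  %[lhs], %%ecx\n\t"
            "imull %[rhs], %%ecx\n\t"
            "movl  %%ecx, %[out]"            /* output written last, after both inputs were read */
            : [out] "=r" (product)
            : [lhs] "r" (lhs), [rhs] "r" (rhs)
            : "ecx", "cc");                  /* clobber list: ECX and the flags are destroyed */
#else
    product = lhs * rhs;                     /* plain C fallback for other targets */
#endif
    return product;
}
```

Listing "ecx" keeps the compiler from placing any operand or live value in that register. Letting the compiler choose the register, as in the "+r" style, avoids both extra mov instructions.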

If you read or write memory pointed-to by a register input, you have to tell the compiler; see "How can I indicate that the memory *pointed* to by an inline ASM argument may be used?".
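One way to do that is a dummy "m" input that covers the pointed-to memory. The sketch below assumes GCC or Clang on x86 and a pointer to at least two ints; sum_pair is a name invented for the example.

```c
/* Hypothetical helper: reads p[0] and p[1] through a pointer held in a register. */
static int sum_pair(const int *p)
{
    int out;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("movl (%[ptr]), %[out]\n\t"
            "addl 4(%[ptr]), %[out]"
            : [out] "=&r" (out)               /* early-clobber: written before ptr is last read */
            : [ptr] "r" (p),
              "m" (*(const int (*)[2]) p)     /* dummy operand: the asm reads these two ints */
            : "cc");
#else
    out = p[0] + p[1];                        /* plain C fallback for other targets */
#endif
    return out;
}
```

The dummy operand is never referenced in the template; it only tells the optimizer that the two ints must be up to date in memory before the asm runs. A "memory" clobber would also work but is a blunter tool.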

If you use the stack, you have to tell the compiler (but you can't, so instead you have to avoid stepping on the red-zone :/ see "Using base pointer register in C++ inline asm"). See also the inline-assembly tag wiki.

GCC's design makes it possible for the compiler to give you an input in a register, and use the same register for a different output. (Use an early-clobber constraint if that's not ok; GCC's syntax is designed to efficiently wrap a single instruction that reads all its inputs before writing any of its outputs.)

If GCC could only infer all of these things from C var names appearing in asm source, I don't think that level of control would be possible. (At least not plausible.) And there'd probably be surprising effects all over the place, not to mention missed optimizations. You only ever use inline asm when you want maximum control over things, so the last thing you want is the compiler using a lot of complex opaque logic to figure out what to do.

(Inline asm is complex enough in its current design, and not used much compared to plain C, so a design that requires very complex compiler support would probably end up with a lot of compiler bugs.)


GNU C inline asm isn't designed for low-performance low-effort. If you want easy, just write in pure C or use intrinsics and let the compiler do its job. (And file missed-optimization bug reports if it makes sub-optimal code.)

Peter Cordes
  • `Your Extended asm is broken because you modify a read-only input ("c")`: where is it said that it is "read-only"? I used the general purpose register `xcx`, which can be read as well as written to. I decided to put an address of `data2` there, but that is not wrong. It is indeed inefficient, but not grammatically wrong – Herdsman Feb 14 '20 at 23:45
  • @Herdsman: You told the compiler it was a pure input using a `"c"` constraint, not `"+c"`. Also no, you asked for the *value* of `data2`, not `&data2`. Asking for the address and then dereferencing in the asm (to read memory that wasn't explicitly declared as an input) would have been another error (unless you use a memory clobber or dummy memory input). See the links in the last section of my answer. – Peter Cordes Feb 15 '20 at 05:35
  • `Or better, use "+r"`, but how? If I do (output) `: "=+r" (result)`, then `operand constraint contains incorrectly positioned ‘+’ or ‘=’`, I do not know how to string more modifier (`=`, `+`), so letting the compiler know, I want to have output `result` as `read\write`. But when I used the Eric's answer, he used `: [myresult] "=&r" (result)`, which does emit no error (so there he stringed two modifier `=` and `&`, but is correct). So how to string more modifier. Also, does output `result` is that one, that will be clobbered? (so I will use the `+` modifier) – Herdsman Feb 15 '20 at 20:42
  • @Herdsman: read the GCC manual or any tutorial. Or see [segmentation fault(core dumped) error while using inline assembly](//stackoverflow.com/a/60242248) which has examples using it. – Peter Cordes Feb 15 '20 at 20:45
1

This is because asm is a defined language which is common for all compilers on the same processor family. After using the __asm__ keyword, you can reliably use any good manual for the processor to then start writing useful code.

But it does not have a defined interface for C, and let's be honest, if you don't interface your assembler with your C code then why is it there?

Examples of useful very simple asm: generate a debug interrupt; set the floating point register mode (exceptions/accuracy);

Each compiler writer has invented their own mechanism to interface to C. For example in one old compiler you had to declare the variables you want to share as named registers in the C code. In GCC and clang they allow you to use their quite messy 2-step system to reference an input or output index, then associate that index with a local variable.

This mechanism is the "extension" to the asm standard.

Of course, the asm is not really a standard. Change processor and your asm code is trash. When we talk in general about sticking to the c/c++ standards and not using extensions, we don't talk about asm, because you are already breaking every portability rule there is.

Then, on top of that, if you are going to call C functions, or your asm declares functions that are callable by C then you will have to match to the calling conventions of your compiler. These rules are implicit. They constrain the way you write your asm, but it will still be legal asm, by some criteria.

But if you were just writing your own asm functions, and calling them from asm, you may not be constrained so much by the c/c++ conventions: make up your own register argument rules; return values in any register you want; make stack frames, or don't; preserve the stack frame through exceptions - who cares?

Note that you might still be constrained by the platform's relocatable code conventions (these are not "C" conventions, but are often described using C syntax), but this is still one way that you can write a chunk of "portable" asm functions, then call them using "extended" embedded asm.

Gem Taylor
  • For the first paragraph: each toolchain uses a different assembly dialect, even on the same platform. There are a lot of syntactical and some times slight semantic differences (e.g. with respect to calling conventions and linker features). – fuz Feb 14 '20 at 14:23
  • But if we are talking pure assembly you can write your own calling conventions. The moment you talk about calling conventions and linkers you are talking about the interface with c/c++. – Gem Taylor Feb 14 '20 at 14:25
  • 1
    You can certainly not write useful inline assembly without knowing anything about the ABI used. Also, consider that depending on the compiler, one might expect `mov %eax, %ebx` while the other expects `mov ebx, eax` in its inline assembly. The language is not common to all compilers (or assemblers). – fuz Feb 14 '20 at 14:28
  • Agreed, see edits, but I think it is still fair to talk about asm that is easily understood/derived from the asm manuals for that platform, and the C interfaces. I have seen differences like you describe even in the asm compilers themselves, between Microsoft and Borland, for example. – Gem Taylor Feb 14 '20 at 14:54
  • Then why is it there? The use-case for Basic Asm is limited to things like `asm("cli")` to disable interrupts, or other machine-state-affecting instructions. As you say, not for doing anything with C variables. Or for writing the body of an `__attribute__((naked))` function, where you implement the calling convention and stack layout yourself. – Peter Cordes Feb 14 '20 at 15:02
  • @GemTaylor can you give some references to asm compiler internals to see those differences you are talking about (the last sentence "I have seen differences in the asm compilers themselves)? (probably differences in comparison of different OSes) – Herdsman Feb 14 '20 at 19:24
  • @Herdsman It is my recollection that the Borland asm compiler vs Microsoft's macro compiler, both for Windows, both for 8086, back in the '80s had a subtly different way of representing certain implicit addressing modes, for example. I'm sure there was some difference in bracketing style as well. – Gem Taylor Feb 17 '20 at 17:18