How do I pass inputs into extended asm?

Question

Consider this code, from my earlier question.

int main(){
    asm("movq $100000000, %rcx;"
            "startofloop: ; "
            "sub $0x1, %rcx; "
            "jne startofloop; ");
}

I would like to make the number of iterations of the loop a C variable, so I tried the following after reading this document.

int main(){                                      
    int count = 100000000;                       
    asm("movq %0, %rcx;"                         
            "startofloop: ; "                    
            "sub $0x1, %rcx; "                   
            "jne startofloop; ":: "r"(count));   
}

Unfortunately, this fails to compile with the following error.

asm_fail.c: In function ‘main’:
asm_fail.c:3:5: error: invalid 'asm': operand number missing after %-letter
     asm("movq %0, %rcx;"
     ^
asm_fail.c:3:5: error: invalid 'asm': operand number missing after %-letter

What is the correct way to pass the value of the C variable into the assembly?

If using extended assembler templates (ones with input, output, clobbers etc) then you need to prepend an extra `%` on the register names. maybe try `%%rcx`. Since you overwrite RCX and it doesn't appear as an output operand it needs to be listed in the clobber list. — Michael Petch, Jun 21 '16 at 23:33
@MichaelPetch, Good point on the clobber. Unfortunately, when I do `%%rcx`, I get `operand type mismatch for 'movq'`. — merlin2011, Jun 21 '16 at 23:36
Yes, that is likely because `count` is a 32-bit integer and the assembler template chose a 32-bit register (like `eax`) for substitution. You can override `%0` to use the 64-bit register name by placing `q` after the `%`, so something like `movq %q0, %%rcx;` — Michael Petch, Jun 21 '16 at 23:40
See also [the bottom of this answer](http://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm/34522750#34522750) for info on using GNU C inline asm to make code that doesn't suck (e.g. `movq` inside the template instead of using operand constraints to ask the compiler to set it up for you). It's really hard to get it correct while constraining the optimizer as little as possible, and it still can't do things like constant propagation through inline asm. So it's usually best to tweak the C to hand-hold the compiler to good asm, instead of using inline asm. — Peter Cordes, Jun 22 '16 at 08:04

Michael Petch · Accepted Answer · 2016-06-22T03:29:57.070

If you are using extended assembler templates (ones with inputs, outputs, clobbers, etc.), then you need to prepend an extra % to the register names inside the template: %%rcx in this case. This will solve the issue related to this error:

error: invalid 'asm': operand number missing after %-letter

This will present a new problem. You'll receive an error similar to:

operand type mismatch for 'movq'

The issue is that the "r"(count) input constraint tells the compiler to pick a register that will contain the value of count. Since count is defined as an int, it will choose a 32-bit register. For the sake of argument, assume it chooses EAX. After substitution it would have tried to generate this instruction:

movq %eax, %rcx

You can't use movq to move the contents of a 32-bit register into a 64-bit register, hence the error. The better choice is to use ECX as the target so that both operands are the same size. Revised code would look like:

asm("mov %0, %%ecx;"                         
    "startofloop: ; "                    
    "sub $0x1, %%ecx; "                   
    "jne startofloop; ":: "r"(count));

Alternatively, you could have used the input constraint "ri"(count). This allows the compiler to choose either a register or an immediate value. At a higher optimization level (-O1, -O2) it will likely determine in this case that count remains constant (100000000) and generate code like:

mov $100000000, %ecx                         
startofloop:
sub $0x1, %ecx
jne startofloop

Rather than being forced to place 100000000 into a register and copy it to ECX, it can use an immediate value instead.
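Instead of switching to ECX, you could keep the question's 64-bit `movq` and RCX by widening the value in C, so that `%0` names a 64-bit register. The sketch below assumes x86-64, GCC or Clang, and AT&T syntax; the function name `spin_rcx` and the extra `iterations` output (there only so the loop's effect can be observed) are hypothetical. It casts in the constraint rather than using the `%q0` modifier from the comments, because `%q0` applied to an `int` operand only prints the 64-bit register name; nothing guarantees what the upper 32 bits of that register hold. It is spelled `__asm__` so it also compiles under strict `-std=c99`/`-std=c11`.

```c
#include <stdint.h>

/* Hypothetical helper: the question's RCX countdown with the input widened
   to 64 bits, so "movq %1, %%rcx" assembles. Assumes x86-64 and count > 0
   (a count of 0 would wrap around and spin 2^64 times). */
static uint64_t spin_rcx(int count)
{
    uint64_t iterations;
    __asm__ volatile(
        "movq %1, %%rcx\n\t"   /* %1 is a 64-bit register, so sizes match */
        "xor %0, %0\n\t"       /* iterations = 0 */
        "1:\n\t"               /* local label, jumped to as 1b */
        "add $1, %0\n\t"       /* count one pass through the loop */
        "sub $1, %%rcx\n\t"
        "jne 1b"
        : "=r"(iterations)
        : "r"((uint64_t)count) /* widen in C instead of using the %q modifier */
        : "rcx", "cc");        /* RCX and the flags are overwritten */
    return iterations;
}
```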

A serious problem in your template is that you destroy the contents of ECX, but GCC has no knowledge of this. GCC doesn't parse the instructions inside the template to determine what the code does, so it has no idea you have clobbered ECX. The compiler may rely on ECX holding the same value before and after the template. If you modify a register that isn't referenced in the output operands, you must explicitly list it in the clobber list. Something like this would work:

asm("mov %0, %%ecx;"                         
    "startofloop: ; "                    
    "sub $0x1, %%ecx; "                   
    "jne startofloop; ":: "ri"(count) : "rcx");

Now GCC knows it can't rely on the value in RCX being the same value before and after the template is executed.
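Another option, not used in the answer, is the x86-specific `"c"` constraint, which asks GCC to load the operand into the C register (ECX/RCX) itself. Declaring it as a read/write operand (`"+c"`) removes the `mov` from the template and tells GCC the register is modified, so no `"rcx"` clobber is needed. A minimal sketch, assuming x86-64 with GCC or Clang; `countdown_ecx` is a hypothetical name:

```c
/* Hypothetical helper: the ECX countdown, with ECX chosen through the "c"
   constraint instead of being hard-coded and clobbered. Assumes n > 0. */
static unsigned countdown_ecx(unsigned n)
{
    __asm__ volatile(
        "1:\n\t"
        "sub $1, %0\n\t"   /* %0 is ECX here, so this is a 32-bit sub */
        "jne 1b"
        : "+c"(n)          /* in: starting count; out: ECX after the loop */
        :
        : "cc");           /* sub writes the flags */
    return n;              /* always 0 once the loop exits */
}
```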

Rather than using a fixed register as your internal counter, you can get GCC to pick one that is available. Doing this means we no longer need the clobber. You can create a dummy variable (a temporary) to count with. To avoid this code being optimized out altogether, we can use the volatile qualifier on the assembler template. (This isn't required when the template has no output operands, because GCC treats such asm statements as implicitly volatile.) Code like this would work:

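int count = 100000000;
int dummy;
/* explicit 'l' suffixes: with "=rm", %0 may be a memory operand,
   and the assembler can't infer the operand size from memory alone */
asm volatile("movl %1, %0;"
    "startofloop: ; "
    "subl $0x1, %0; "
    "jne startofloop; ":"=rm"(dummy): "ri"(count));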

The =rm output constraint says that either a memory location or a register can be used for this operand. Leaving the choice to the compiler gives it the opportunity to generate better code. At -O1 you would likely find the generated code looks like:

mov    $0x5f5e100,%ebx
startofloop:
sub    $0x1,%ebx
jne    startofloop

In this case the compiler chose to use an immediate operand for count ($0x5f5e100 = $100000000). The dummy variable was optimized down to the register EBX.
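Peter Cordes points out in the comments that the `mov` can be removed from the template too: initialise the temporary with `dummy = count` in C and make it a read/write `"+r"` operand, so the compiler sets up the register (or a constant) itself. He also notes this works inside a macro via `do { } while (0)`. A sketch of that combination, assuming x86-64 with GCC or Clang; the macro name `SPIN` and its `spin_tmp_` temporary are hypothetical. The local label `1:` (jumped to as `1b`) replaces `startofloop` so the macro can be expanded more than once in the same function without a duplicate-symbol error from the assembler.

```c
/* Hypothetical macro: spins (n) times on a scratch copy, so the caller's
   variable keeps its value. The copy happens in C, so the template holds
   only the loop. do { } while (0) makes SPIN(x); act like one statement.
   Assumes x86-64 and n > 0. */
#define SPIN(n)                                \
    do {                                       \
        unsigned spin_tmp_ = (n);              \
        __asm__ volatile("1:\n\t"              \
                         "sub $1, %0\n\t"      \
                         "jne 1b"              \
                         : "+r"(spin_tmp_)     \
                         :                     \
                         : "cc");              \
    } while (0)
```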

There are other tricks you can use to improve the template. You can read more about extended assembler templates in the GNU documentation.

Your code appeared to preserve the value of the variable count. If count didn't need to keep its original value after the template executes, you could use count as both the input and the output. That code could look like:

/* 'subl': with "+rm", %0 may be a memory operand of unknown size */
asm volatile("startofloop: ; "
    "subl $0x1, %0; "
    "jne startofloop; ":"+rm"(count): );

+rm means that the output operand is also used as an input operand. In this case count will always be zero when the loop finishes.
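Following Peter Cordes' comment that a loop counter belongs in a register, here is the read/write version with `"+r"` instead of `"+rm"`, written with a symbolic operand name (one of David Wohlferd's suggestions) instead of `%0`. It is a sketch assuming x86-64 with GCC or Clang; `countdown` and the operand name `cnt` are hypothetical.

```c
/* Hypothetical helper: counts *count down to zero in place. Even though
   *count lives in memory, "+r" makes the compiler load it into a register
   before the loop and store the final value back afterwards.
   Assumes x86-64 and *count > 0 on entry. */
static void countdown(unsigned *count)
{
    __asm__ volatile(
        "1:\n\t"
        "sub $1, %[cnt]\n\t"   /* operand referred to by name, not %0 */
        "jne 1b"
        : [cnt] "+r"(*count)
        :
        : "cc");               /* sub writes the flags */
}
```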

If you use the GCC -S option to output the generated assembly code, you may wish to alter your template so the output looks cleaner. Rather than using a ; (semicolon), use \n\t instead. This breaks the assembler template into multiple lines and adds indentation. An example:

asm volatile("movl %1, %0\n\t"
    "startofloop:\n\t"
    "subl $0x1, %0\n\t"
    "jne startofloop\n\t":"=rm"(dummy): "ri"(count));

Generally speaking, you shouldn't use inline assembler templates unless you have no alternative. Code it in C and guide the compiler to output the assembler you want, or use compiler intrinsics if need be. Inline assembler should be used as a last resort, or if your homework demands it. David Wohlferd wrote a Wiki article on the subject.
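As one illustration of the "code it in C" approach, the sketch below lets the compiler generate the counting loop itself. The only asm left is an empty `asm volatile` statement with a `"+r"` operand: it emits no instructions, but GCC must assume it may change `i`, so the loop can't be folded into a constant or deleted. This is a common idiom rather than something taken from the answer; `spin_c` is a hypothetical name, and GCC or Clang is assumed (the empty template itself is not x86-specific).

```c
/* Hypothetical helper: a plain C loop that the optimizer must keep. */
static unsigned long spin_c(unsigned long n)
{
    unsigned long i;
    for (i = 0; i < n; i++) {
        /* Empty template: emits nothing, but "+r" tells GCC that i might
           have been changed here, so the loop can't be optimized away. */
        __asm__ volatile("" : "+r"(i));
    }
    return i;   /* equals n, since the asm never actually changes i */
}
```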

An excellent treatise. The only other points I might have added are 1) consider using symbolic names to make the assembler code easier to read. 2) inline asm is fun, interesting, challenging, educational, and a bad idea in production code. I just always say that since *this* guy might not have heard it yet. 3) "cc" clobber. 4) Since `dummy` is probably not used, volatile is probably a good idea to keep the optimizer from discarding the entire thing as dead code. — David Wohlferd, Jun 22 '16 at 01:48
@DavidWohlferd Regarding 2: I can just link to another answer that discusses that, Regarding point 3: Is it no longer the case that _GCC_ assumes "cc" is clobbered in extended assembler templates? Regarding 4: That is true, I had assumed though that the code was somehow a minimal example. I'll add the volatile since if he wants to observe the generated code it might be better if there was some ;-) — Michael Petch, Jun 22 '16 at 01:55
2: It's clear that the OP intent here was education, but it's easy for something like this to grow or someone else to be inspired, etc. 3: There's a fine line here: gcc *does* clobber cc on i386. However (despite my efforts), this is not a documented/supported feature, so in theory it can't be depended upon. In reality, I can't see it ever changing, so it is *probably* safe, but adding it is still good self-documentation. 4: Umm, shouldn't that be `asm volatile`? — David Wohlferd, Jun 22 '16 at 02:18
@DavidWohlferd Regarding _CC_, reasonable/fair comment. I won't update the answer in that regard. I seem to recall documentation many moons ago saying that on Intel x86 targets _cc_ was an implied clobber. I highly doubt they would alter the behavior. If I were a _GCC_ dev, I might create a different mechanism to propagate certain flags out of the template. I think GCC 6.1 added some new template enhancements to support that idea. — Michael Petch, Jun 22 '16 at 02:40
cc: I know gcc's inline asm docs as well as the guy who wrote them (/s) and there is no mention of an implied cc clobber. Doesn't mean you didn't read it somewhere else though and I agree a change is unlikely. And yes they did add the ability to [output flags](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#FlagOutputOperands) in v6.0. After you read that, answer me this: If you are outputting flags, would you expect cc to be required? Or forbidden? The answer may surprise you... — David Wohlferd, Jun 22 '16 at 03:58
When I said many moons ago I'm thinking early 2000s, it wasn't recent and it may have been reading source code. As for `"cc"` being required, probably debatable with the new output constraints. I'd probably side with not being required (on intel x86 targets) since I'm still of the opinion it has been implied for many years. There is probably a pile of code out there that assumes the same. If "cc" in the clobbers (or without it) has the same meaning now, one could argue that it shouldn't be changed. — Michael Petch, Jun 22 '16 at 04:25
I think that the poor documentation over the years (I'm talking starting in 2000) surrounding extended assembler templates has come back to bite the project in the ass on stuff like this. It doesn't help that I used to read the Linux kernel and wonder what a particular constraint was when it wasn't documented by GCC at all. Had there been concrete documentation 15+ years ago, or a document about best practices, maybe it would be a different story now. — Michael Petch, Jun 22 '16 at 04:30
The source code definitely says that re cc, so maybe. But IMO, if it isn't 'implied' in the docs, it's not canon. As for outputs, this is a new feature. We could have gone either way. Since the docs already say "Clobber descriptions may not in any way overlap with an input or output operand," I tried to go with "forbidden." But they refused the code change to limit it to one or the other, or even to add text to the docs. So the answer is you can do either. 10 years downstream, 1/2 will be using each, preventing any mods. While it may never matter in this instance, the mindset makes me crazy. — David Wohlferd, Jun 22 '16 at 04:53
If the documentation is bad, as it was early on, and the only place to find out is the source code, then IMHO the source code (inline comments) is canon in the absence of reasonable documentation. I might have a much different opinion of the matter if I had started developing in the past few years. Maybe add a compiler option to switch between different semantics regarding `cc` in general (defaulting of course to what is currently actually done, and limiting issues with legacy code). But likely people won't turn the option on anyway lol — Michael Petch, Jun 22 '16 at 04:57
`"+rm"(count)` is a bad idea. You definitely want the compiler to put your loop counter in a register, so you should use a `"+r"` constraint. If you want to avoid clobbering the original value of the variable, use a scratch variable. (This is always possible in GNU C, even in a macro, using a `do { } while(0)` construct.) — Peter Cordes, Jun 22 '16 at 08:00
@MichaelPetch: If the count value was already in memory (e.g. a function arg on the stack), I wouldn't be surprised to see the compiler just leave it there. It would only load it ahead of time if it wanted the value in a register after the asm statement. It doesn't assume that you're going to touch the operand multiple times. If the asm statement was `asm("andl $0x1234, %0" : "+rm"(foo))`, a memory operand would be the correct choice even with no register pressure, and I assume that's what you'd get. If it's a tight loop, spilling anything else is probably better. — Peter Cordes, Jun 22 '16 at 08:28
re: dummy variable: Your `dummy` didn't avoid the `mov` in asm, which is the whole point of the exercise IMO. I meant doing `dummy = count` in C, and then using a `"+r"` constraint. It's almost always sub-optimal to have a `mov` as the first or last instruction of an inline asm statement, and usually easy to avoid. — Peter Cordes, Jun 22 '16 at 08:34
I think my comment is enough here. I've written enough about how to write good inline asm in other places. re: downvotes, I only get bothered by downvotes with no explanation. I've had downvotes for reasons I disagree with, but that were explained, and I'm fine with that. This answer doesn't deserve a downvote, though; it's pretty good and explains what's going on. — Peter Cordes, Jun 22 '16 at 08:49
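For the flag output operands mentioned in the comments (added in GCC 6), a condition code can be returned directly from the template instead of being re-tested in C. A minimal sketch, assuming x86-64 and a compiler that defines `__GCC_ASM_FLAG_OUTPUTS__` (GCC 6 or later, or a recent Clang); `dec_is_zero` and the operand name `val` are hypothetical. It lists no `"cc"` clobber, which, per the discussion above, GCC accepts either way when flags are outputs.

```c
#ifndef __GCC_ASM_FLAG_OUTPUTS__
#error "this sketch needs asm flag output operands (GCC 6+ or a recent Clang)"
#endif

/* Hypothetical helper: decrements *p and reports whether the result is
   zero. "=@ccz" hands ZF from the sub straight to C, so no separate
   test/sete instruction is needed. */
static int dec_is_zero(unsigned *p)
{
    int zero;
    __asm__("subl $1, %[val]"         /* explicit 'l': %[val] may be memory */
            : [val] "+rm"(*p), "=@ccz"(zero));
    return zero;
}
```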
