at&t asm inline c++ problem

Question

My Code

const int howmany = 5046;
char buffer[howmany];
    asm("lea     buffer,%esi"); //Get the address of buffer
    asm("mov     howmany,%ebx");         //Set the loop number
    asm("buf_loop:");                      //Label for beginning of loop
    asm("movb     (%esi),%al");             //Copy buffer[x] to al
    asm("inc     %esi");                   //Increment buffer address
    asm("dec     %ebx");                   //Decrement loop count
    asm("jnz     buf_loop");              //jump to buf_loop if(ebx>0)

My Problem

I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.

Advice when having a problem: (1) What is the expected behaviour? (2) What is the observed behaviour? — paxdiablo, Dec 24 '09 at 02:41
@kelton52, then the problem is at compile time rather than run time. There should still be behaviour, error messages from the compiler perhaps? — paxdiablo, Dec 24 '09 at 02:51
@paxdiablo - Your comments aren't very productive. To someone who knows what they are doing, my mistake should be obvious. — Kelly Elton, Dec 24 '09 at 03:00
@kelton52: There's no way to *know* what anyone is doing here until you at least tell people what compiler you are using. There's no (and there can't be any) standard and/or consistency across compilers when it comes to inline asm. Until then nothing can be obvious here. — AnT stands with Russia, Dec 24 '09 at 03:04
@kelton52, as I mentioned in my answer: if gcc is what you're using, your mistake IS obvious -- you're not using %0 and %1, but rather C expressions directly, in your assembly code, and that doesn't work in gcc. If you're using some other compiler, please mention WHICH one, and we'll see if there are experts of that particular dialect around to help you (otherwise, DO try the solution I recommend for gcc anyway: it might well work in other compilers too!-). — Alex Martelli, Dec 24 '09 at 03:24
Ha, ha, way to go, @Kelton52. Asking for help then insulting those who are offering it is not really conducive to solving your problem :-) People who post questions like "I have a problem" aren't doing themselves any favours - I was merely pointing out that *good* questions will state what you expect and what is actually happening - we're not mind-readers (and you still haven't posted the actual error message). I have little time to waste on ingrates when there are people here who are more appreciative. Best of luck with your problem anyway, cheers, and have a good Xmas break. — paxdiablo, Dec 24 '09 at 04:00
@kelton52: It's not obvious because we're neither omniscient nor psychic. Technical problems need to be well-defined before they can be solved. As for "expected/actual behavior", not compiling is a behavior, but a rather broad one. Any compiler output, even if no files are created and no messages printed, counts as part of its behavior. http://catb.org/~esr/faqs/smart-questions.html — outis, Dec 24 '09 at 08:26
I wasn't having a bad attitude. Someone who's worked with this before should be able to look at my code and go, 'Oh, you can't do that', or 'you need to change this', like Alex below. It is my fault I didn't mention gcc right away, and that is the crucial bit. And nothing I said warranted you all 'biting' back as you did. I'm grateful towards useful information such as 'hey dip**** you forgot to put the compiler you're using', but behaviour and error codes, it is an obvious thing...the code's commented. If one's worked with gcc and inline assembly, such as Alex below, they know that it's wrong. — Kelly Elton, Dec 24 '09 at 10:29
@kelton52: you weren't bitten until your insulting "To someone who knows what they are doing..." comment (read http://catb.org/~esr/faqs/smart-questions.html#keepcool for the mindset you're up against). To participate within a community, you must abide by its standards. In particular, questions should be well defined and a request for clarification and additional information should always be honored. Remember, we want to help, but we can't without enough information. Also, there might be issues you're not aware of, which is why more information is often needed. — outis, Dec 25 '09 at 07:31

score 6 · Accepted Answer · answered Dec 24 '09 at 02:47


Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)

If gcc, see the details here, and in particular this example:

    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)
         : "r" (x) 
         );

%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
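As a minimal sketch of that operand numbering, the example above can be wrapped in a complete function. The names `five_times` and `keep_alive` are hypothetical, chosen only for illustration, and the `#if` guard with a plain C++ fallback is an assumption added so the snippet also builds on non-x86 targets. The second function shows the "inputs only" shape, with a comment in place of the empty output list.

```cpp
// Hypothetical helper: computes 5*x with a single LEA.
// %0 is the first operand listed (the output), %1 is the second (the input).
inline int five_times(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int five_times_x;
    asm ("leal (%1,%1,4), %0"
         : "=r" (five_times_x)   // second parameter: outputs
         : "r" (x)               // third parameter: inputs
         );
    return five_times_x;
#else
    return x * 5;                // portable fallback, not inline asm
#endif
}

// Hypothetical input-only statement: the template is empty, so no
// instruction is emitted; the compiler only has to put `value` in a register.
inline void keep_alive(int value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm volatile ("" : /* no output registers */ : "r" (value));
#else
    (void)value;
#endif
}
```

Calling `five_times(7)` should yield 35; which registers end up behind `%0` and `%1` is left to the compiler.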

answered Dec 24 '09 at 02:47

Alex Martelli


I have no grasp of the watcom nonsense...I've read quite a bit and still can't quite grasp it...I would think I should be able to simply move the address of my buffer into a register. – Kelly Elton Dec 24 '09 at 03:04
What "watcom"? I'm talking about gcc -- in THAT compiler (which is very widespread, you know) you have to use %0, %1 and so on in the assembly code proper, then connect them to input and output C expressions in the second and third args of `asm` -- and that's all there is to it. Feel free to disapprove of gcc's design (and to contribute changes to it, since it's open source), but I'm telling you how it DOES work. Your example code doesn't follow these rules, which is why it isn't working (if gcc's your compiler;-). – Alex Martelli Dec 24 '09 at 03:21
This doesn't come close to addressing how wrong the OP's code is. There's no obvious way to go from this answer to a safe inline asm version of the OP's loop. I probably shouldn't have spent an hour writing an answer to this 8-year-old question, but I did anyway >.< – Peter Cordes Nov 17 '17 at 19:56

score 1 · Answer 2 · edited Dec 24 '09 at 02:58


The part that declares an array like that

int howmany = 5046;
char buffer[howmany];

is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.

If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be

mov buffer,%esi

and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.

Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to

const int howmany = 5046;

the array will turn into a "normal" C++ array and your code might start working as is (i.e. with lea).

edited Dec 24 '09 at 02:58

paxdiablo


answered Dec 24 '09 at 02:49

AnT stands with Russia


I forgot to add const in my example, so that isn't an issue. Thanks anyhow though. – Kelly Elton Dec 24 '09 at 03:02
The GNU dialect of C++ supports C99-style VLAs as an extension. Any compiler that supports GNU-style inline asm will also support that. (Unless that's supposed to be at the global scope... oh, but it doesn't compile because there is no `buffer` symbol, so yeah it's a local VLA.) – Peter Cordes Nov 17 '17 at 18:17
@Peter Cordes: Yes, but I doubt that GNU's will be able to figure out that in case of VLA `lea` has to be replaced with `mov`. From assembler point of view, both `lea` and `mov` make their own sense. – AnT stands with Russia Nov 17 '17 at 18:31
Neither one is viable for a *local*, because they don't appear as symbol names in the asm output. That's why you need to use an input constraint. VLAs are no different than regular arrays with automatic storage in that respect. – Peter Cordes Nov 17 '17 at 18:32
Posted my own answer, because neither yours nor the accepted one comes anywhere near addressing what's wrong with the OP's code. (And yours is definitely wrong; it will either not assemble, or it will load the first 4 bytes if the array is static / global) – Peter Cordes Nov 17 '17 at 19:53

Peter Cordes · Answer 3 · 2017-11-18T06:47:06.057

All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.

You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).

Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement is a mov, that usually means you're doing it wrong and writing inefficient code.
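A small sketch of that idea, under stated assumptions: `first_byte` is a hypothetical name, and the `#if` guard with a plain C++ fallback is added only so the snippet builds on non-x86 targets. The address arrives in a register through an `"r"` input, so the template needs no lea or mov of a C variable name. The extra `"m"` input tells the compiler the pointed-to byte is read.

```cpp
// Hypothetical helper: loads one byte through a pointer.
// The compiler places `p` in a register of its choosing; the template
// only ever refers to that register through %[ptr].
inline unsigned char first_byte(const char *p)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char out;
    asm ("movb (%[ptr]), %[out]"
         : [out] "=q" (out)      // "q": a register with a byte-sized name
         : [ptr] "r" (p),        // the address, in any general register
           "m" (*p)              // the byte itself is a memory input
         );
    return out;
#else
    return static_cast<unsigned char>(*p);   // portable fallback
#endif
}
```

Passing a local array works the same way, because the array name decays to a pointer before it reaches the `"r"` operand.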

And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.

How to write your loop properly

If this looks over-complicated, then don't use inline asm: see https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).

Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.

I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.
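Separately from the full listing in this answer, here is a smaller sketch of the `%=` idea on its own. The names `count_down` and `twice` are hypothetical, and the `#if` guard with a C++ fallback is an assumption added for portability. Because `count_down` is likely to be inlined at both call sites in `twice`, a fixed label name would be defined twice in one asm file; `%=` gives each copy its own number, and the jump has to use the same `%=` suffix as the label.

```cpp
// Hypothetical helper: runs a dec/jnz loop n times and returns the number
// of iterations executed. Precondition: n > 0 (n == 0 would wrap around).
inline unsigned count_down(unsigned n)
{
    unsigned iterations = 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    asm ("loop_top%=:              \n\t"   // label gets a unique number
         "   inc   %[iters]        \n\t"
         "   dec   %[count]        \n\t"
         "   jnz   loop_top%=      \n\t"   // same %= suffix as the label
         : [iters] "+r" (iterations),
           [count] "+r" (n)
         : // no input-only operands
         : "cc");
#else
    do { ++iterations; } while (--n != 0); // portable fallback
#endif
    return iterations;
}

// Two uses in one function: each inlined copy needs its own label.
inline unsigned twice(unsigned a, unsigned b)
{
    return count_down(a) + count_down(b);
}
```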

Source + compiler asm output on the Godbolt compiler explorer.

void ext(char *);

int foo(void) 
{
    int howmany = 5046;   // could be a function arg
    char buffer[howmany];
    //ext(buffer);

    const char *bufptr = buffer;  // copy the pointer to a C var we can use as a read-write operand
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       :   [res]"=a"(result)      // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , [ptr] "+r" (bufptr)
       : // no input-only operands
       : "memory"   // we read memory that isn't an input operand, only pointed to by inputs
    );
    return result;
}

I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
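To illustrate the `%b` modifier mentioned above, here is a sketch with a hypothetical function `low_byte`; the `#if` guard and fallback are assumptions added for portability. The input is a 32-bit value, and `%b[val]` makes the compiler print the byte-sized name of whatever register it picked (for example `%al` if it chose `%eax`). The `"q"` constraint is used so that the chosen register has a byte-sized name in 32-bit mode as well.

```cpp
// Hypothetical helper: extracts the low 8 bits of a wider operand.
inline unsigned char low_byte(unsigned value)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char out;
    asm ("movb %b[val], %[out]"  // %b[val]: low-byte name of val's register
         : [out] "=q" (out)      // byte-sized output, so no modifier needed
         : [val] "q" (value));   // 32-bit input in a byte-addressable register
    return out;
#else
    return static_cast<unsigned char>(value & 0xFFu);  // portable fallback
#endif
}
```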

This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory matches the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).

gcc7.2 -O3 output:

    pushq   %rbp
    movl    $5046, %edx
    movq    %rsp, %rbp
    subq    $5056, %rsp
    movq    %rsp, %rcx         # compiler-emitted to satisfy our "+r" constraint for bufptr
    # start of the inline-asm block
    buf_loop18:  
       movb     (%rcx), %al 
       inc     %rcx        
       dec     %edx      
       jnz     buf_loop18      
    # end of the inline-asm block

    movzbl  %al, %eax
    leave
    ret

Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.

A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.

In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.

int bar(unsigned howmany)
{
    //int howmany = 5046;
    char buffer[howmany];
    //ext(buffer);
    buffer[0] = 1;
    buffer[100] = 100;   // test whether we got the input constraints right

    //using input_t = const struct {char a[howmany];};  // requires a constant size
    using flexarray_t = const struct {char a; char x[];};
    const char *dummy;
    unsigned char result;
    asm("buf_loop%=:  \n\t"                 // do {
        "   movb     (%[ptr]), %%al \n\t"   // Copy buffer[x] to al
        "   inc     %[ptr]        \n\t"
        "   dec     %[count]      \n\t"
        "   jnz     buf_loop%=    \n\t"      // } while(--count != 0)
       : [res]"=a"(result)        // al = write-only output
         , [count] "+r" (howmany) // input/output operand, any register
         , "=r" (dummy)           // output operand in the same register as buffer input, so we can modify the register
       : [ptr] "2" (buffer)     // matching constraint for the dummy output
         , "m" (*(flexarray_t *) buffer)  // whole buffer as an input operand

           //, "m" (*buffer)        // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
       : // no clobbers
    );
    buffer[100] = 101;
    return result;
}

I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)

The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.
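When the array size is a compile-time constant, the whole array can be named directly as the memory input, with no struct or cast at all. The following is only a sketch under that assumption: `last_byte` is a hypothetical name, the template parameter stands in for the fixed size, and the `#if` guard with a C++ fallback is added for portability. It is the same loop as above with `"m" (buf)` replacing the `"memory"` clobber.

```cpp
#include <cstddef>

// Hypothetical helper: walks a fixed-size array and returns its last byte.
template <std::size_t N>
inline unsigned char last_byte(const char (&buf)[N])
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    const char *ptr = buf;        // read-write copy the asm may advance
    std::size_t count = N;
    unsigned char result;
    asm ("buf_loop%=:                \n\t"   // do {
         "   movb  (%[ptr]), %[res]  \n\t"   //   result = *ptr
         "   inc   %[ptr]            \n\t"
         "   dec   %[count]          \n\t"
         "   jnz   buf_loop%=        \n\t"   // } while (--count != 0)
         : [res] "=&q" (result),    // early-clobber: written inside the loop
           [count] "+r" (count),
           [ptr] "+r" (ptr)
         : "m" (buf)                // all N bytes are a memory input
         : "cc");
    return result;
#else
    return static_cast<unsigned char>(buf[N - 1]);  // portable fallback
#endif
}
```

For a run-time size this form does not apply as written; the flexible-array-member cast shown above, or the pointer-to-array cast discussed in the comments, covers that case.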

To remain consistent and to make it easier to maintain, wouldn't it be preferable to use `%b[res]` instead of `%%al`? — Michael Petch, Nov 18 '17 at 06:41
@MichaelPetch: oh, yeah I left in `%%al` as an example of using explicit register names in extended asm (where you need a double `%%`). But I forgot to say *why* I was doing it. `result` is an `unsigned char` so I could just use `%[res]` with no modifier to expand to `%al`. — Peter Cordes, Nov 18 '17 at 06:43
Yes, no need for the `b` modifier (I didn't pay attention that your variable was the same width) since you defined res as a byte wide variable. — Michael Petch, Nov 18 '17 at 06:44
@MichaelPetch: thanks for the feedback, updated the answer with the paragraph I forgot to write earlier. :) — Peter Cordes, Nov 18 '17 at 06:49
Since we have been doing the whole `using flexarray_t = const struct {char a; char x[];};` (or equivalent for the memory operands) it has me curious if this is actually a violation of strict aliasing rules. I noticed in the recent past (pre 7.x) that GCC/G++ would warn when compiling with `-fstrict-aliasing` (or equivalent). Version 7+ doesn't. I'm not sure if that is by design or if it's a case that should warn but no longer does. Yes, the elements are all characters so one would think it wouldn't have a problem aliasing but when I read the standard it seems (to me) ambiguous on this point. — Michael Petch, Nov 18 '17 at 07:14
@MichaelPetch: I actually had the same thought while writing this, that it's clearly a cast to a different type. The gcc manual shows a cast to a struct type for a fixed-size memory input. I think that means that usage is officially supported, because documented = supported. I think flexible array structs are similar enough... — Peter Cordes, Nov 18 '17 at 07:18
I know there are examples in the inline assembly docs regarding casting but I don't recall one specifically using a struct with a flexible member. I may have missed it though. Usually the casts involve arrays with the same element types. I do seem to recall the idea with such a struct cast on the mailing list, but I doubt that qualifies as documented. — Michael Petch, Nov 18 '17 at 07:34
In this case I'm thinking that `"m" (*(const char (*)[howmany]) buffer)` looks safer since no struct is being used and we are using `howmany` to define the size of the buffer in question. — Michael Petch, Nov 18 '17 at 07:48
@MichaelPetch: That doesn't work for VLAs; C/C++ types must have compile-time constant sizes. It's fine with `const howmany = 5046;` though. re: docs, I meant fixed-size, not flexible. But [the latest docs now have an array of unspecified size](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1): `"m" (*(const char (*)[]) p)` as an input to a `"repne scasb"` — Peter Cordes, Nov 18 '17 at 07:56
There is an example in the inline docs that doesn't use a VLA but it passes in an array with the number of elements. For instance this is in the documentation `"m" (*(const double (*)[n]) x)` where `n` isn't a compile time value. — Michael Petch, Nov 18 '17 at 07:59
The example I was thinking of is still [in the gcc6.4 manual](https://gcc.gnu.org/onlinedocs/gcc-6.4.0/gcc/Extended-Asm.html#Clobbers-1) `{"m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )}`. That mess appears to be a GNU C statement-expression. — Peter Cordes, Nov 18 '17 at 07:59
@MichaelPetch: You're right, I was hung up on putting it inside a `struct`, and that's the part that doesn't work. A pointer to an array instead of a `struct` is syntactically unusual, but appears to be perfect for this. — Peter Cordes, Nov 18 '17 at 08:01
(Update on the dummy memory-operand idea: [How can I indicate that the memory \*pointed\* to by an inline ASM argument may be used?](https://stackoverflow.com/q/56432259) is the canonical Q&A for this.) — Peter Cordes, Jun 04 '21 at 06:56
