2

I am re-learning assembler which I used on very old MS-DOS machines!!!

This is my understanding of what that function should look like. It compiles but crashes with a SIGSEGV when trying to put 0xffffffff in ecx.

The code is run in a VM with 32-bit Debian 9. Any help would be appreciated.

    int getStringLength(const char *pStr){

        int len = 0;
        char *Ptr = pStr;

        __asm__  (
            "movl %1, %%edi\n\t"
            "xor %%al, %%al\n\t"
            "movl 0xffffffff, %%ecx\n\t"
            "repne scasb\n\t"
            "subl %%ecx,%%eax\n\t"
            "movl %%eax,%0"
            :"=r" (len)     /*Output*/
            :"r"(len)       /*Input*/
            :"%eax"         /*Clobbered register*/


    );

        return len;
    }
jeteon
  • 3,471
  • 27
  • 40
  • There are a few problems with the `__asm__` statement. You've assigned `"r"` for input and output, which means the compiler is free to choose the register, which doesn't tend to mix well with explicit register use. The input argument is *not* `len`, it should be `pStr`. `%edi` and `%ecx` are also clobbered, but the compiler has not been informed of this. Recommend [this](https://locklessinc.com/articles/gcc_asm/) – Brett Hale Aug 12 '17 at 14:53
  • Thank you for the pointers Brett, (no pun intended) it was 3am when I wrote that, guess I was a little tired ;-) – Patrick Regnouf Aug 12 '17 at 16:05

2 Answers2

6

The problem with using GCC's inline asm to learn assembly is that you spend half your time learning about how gcc's inline assembly works instead of actually learning assembly. For example here's how I might write this same code:

#include <stdio.h>

int getStringLength(const char *pStr){

    int len;

    __asm__  (
        "repne scasb\n\t"
        "not %%ecx\n\t"
        "dec %%ecx"
        :"=c" (len), "+D"(pStr)     /*Outputs*/
        :"c"(-1), "a"(0)            /*Inputs*/
        /* tell the compiler we read the memory pointed to by pStr,
           with a dummy input so we don't need a "memory" clobber */
        , "m" (*(const struct {char a; char x[];} *) pStr)

    );

    return len;
}

See the compiler's asm output on the Godbolt compiler explorer. The dummy memory-input is the tricky part: see discussion in comments and on the gcc mailing list for the most optimal way to do this which is still safe.

Comparing this with your example

  1. I don't initialize len, since the asm declares it as an output (=c).
  2. There's no need to copy pStr since it is a local variable. By spec, we're already allowed to change it (although since it is const we shouldn't modified the data it points to).
  3. There's no reason to tell the inline asm to put Ptr in eax, only to have your asm move it to edi. I just put the value in edi in the first place. Note that since the value in edi is changing, we can't just declare it as an 'input' (by spec, inline asm must not change the value of inputs). Changing it to a read/write output solves this problem.
  4. There's no need to have the asm zero eax, since you can have the constraints do it for you. As a side benefit, gcc will 'know' that it has 0 in the eax register, and (in optimized builds) it can re-use it (think: checking the length of 2 strings).
  5. I can use the constraints to initialize ecx too. As mentioned, changing the value of inputs is not allowed. But since I define ecx as an output, gcc already knows that I'm changing it.
  6. Since the contents of ecx, eax and edi are all explicitly specified, there's no need to clobber anything anymore.

All of which makes for (slightly) shorter and more efficient code.

But this is ridiculous. How the heck (can I say 'heck' on SO?) are you supposed to know all that?

If the goal is to learn asm, using inline asm is not your best approach (in fact I'd say that inline asm is a bad idea in most cases). I'd recommend that you declare getStringLength as an extern and write it completely in asm then link it with your C code.

That way you learn about parameter passing, return values, preserving registers (along with learning which registers must be preserved and which you can safely use as scratch), stack frames, how to link asm with C, etc, etc, etc. All of which is more useful to know than this gobbledygook for inline asm.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
David Wohlferd
  • 7,110
  • 2
  • 29
  • 56
  • Until recently i would have said this looks good but there is an issue wth this that will make it fail on GCC although it seems CLANG will work. With optimizations and code written a certain way. – Michael Petch Aug 13 '17 at 01:17
  • And the recent observation I learned about was in a question asked by someone else and I'm unsure if it was design intent or is an actual bug in GCC. I just never inquired to the GCC mailing list to figure that out. – Michael Petch Aug 13 '17 at 01:18
  • Want to know where the point of failure is? PS: This bug or oddity has me questioning how much bad code I have on SO lol. I I don't feel bad now that you've posted this. Because it makes me feel less stupid lol – Michael Petch Aug 13 '17 at 01:20
  • @MichaelPetch Well, I didn't clobber the flags, but I'm pretty sure that's not what you're referring to. And I didn't clobber memory, but I shouldn't have to since this is a function call, and gcc should implicitly do that for me. Usually what I miss is the early-clobber, but that doesn't apply here since I'm explicitly naming all the regs. There's `cld`, but agner tells me I can assume this. Ok, I'm not seeing it. What other bit of weirdness does someone have to know to use inline asm? – David Wohlferd Aug 13 '17 at 01:47
  • Here is test code to try `int main() { char test[]="12345678"; return getStringLength(test); }` This creates the string on the stack but with optimizations on (I think O1 and above) the string isn't necessarily dumped to the stack before the inline assembly template is called. You may not end up with the result you wanted. – Michael Petch Aug 13 '17 at 02:00
  • 1
    To be frank I think the only guaranteed way I have found is to actually add a `"memory"` clobber. The size trick from GCC manual (using a struct) works but you can't use a large struct size or you end up with the struct taking too much space on the stack. – Michael Petch Aug 13 '17 at 02:12
  • CLANG on the other hand seems to handle this properly (at least the versions I have tested) – Michael Petch Aug 13 '17 at 02:13
  • Personally my expectation is that if the optimizer inlines a function and sees a pointer passed into a template - that it has the knowledge to know the length of the character array and should be able to ensure the array is actually in memory before executing the inline assembly. `memory` clobber is such an inefficient hack because it will ensure everything is flushed to memory not just the array. – Michael Petch Aug 13 '17 at 02:18
  • Hmm. I'll have to try this. I would have expected this to be a problem if getStringLength were a macro (or if the `asm` was directly included in `main`), but it was my understanding that function calls imply a memory clobber. For example, what if getStringLength was an actual asm routine? Everything MUST be clobbered then. But as you say, perhaps there's an issue with inlining. – David Wohlferd Aug 13 '17 at 02:19
  • The issue seems to be specifically to inlining. – Michael Petch Aug 13 '17 at 02:20
  • The question and answer that had lead me to this behavior was this one: https://stackoverflow.com/questions/2958259/syscall-from-within-gcc-inline-assembly/45535396#45535396 . – Michael Petch Aug 13 '17 at 02:21
  • @David. When the compiler generates a call to an external function (including one written in assembly), it must ensure that accessible data isn't cached in registers. For static or inline functions, this isn't required, as long as the compiler does "the right thing". Since the compiler (in theory) knows what the asm code is doing (via constraints and clobbers), it can optimize based on that knowledge. If the constraints or clobbers are incorrect, the optimizations may be also. – prl Aug 13 '17 at 04:07
  • 1
    ... Unfortunately, a memory clobber or an operand with a memory constraint are the only ways to tell the compiler that something needs to be visible in memory. Trying to use a memory constraint with an array (especially a variable-length string) is ugly at best, leaving us with using a memory clobber. – prl Aug 13 '17 at 04:11
  • @prl It has always been my understanding that function calls perform an implicit clobber (whether all of "memory" or just escaped pointers I've never known for sure). Today I discovered that this is only true if the compiler happens to decide NOT to inline the function. Since you can never know when that might or might not happen, you (apparently) must ALWAYS use a memory clobber when passing memory pointers to inline asm. If true, this fact is not widely known, even among experts in inline asm. – David Wohlferd Aug 13 '17 at 04:49
  • @David Wohlferd Thank you for so much input, as I mentioned I am not learning ASM, I am re-learning as it's been a long time ... and I need to inline it in order to optimise C code. so I need the gobbledigook am afraid!!! – Patrick Regnouf Aug 14 '17 at 09:41
  • 1
    Well, hopefully this answer helps with that too (that's why I explained each of the changes). I will say that the need to use inline asm for optimization has decreased a lot over the years. Between optimizers getting better, and builtins for accessing low level cpu capabilities, it's hard to justify using it anymore. Doesn't mean it isn't cool and interesting, but for production code, the benefit is seldom worth the cost. – David Wohlferd Aug 14 '17 at 20:53
  • @PatrickRegnouf: If you want `strlen` to run fast, you probably shouldn't be using `rep scasb` because it's not particularly fast on most CPUs. glibc uses an SSE2 strlen implementation. See https://stackoverflow.com/questions/37800739/is-it-safe-to-read-past-the-end-of-a-buffer-within-the-same-page-on-x86-and-x64/37801080 and https://stackoverflow.com/questions/40915243/find-the-first-instance-of-a-character-using-simd for links to (and my comments on) glibc's strlen implementations. – Peter Cordes Aug 19 '17 at 16:51
  • @DavidWohlferd: gcc only assumes that escaped pointers could be used by a non-inline function call. And probably with LTO / whole-program optimization, it can even check what a function does and make assumptions even without inlining it. If you want to pass pointers in registers into an `asm` statement, you definitely need to pass a dummy memory in or out operand as well if the asm dereferences the pointer (or clobber `"memory"`). You just use the first element of an array as a memory operand; the compiler might not keep later elements alive in registers, but IDK and wouldn't count on it. – Peter Cordes Aug 19 '17 at 16:59
  • @peter - but as Michael points out above, even the memory in operand doesn't work. You need a cast to volatile or something like that. – BeeOnRope Aug 19 '17 at 18:11
  • @BeeOnRope: that sounds like a compiler bug (which may very well exist and need to be worked around). Or maybe a strict-aliasing violation led to the compiler not putting what Michael hoped into a memory operand? I almost never actually use inline asm, because it's usually possible to get good asm by hand-holding the compiler with pure C. – Peter Cordes Aug 19 '17 at 18:14
  • 1
    @PeterCordes yup, I think it _might_ be a bug. Here's the [report](https://gcc.gnu.org/ml/gcc/2017-08/msg00116.html). I don't see any strict aliasing violation. The example in the bug isn't very good because it uses `"+D"(pStr)` which as you point out doesn't claim to read the memory pointed to by `pStr`, but examples with `*pStr` also fail (in fact, the compiler _does_ write the first 16 bytes of the string to memory before calling the asm, but not the remaining bytes). [Discussion](https://chat.stackoverflow.com/rooms/151218/discussion-between-dato-and-michael-petch). – BeeOnRope Aug 19 '17 at 19:10
  • The reason I say _might_ is because it isn't clear if something like `"m"(*p)` where `p` is say `char *` means "will access at most the single `char` at p (i.e., `p[0]`) " or rather "will access the entire memory region pointed at by `p`". Of course the latter interpretation doesn't make sense in the context of a compiled standalone function like `getStringLength`, but the distinction isn't important then anyways, since the the caller has to write everything that `p` points to prior to escaping the pointer to a called function. It matters in the inlined case... – BeeOnRope Aug 19 '17 at 19:13
  • 1
    @BeeOnRope: Thanks for the link. Looks like the only thing that wasn't working was just `"m"(*pStr)`, which only tells gcc that the first character is needed, so it doesn't think `pstr[4] = 0` will affect the asm. (So my caution about that idea was justified.) `"m" (*(const char (*)[]) pStr)` apparently does work, which is very cool. It apparently means that an arbitrary-length input accessed through `pStr` is read by the asm. – Peter Cordes Aug 19 '17 at 19:28
  • IDK where the expectation came from that function calls made it safe to dereference pointers inside `asm`. Judging from SO questions and answers, a lot of inline-asm code is just plain unsafe in general terms, but happens to work fine because the code that writes the input memory is in a loop and the compiler happens not to reorder the whole loop with the `asm` after inlining. (Or it doesn't get inlined). Seeing something in existing code doesn't mean it's safe, just that it happened not to break in that codebase. – Peter Cordes Aug 19 '17 at 19:30
  • @PeterCordes - that syntax does seem to force `gcc` to write out `pstr` to memory on gcc and works (unlike the `volatile` suggestion Michael makes above, which based on my testing doesn't work). Unfortunately, it doesn't compile at all on `clang`, giving: _error: dereference of pointer to incomplete type 'const char []'_. Now `clang` doesn't exhibit the original problem, but it isn't clear if it's because it actually uses more conservative semantics for memory/pointer arguments to inline functions... – BeeOnRope Aug 19 '17 at 19:36
  • ... or simply because it chooses not to split up the copy of the string constant to the stack (gcc seems to like to reorder memory reads and writes more aggressively with surrounding code, while `clang` usually seems to keep them all contiguous). So it is entirely possible that `clang` simply works by luck but that the same bug exists there. – BeeOnRope Aug 19 '17 at 19:37
  • 2
    @BeeOnRope: It looks like clang is just more conservative with `"m"(*pStr)`. If you make `buff` a global and put`buff[4]=0` ahead of the asm and `buff[4]=1` after, then `"m"(*pStr)` stops clang (but not gcc) from optimizing away the `=0` as a dead store. https://godbolt.org/g/F8BM22 `"m" (*(const struct {char a; char x[];} *) pStr)` works for both (and ICC, although ICC is very conservative and doesn't see the store as dead even when there's no `"m"` operand at all.) – Peter Cordes Aug 19 '17 at 19:58
  • @PeterCordes indeed. It does seem like a bit of a time-bomb waiting to blow up when a function gets inlined or different optimization decisions are made (and the extended asm documentation isn't entirely clear about the semantics of `m` as an input argument when it comes to arrays/regions of size > 1). There is in fact an example similar to yours in the [doc](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html), see "You can use a trick to avoid this if the size of the memory being accessed is known at compile time" - but it only applies to compile-time known array sizes. – BeeOnRope Aug 19 '17 at 20:13
  • @BeeOnRope: Yeah, I knew it was straightforward for compile-time-constant sizes, because of that doc example. Seemed very clear to me, although I guess I'm lucky that I wasn't trying to read the docs before they at least explained that part clearly, so I didn't have a misconception from when the docs weren't clear. I didn't know you'd get anything useful out of the `char x[]` at the end of a struct trick! – Peter Cordes Aug 19 '17 at 20:23
  • @PeterCordes - the docs are unclear on the point that accessing memory pointed to by a pointer `p` doesn't work except for `p[0]`. Do you find any part where that giant caveat is clearly explained? Even in the `gcc` mailing list, hardcode gcc dev Andrew P seems to think that using `"m" (*p)` should work. – BeeOnRope Aug 19 '17 at 20:28
  • 1
    @BeeOnRope: Oh right, that's fair. I wasn't sure either, and guessed that it might work in practice in a lot of cases. The example for constant-size definitely implies that you need to tell the compiler about the entire range of bytes you access, though, not just one element of it. – Peter Cordes Aug 19 '17 at 20:34
  • Yes, that's true. I suppose you can imply the behavior from that `struct` trick example (on the other hand, the doc there is too conservative: it implies you need memory clobber (outside of using the trick) if you access memory pointed to by input args _at all_, but it seems that is not true except for arrays (i.e., beyond `p[0]`) - for the first/only element you can use the `"m" (*p)` syntax without issue. – BeeOnRope Aug 19 '17 at 20:36
  • "IDK where the expectation came from that function calls made it safe" - I had the same expectation and I don't know where I got it either, but it's the key to this question. If it's NOT safe, I want someone to SAY that, in an unambiguous way, in a public forum. That way when I start pointing this out to (the many) people that are using it wrong, I'll have a credible source I can reference. Since I never really got a definitive answer on the ML (it wandered off to the "+m" question), I may try again. – David Wohlferd Aug 19 '17 at 22:17
  • "IDK where the expectation came from that function calls made it safe" - Take 2: The expectation that you can modify parameter values is inherent in the definition of what it means to be "in a function." Even if the code ends up getting inlined, the assumption is this paradigm will still apply. Similarly, it doesn't seem like such a stretch that "flushing pointers" is part of the definition of calling a function, and so should apply even if inlined. How could you call a DLL or even an asm routine if flushing both parameters and globals weren't implicit in the definition of calling a function? – David Wohlferd Aug 20 '17 at 09:02
  • (Just saw the above comments after looking for this answer again). Ok, yes, I can understand that reasoning, and it makes sense people would think that way. The flaw in the reasoning is that it's based on what happens with *opaque* functions where the compiler can't see the internals and has to assume it might do *anything* the calling convention allows. Unlike the case with tiny wrapper functions which we want to be as lightweight as type-safe CPP macros. It would suck if function wrappers for inline-asm statements meant the compiler had to assume all the read-only input regs were dead. – Peter Cordes Jan 21 '18 at 05:43
  • opaque functions - Aren't inline asm blocks inherently opaque? Which means that any function which contains inline asm would by definition also be opaque. inline asm already disables certain optimizations (maybe even this one!). – David Wohlferd Jan 21 '18 at 21:32
  • @PeterCordes "read-only input regs were dead" - I don't want them dead (or even injured). I'm saying that if someone wants to write a string function (say strlen) and takes a ptr as "r" (which they all do), the memory pointed to by ptr must be valid (aka flushed) since that's how opaque functions (like those that contain inline asm) work by definition. Alternately the docs need to SAY that it doesn't work, since everyone (including op, see his answer) assumes it will. – David Wohlferd Jan 21 '18 at 21:33
  • @DavidWohlferd: I was getting mixed up with a recent question which clobbered input regs; I guess we weren't talking about that in this thread. Yes, inline asm itself is opaque, but the relevant calling convention is what the constraints describe, *not* the calling convention of the function wrapper around it, because that inlines away, so it makes sense that it has no effect on whether memory is flushed. Anyway, I agree the pointed-to-memory issue is surprising and needs to be well documented. People expect they can deref pointers they from inline asm regardless of wrapper function or not. – Peter Cordes Jan 22 '18 at 03:40
-1

Finally got it working:

    int getStringLength(const char *pStr){

        int len = 0;
        const char *Ptr = pStr;

        __asm__  (
            "mov %%eax, %%edi\n\t"
            "xor %%eax, %%eax\n\t"
            "mov $0xffffffff, %%ecx\n\t"
            "repne scasb\n\t"
            "sub %%ecx,%%eax\n\t"
            "sub $2, %%eax\n\t"
            :"=a" (len)        /*Output*/
            :"a"(Ptr)          /*Input*/
            :"%ecx", "%edi"    /*Clobbered register (ecx, edi)*/
        );

        return len;
    }
prl
  • 11,716
  • 2
  • 13
  • 31
  • 1
    This is unsafe: you don't tell the compiler you're reading the memory pointed to by `Ptr`. An assignment like `pStr[10] = 0` can reorder with the asm statement at compile time. See David's answer. (This is not well documented, unfortunately). It's also inefficient: use a `"D" (Ptr)` constraints to ask for `Ptr` in `%edi` in the first place. Using `eax` just makes you waste a `mov` instruction. – Peter Cordes Jan 22 '18 at 03:46