GCC (all versions I can conveniently test) can be told that an inline assembly statement reads a particular region of memory (expressed as a pointer p and a size n) with this admittedly awkward construct:

asm ("..." : : "m" (*(struct { char x[n]; } *)p));

However, this does not work in clang (3.[45]); you get a hard error:

 error: fields must have a constant size: 'variable
      length array in structure' extension will never be supported
 asm ("..." : : "m" (*(struct {char x[n];} *)p));
                                    ^

Is there (ideally) a different construct which will produce the same effect in both compilers, or (failing that) a different construct which will produce the same effect in clang, only?

Note that in the case I care about, I insert no actual assembly instructions; the point of the construct is to direct the compiler not to delete an apparently-dead memset. Thus, the "different construct" could perfectly well not involve inline assembly at all. However, please suggest constructs which read arbitrary memory, or generate additional code, only if there is no alternative. Also, DO NOT suggest memset_s, explicit_bzero, or similar; this is an attempt to implement a fallback for those functions without having to hack the compiler.

Full-scale demo program follows --

#include <string.h>

extern void foo(const char *a, const char *b, const char *c, char *d);

void bar(const char *x, char *y, size_t n)
{
  char w[16];
  char v[n];
  memset(w, 0x11, n);
  memset(v, 0x22, n);

  foo(w, v, x, y);

  memset(w, 0, 16);
  memset(v, 0, n);
  asm ("" : : "m" (*(struct {char _[n];} *)v));
}

-- as compiled by gcc 5.0 at -O2 -S, x86-64, CFI goo elided --

bar:
        pushq   %rbp
        leaq    15(%rdx), %rax
        movq    %rsp, %rbp
        pushq   %r14
        andq    $-16, %rax
        movq    %rsi, %r14
        pushq   %r13
        movl    $17, %esi
        movq    %rdi, %r13
        pushq   %r12
        leaq    -48(%rbp), %rdi
        pushq   %rbx
        movq    %rdx, %rbx
        subq    $16, %rsp
        subq    %rax, %rsp
        call    memset
        movq    %rbx, %rdx
        movq    %rsp, %rdi
        movl    $34, %esi
        call    memset
        movq    %r14, %rcx
        movq    %r13, %rdx
        movq    %rsp, %rsi
        leaq    -48(%rbp), %rdi
        call    foo
        movq    %rbx, %rdx
        xorl    %esi, %esi
        movq    %rsp, %rdi
        call    memset
        leaq    -32(%rbp), %rsp
        popq    %rbx
        popq    %r12
        popq    %r13
        popq    %r14
        popq    %rbp
        ret

-- the goal is to get the same number of block memory fills out of clang. Two is wrong, but four is also wrong.

zwol
  • It is not undocumented for gcc. Not sure about clang, however. If the `memset` is unused, there is definitely no reason to keep it. What you are actually looking for is to use _barriers_ (aka _fences_). Look into `stdatomic.h` (since C11) for a standard way. Note that for a barrier, you just clobber `memory`, but that will still not include hardware barriers which handle caches/buffers correctly. – too honest for this site Jul 25 '15 at 16:37
  • @Olaf Where did you see this documented? I honestly cannot remember where I learned it. – zwol Jul 25 '15 at 16:40
  • In the [manual](https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/Extended-Asm.html#Extended-Asm)? The version for [gcc 5.2](https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/Extended-Asm.html#Extended-Asm) got the long-pending overhaul (not sure, though, if the assembler interface is still the same). I had to find that on my own, so not sure if this is ever taught in lessons. Still note this might not work as expected due to caches, write-buffers, concurrency, etc. You really should check atomics. – too honest for this site Jul 25 '15 at 16:44
  • The gcc reference still requires the size of the memory region to be known *at compile time*, so you are relying on an unsupported, or at least undocumented, feature of gcc. – Brett Hale Jul 25 '15 at 21:18
  • Since you're trying to implement a "fallback" why not implement your own version of memset using volatile pointers? – Ross Ridge Jul 25 '15 at 23:02
  • I would be cautious using that gcc construct for attempting to clobber memory. While the docs still include it, my experience is that it not only [doesn't work](https://gcc.gnu.org/ml/gcc/2014-09/msg00342.html), it has unexpected (and undesirable) side effects. How about `char buff[16] = "asdffdsajab"; char *p = buff; printf("%s\n", p); memset(buff, 0, 10); asm("#" : : "m" (*p));`? – David Wohlferd Jul 26 '15 at 00:15
  • @DavidWohlferd I'm sorry, could you be more specific? gcc 5.1.1 generates code that is not wrong, for your example (if I'm reading it correctly). – zwol Jul 26 '15 at 01:35
  • The code I posted above doesn't use the 'size trick,' which (as I mentioned) doesn't work correctly. So I would expect it to work correctly. Or do you mean that 5.1.1 correctly generates code for the example in the link I provided? – David Wohlferd Jul 26 '15 at 02:16
  • @DavidWohlferd The code in the link you provided is incorrect. It's annotated to *read* memory, but the assembly language is *writing* memory. (The manual is indeed misleading. An input is never a clobber.) I get correct assembly output (for any value of `-DPAD`) from gcc-4.9 with the modified code here: http://pastebin.com/QSSC8JaX – zwol Jul 27 '15 at 20:25
  • @DavidWohlferd (It could be _better_ code -- it fails to notice that the initial stores to `c.a` and `c.b` are dead. But it correctly reloads their values from memory in between the `asm` and the `printf`.) – zwol Jul 27 '15 at 20:27
  • @RossRidge (I'm sorry, I missed your comment until now) Use of volatile pointers would inhibit desirable optimizations. For instance, in the "full-scale demo program", if I had applied the memory-use construct to `w` instead of `v`, I would still want the compiler to replace the `memset` with two `movq $0, [address]` instructions. – zwol Aug 08 '15 at 21:42
  • Since it's a fallback, I'd be more concerned about getting something that works than something that's optimal. In any case, you can write your own memset that uses 64-bit volatile stores that, when inlined, can result in a `movq $0, [address]` instruction being emitted. Though the performance difference isn't likely to be as big as you might expect. – Ross Ridge Aug 08 '15 at 23:08
  • @PeterCordes Your answer, in the question you marked this as a duplicate of, only discusses GCC. This question was specifically about clang/LLVM and explains how a technique for memory access in `asm` that works with GCC does _not_ work with LLVM. Please either investigate whether your (slightly different) technique also works with LLVM and add that to your answer over there, or unmark this as a duplicate. – zwol Jun 04 '19 at 12:24
  • @zwol: That was already on my TODO list, but I think I have tried this on clang in the past and found it does work, unlike the struct with flexible array member. – Peter Cordes Jun 04 '19 at 17:39
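
For readers following the comment thread, here is a minimal sketch of the two fallbacks it mentions. Neither comes from the question itself, and the helper names `wipe_volatile` and `wipe_asm_input` are made up for illustration. The first is the volatile-store loop Ross Ridge proposes (byte-wise here for simplicity, not the 64-bit variant he describes). The second keeps the ordinary `memset` and follows it with an empty `asm` whose `"m"` input is a dereferenced pointer-to-array of unspecified size, the form shown in later editions of the GCC manual. That form does not encode `n`; it claims the `asm` may read any amount of memory starting at `p`. Whether clang honors it in the same way is an assumption based on the last comment above and has not been verified here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero n bytes through a volatile pointer.
   Each volatile store has to be performed, so the wipe cannot be
   removed as dead, but the stores also cannot be merged into wider
   ones or turned into a memset call. */
static void wipe_volatile(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Hypothetical helper: plain memset, then an empty asm statement that
   is declared to read memory starting at p. The asm emits no
   instructions; it only gives the optimizer a reason to treat the
   preceding stores as live. */
static void wipe_asm_input(void *p, size_t n)
{
    if (n == 0)
        return;
    memset(p, 0, n);
#if defined(__GNUC__)
    /* Array of unspecified length: "reads some memory at p".
       Assumption: clang accepts and respects this operand too. */
    __asm__ ("" : : "m" (*(const char (*)[]) p));
#else
    /* No GNU-style asm available: fall back to the volatile loop. */
    wipe_volatile(p, n);
#endif
}
```

In the demo program, the final `memset(v, 0, n)` plus `asm` pair would become `wipe_asm_input(v, n)`. The second helper only does its job when it is inlined or when the compiler cannot otherwise prove the buffer dead, so it is worth checking the generated assembly for the expected number of fills, as the question does.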