37

I am trying to write some bare metal code with a memset-style loop in it:

for (int i = 0; i < N; ++i) {
  arr[i] = 0;
}

It is compiled with GCC and GCC is smart enough to turn that into a call to memset(). Unfortunately because it's bare metal I have no memset() (normally in libc) so I get a link error.

 undefined reference to `memset'

It seems like the optimisation that does this transformation is -ftree-loop-distribute-patterns:

Perform loop distribution of patterns that can be code generated with calls to a library. This flag is enabled by default at -O2 and higher, and by -fprofile-use and -fauto-profile.

So one person's solution was to just lower the optimisation level. Not very satisfying.

I also found this really helpful page that explains that -ffreestanding is not enough to get GCC not to do this, and there's basically no option but to provide your own implementations of memcpy, memmove, memset and memcmp. I'm happy to do that, but how?

If I just write memset the compiler will detect the loop inside it and transform it into a call to memset! In fact in the code provided by the CPU vendor I'm using I actually found this comment:

/*
// This is commented out because the assembly code that the compiler generates appears to be
// wrong.  The code would recursively call the memset function and eventually overruns the
// stack space.
void * memset(void *dest, int ch, size_t count)
...

So I assume that is the issue they ran into.

How do I supply a C implementation of memset without the compiler optimising it to a call to itself and without disabling that optimisation?

Timmmm
  • Impossible to answer without knowing the target system. – Lundin Apr 22 '21 at 09:30
  • Does this answer your question? https://stackoverflow.com/questions/2548486/compiling-without-libc – YSC Apr 22 '21 at 09:32
  • And/or this https://stackoverflow.com/questions/37250187/barebones-c-without-standard-library – YSC Apr 22 '21 at 09:33
  • Yeah and since that also removes the CRT we need to know the target system. On some mid-range systems like Cortex M you can roll out your own CRT or grab one from the net. If it's a CPU with MMU then things turn more hairy, still possible but much more intricate. – Lundin Apr 22 '21 at 09:35
  • Does your case require code portability? If not, why not use the `asm` directive to write your own memset implementation? – Zilog80 Apr 22 '21 at 09:51
  • Weird, `-ffreestanding` [works for me](https://godbolt.org/z/oofE4qjhP). – ssbssa Apr 22 '21 at 10:37
  • @YSC No those are not the same question. – Timmmm Apr 22 '21 at 10:54
  • @Lundin: I don't see why you need to know the target system. I think you misunderstood. – Timmmm Apr 22 '21 at 10:55
  • @ssbssa: That *is* weird. Though if you read the last paragraph of the `Compiler option -ffreestanding` section in [this page](http://cs107e.github.io/guides/gcc/) it sounds like `-ffreestanding` will still use `memset()` in some other situations. – Timmmm Apr 22 '21 at 10:56
  • Though I tried a few things like copying large structs or initialising large arrays and couldn't get it to use `memset`/`memcpy` even without `-ffreestanding`. – Timmmm Apr 22 '21 at 11:04
  • The questions may differ but still provide insight. I'd give it a go nonetheless. – YSC Apr 22 '21 at 11:04
  • For the compiler to apply this optimization, N may need to be much larger, because when N is small a call to memset may actually slow the code down. The optimization may only kick in when N can be determined at compile time to be large, perhaps larger than the cache. On the other hand, such optimizations can be forced; LLVM intrinsics do that. Maybe GCC has something similar, no clue. – Abdurrahim Apr 22 '21 at 14:01
  • @Timmmm Because different targets use different C libs, different CRT, different ABI. For example there are two different ports of gcc for ARM, one targeting hosted systems and one targeting freestanding ones. In addition, there is `-nostdlib` for no standard lib and there's `-ffreestanding` for embedded systems. To what extent you need to provide these compiler options might depend on the target port. Most importantly, bare metal systems with MMU are quite different from bare metal systems without one. – Lundin Apr 22 '21 at 15:39
  • You can just compile just those functions without optimization. – Emanuel P Apr 22 '21 at 17:18
  • @EricPostpischil It turns out the compiler really inserts `memset` and `memcpy` calls even with `-ffreestanding`, when initializing/copying [large structs](https://godbolt.org/z/az7dfEe7n). This is also mentioned in the [gcc documentation](https://gcc.gnu.org/onlinedocs/gcc-10.3.0/gcc/Standards.html#index-ffreestanding): `GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp.` – ssbssa Apr 22 '21 at 17:36
  • Doesn't GCC always require you to use the libgcc library? Does that define memset etc? – user253751 Apr 22 '21 at 17:39
  • @ssbssa: Thanks for that, particularly the link to documentation. That could be worth posting as a posed question and answer, to record it for future users. – Eric Postpischil Apr 22 '21 at 17:39
  • @ssbssa: In what cases would gcc guarantee that it wouldn't replace a sequence of operations with a call to `memcpy` or `memmove`? If one tries to define a simple straightforward definition of `memcpy` function, even if one names it `memcpy`, gcc is prone to replace the supplied code with a recursive call to memcpy. While `-ffreestanding` seems to disable that, is there anything that would specify when a compiler might rely upon the existence of `memcpy` or `memmove` even with that flag? – supercat Apr 22 '21 at 18:18
  • @supercat I think the answer provided here is the "correct" way, since glibc is doing it like this (and AFAIK they work hand in hand with gcc). – ssbssa Apr 22 '21 at 18:56
  • @user253751: It does require it (and builds it for you), but no, it does not provide `memset` and friends. That's also explained in ssbssa's link. – Nate Eldredge Apr 22 '21 at 21:16
  • "provide your own implementations of memcpy, memmove, memset and memcmp. I'm happy to do that, but how?" They're probably assuming you will write them in assembly. Which you'll probably want to do anyway, if performance is any kind of consideration. – Nate Eldredge Apr 22 '21 at 21:18
  • @NateEldredge Nope. Glibc doesn't implement them in assembly. – Timmmm Apr 24 '21 at 12:47
  • @Timmmm: Yes it does, for most common platforms. For example when I call `memcpy` on my glibc-based x86-64 system, what runs is [this code](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S;h=5e4a071f16264e77a99792323d59a34b11bb9626;hb=HEAD). The C function you found, with its inhibited optimization, is a fallback in case nobody has written an assembly version for some obscure system, but it isn't meant to be used in the mainstream. – Nate Eldredge Apr 24 '21 at 16:24
  • Ah I stand corrected. Still I don't think they expect you to definitely implement it in assembly. – Timmmm Apr 24 '21 at 16:27

2 Answers

31

Aha, I checked the glibc code and there's an inhibit_loop_to_libcall modifier which sounds like it should do this. It is defined like this:

/* Add the compiler optimization to inhibit loop transformation to library
   calls.  This is used to avoid recursive calls in memset and memmove
   default implementations.  */
#ifdef HAVE_CC_INHIBIT_LOOP_TO_LIBCALL
# define inhibit_loop_to_libcall \
    __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define inhibit_loop_to_libcall
#endif
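
As a minimal sketch of how that attribute could be applied to your own `memset` on bare metal: the macro name `INHIBIT_LOOP_TO_LIBCALL` below is a hypothetical local copy of the glibc macro, the compiler check stands in for glibc's configure-time `HAVE_CC_INHIBIT_LOOP_TO_LIBCALL`, and the byte-at-a-time loop is an assumption for illustration rather than glibc's actual implementation.

```c
#include <stddef.h>

/* Hypothetical local equivalent of glibc's inhibit_loop_to_libcall.
   Assumption: only GCC proper is given the attribute here; other
   compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
# define INHIBIT_LOOP_TO_LIBCALL \
    __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define INHIBIT_LOOP_TO_LIBCALL
#endif

/* Naive byte-wise memset. The attribute asks GCC not to turn the loop
   below back into a call to memset, which would otherwise recurse. */
INHIBIT_LOOP_TO_LIBCALL
void *memset(void *dest, int ch, size_t count)
{
    unsigned char *p = dest;
    while (count--) {
        *p++ = (unsigned char)ch;
    }
    return dest;
}
```

The rest of the program can still be built at -O2; only this function has the loop-to-libcall transformation switched off. The same attribute would presumably be needed on hand-written `memcpy` and `memmove` loops too.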
Timmmm
  • No, you aren't likely able to run glibc on bare metal systems. I think there are other libs out there that are more suitable for Linux-flavoured systems, such as [uClibc](https://en.wikipedia.org/wiki/UClibc). I haven't used it myself so I can't vouch for it. – Lundin Apr 22 '21 at 09:41
  • @Lundin hem. This *is* the solution, i.e. `__attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))` modifier to Timmmm's *own* definition of `memset` etc. – Antti Haapala -- Слава Україні Apr 22 '21 at 09:59
  • @AnttiHaapala That depends on the gcc target port. Also if you are rolling out your own std lib replacement you shouldn't be using another std lib at the same time, obviously... – Lundin Apr 22 '21 at 10:12
  • I'm not using any libc implementation. I think you might have misunderstood the problem. – Timmmm Apr 22 '21 at 10:52
  • @Lundin, the point here is that since `glibc` provides an implementation of `memset`, it needs to have some method of suppressing GCC's "replace this code with a call to `memset`" optimization, without reducing the global optimization level. – Mark Apr 23 '21 at 00:39
3

You mention in your question:

It seems like the optimisation that does this transformation is -ftree-loop-distribute-patterns

All you need to do to turn off this optimization is to pass -fno-tree-loop-distribute-patterns to the compiler. This turns off the optimization globally.

S.S. Anne
  • This is strictly worse than my answer, which just turns it off for `memset`. – Timmmm Apr 24 '21 at 12:44
  • @Timmmm This is the answer you want rather than the answer you think you want; assuming your `memset` won't be any different than the naive C implementation, it could be slower than having the compiler simply generate a loop inside the code. The compiler really only generates calls to `memset` because it assumes (and is correct in most cases) that `memset` is faster than anything it could generate itself. – S.S. Anne Apr 25 '21 at 14:50
  • That is a good point, but your solution still won't work in general because the compiler can still generate calls to `memset()` even with this flag. See [ssbssa's example](https://godbolt.org/z/az7dfEe7n). – Timmmm May 04 '21 at 09:05
  • @Timm your answer uses the same flag so that's a moot point – S.S. Anne May 05 '21 at 22:26
  • Mine only uses the flag for specific functions which don't do anything else to trigger GCC's insertion of calls to `memset()`/`memcpy()`. – Timmmm May 06 '21 at 08:58
  • ah, I see. I wonder why GCC would disregard the flag. Maybe it'd be a good idea for someone to file a bug. – S.S. Anne May 07 '21 at 19:40
  • It's not a bug - the flag only turns off one of the optimisations that leads to calls to `memset`. Struct copying is a different optimisation. Arguably there should be a `-fdont-implicitly-call-mem-functions` flag but I guess nobody needed it badly enough yet. – Timmmm May 09 '21 at 08:46
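
For reference, a rough sketch of the struct-copy case discussed in these comments. The struct name, its size and the function names are made up for illustration. Whether GCC actually emits a `memcpy` or `memset` call for this depends on the target, the struct size and the options used, so treat it as something to check in your own build rather than a guaranteed reproduction.

```c
/* Hypothetical struct, chosen to be large enough that GCC may prefer
   a library call over inline stores. */
struct big {
    unsigned char bytes[1024];
};

/* Plain struct assignment. There is no loop in the source, so
   -fno-tree-loop-distribute-patterns has nothing to act on here;
   GCC may still lower the copy to a call to memcpy. */
void copy_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}

/* Zero-initialising an aggregate and assigning it may similarly be
   lowered to a call to memset. */
void clear_big(struct big *dst)
{
    struct big zero = {0};
    *dst = zero;
}
```

This is why a freestanding build still needs real definitions of `memcpy`, `memmove`, `memset` and `memcmp` somewhere, even when the loop optimisation is disabled.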