Force clang to inline or use static call to `memmove()`

Question

The following question sure sounds like an XY problem, but trust me, it's not. It is related to the lengthy investigation performed in this question.

I have very good reason to believe that a solution to that question's issue (not for the minimal example there, but for my actual code) involves ensuring that memmove() is not called through a shared library. Instead, it should be called statically, straight from my code to memmove() without going through a PLT, or, even better, memmove()'s code should be inlined into my code.

However, I have failed to find a command-line switch or anything else that prevents memmove() from being called dynamically. (Disassembling the output of a build using -O3 confirms the dynamic call.) Does such a switch exist? I could grab some optimized memmove() code for my platform (such as this), but I would rather avoid introducing code that might become obsolete on a future architecture, and I do not consider this good programming practice anyway.

I'm using clang with the version string `Ubuntu clang version 12.0.0-3ubuntu1~21.04.1` on a Raspberry Pi 4. The output of `uname -a` is `Linux rpi4 5.11.0-1017-raspi #18-Ubuntu SMP PREEMPT Mon Aug 23 07:34:31 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux`.

Not really what you're asking for, but it's possible that the footnote at the end of [this answer](https://stackoverflow.com/questions/50808782/what-does-impossibility-to-return-arrays-actually-mean-in-c/50808867#50808867) might be of interest. — Steve Summit, Sep 15 '21 at 03:11

score 0 · Answer 1 · answered Sep 16 '21 at 02:31

> While I could grab some optimized memmove() code for my platform (such as this), I would rather avoid introducing some code which might be obsoleted for a future architecture

Your best bet is to write a straightforward inline C or C++ implementation of memmove, and rely on the compiler to inline and optimize it.
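A minimal sketch of what such an implementation might look like in C follows. The name `my_memmove` is hypothetical, and this is not tuned code. It is declared `static inline` so that every call site gets its own copy, which the compiler can then specialize. The direction check is the part that distinguishes memmove from memcpy: when the destination starts after the source, the copy has to run backwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, portable memmove sketch. Being static inline, it can be
 * inlined and specialized at each call site instead of going through the PLT. */
static inline void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dst;

    /* Compare addresses as integers: relational comparison of pointers
     * into different objects is undefined in ISO C. */
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: a forward copy never overwrites
         * a source byte before it has been read. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* Destination above source: copy backwards so the overlapping
         * part of the source is read before it is overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

One caveat: clang's optimizer can recognize simple copy loops like these and replace them with a call to memmove() or memcpy(), which would bring the library call back. Checking the disassembly, as the question already does, is the way to confirm that this has not happened.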

Surprisingly, doing this often beats hand-coded assembly implementations. At a given call site, the compiler can often tell that the regions do not overlap, or that the length is a whole multiple of 64-bit words, so it can optimize away the parts of the generic memmove that handle those cases.
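The following sketch illustrates that point under stated assumptions. Both `move_words` and `shift_up_one` are hypothetical names. `move_words` works in whole 64-bit words, so it needs no byte-tail handling at all. In `shift_up_one`, the destination is always one word above the source, so after inlining the compiler may be able to prove which branch is taken and drop the other one. Whether it actually does is something to confirm in the disassembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-granular move: the count is in 64-bit words, so there
 * is no trailing partial word to handle. */
static inline void move_words(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    if ((uintptr_t)dst < (uintptr_t)src) {
        /* Destination below source: forward copy is safe. */
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];
    } else if (dst != src) {
        /* Destination above source: copy backwards. */
        for (size_t i = nwords; i > 0; i--)
            dst[i - 1] = src[i - 1];
    }
}

/* Hypothetical call site: open a slot at v[0] by shifting v[0..n-2] up
 * by one word. Here dst (v + 1) is always above src (v), which is the kind
 * of fact the compiler can exploit once move_words is inlined. */
static inline void shift_up_one(uint64_t *v, size_t n)
{
    if (n > 1)
        move_words(v + 1, v, n - 1);
}
```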

And doing this in C or C++ guarantees that it will continue to work on future architectures (so long as you can compile for them).

