At the neovim project, we make use of some functions that are standard but not implemented on all target platforms, notably stpcpy and soon also mempcpy. Currently we're solving that by supplying and using our own x variants of these functions.
An example:
char *xstpcpy(char *restrict dst, const char *restrict src)
  FUNC_ATTR_NONNULL_RET FUNC_ATTR_WARN_UNUSED_RESULT FUNC_ATTR_NONNULL_ALL
{
  const size_t len = strlen(src);
  return (char *)memcpy(dst, src, len + 1) + len;
}
Yet, this still isn't entirely optimal, as some compilers, like gcc, know what the standard versions of these functions do and can produce better code when given enough context: gcc code for stpcpy builtin.
I've had in mind to put #ifdef guards around them, so that we only supply them when they're not already defined, and to start using the regular names (stpcpy instead of xstpcpy). But that would be a more invasive change at this point. My question is whether I could otherwise inform gcc that xstpcpy is exactly like stpcpy?
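A minimal sketch of the guarded approach, kept less invasive by retaining the xstpcpy name at call sites. HAVE_STPCPY is a hypothetical macro that a configure step would have to define; it is an assumption for illustration and not something the project is known to provide.

```c
#include <stddef.h>
#include <string.h>

/* HAVE_STPCPY is a hypothetical feature macro, assumed to be defined by a
 * configure step when the platform's libc provides stpcpy(). */
#ifdef HAVE_STPCPY
/* Forward to the libc name so the compiler sees a function it knows. */
# define xstpcpy(dst, src) stpcpy((dst), (src))
#else
/* Fallback for platforms without stpcpy(): copy the string including its
 * terminating NUL and return a pointer to that NUL in dst. */
static inline char *xstpcpy(char *restrict dst, const char *restrict src)
{
  const size_t len = strlen(src);
  return (char *)memcpy(dst, src, len + 1) + len;
}
#endif
```

With this shape, existing callers keep writing xstpcpy(buf, "abc"), and only the platforms lacking stpcpy pay for the hand-written version.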
P.S.: a related question: is there a flag, such as -std=c99, which forces gcc/clang to emit a call to the standard function no matter what? I seem to remember such a thing, but can't find a reference to it now. If -std=c99 indeed disables builtin expansion, I would like to know how to enable builtin expansion while keeping -std=c99.
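As far as I can tell from the GCC documentation, -fno-builtin is the flag that stops the unprefixed names from being treated as builtins, and in strict ISO modes such as -std=c99 functions outside ISO C (stpcpy among them) are only recognized through their __builtin_-prefixed names. A minimal sketch under that assumption follows; xstpcpy_builtin is a hypothetical name used only here. Note that the builtin may still compile down to a plain call to the libc stpcpy, so this only links where that symbol exists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: uses the __builtin_ spelling, which gcc and clang
 * accept even in strict ISO modes such as -std=c99. */
static inline char *xstpcpy_builtin(char *restrict dst,
                                    const char *restrict src)
{
#if defined(__GNUC__)  /* also defined by clang */
  /* May be expanded inline, or may become a call to the libc stpcpy. */
  return __builtin_stpcpy(dst, src);
#else
  /* Other compilers: same hand-written fallback as xstpcpy. */
  const size_t len = strlen(src);
  return (char *)memcpy(dst, src, len + 1) + len;
#endif
}
```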
EDIT: Since everything seems to be a bit blurry, I've been trying some things. First of all, the code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
  const char str[] = "we have stpcpy";
  printf("static\n");
  {
    char p1[256];
    char p2[256];
    char *end1 = stpcpy(p1, str);
    char *end2 = (stpcpy)(p2, str);
    printf("static using stpcpy?\np1 = %s (end = %p)\np2 = %s (end = %p)\n",
           p1, end1, p2, end2);
  }
  return 0;
}
The results (I'm on OSX, but godbolt indicates it's similar on linux):
Commandline: gcc-4.9 -O3 -save-temps -march=native stpcpy.c -o stpcpy
gcc 4.9 seems to issue a call to stpcpy_chk in place of the stpcpy() line, and regular _stpcpy (the libc call) in place of the (stpcpy)() line. I would have expected gcc to lower this into a mempcpy, as the stpcpy builtin code in the gcc codebase initially made me believe.
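To make that expectation concrete: when the source string has a known length, a stpcpy call is equivalent to copying the string with its terminator and returning a pointer to the copied terminator. The sketch below writes that out by hand. copy_known_string is a hypothetical name used only for illustration, and this shows what I hoped the compiler would derive, not what gcc 4.9 actually emitted.

```c
#include <string.h>

/* Hand-written equivalent of stpcpy(dst, "we have stpcpy") for a source
 * whose length is known at compile time. */
static char *copy_known_string(char *dst)
{
  static const char src[] = "we have stpcpy";
  /* sizeof src includes the terminating NUL, so the copy is complete. */
  memcpy(dst, src, sizeof src);
  /* Point at the NUL just written into dst, which is what stpcpy returns. */
  return dst + (sizeof src - 1);
}
```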
Commandline: clang -O3 -save-temps -march=native stpcpy.c -o stpcpy
(XCode clang 3.4)
Clang has more or less the behaviour I was expecting from gcc. It completely optimizes out the calls to stpcpy, creating asm like this:
leaq -258(%rbp), %rdx
movabsq $34182044572742432, %rax ## imm = 0x79706370747320
movq %rax, -265(%rbp)
movabsq $2334402142592329079, %rcx ## imm = 0x2065766168206577
movq %rcx, -272(%rbp)
It does this instead of calling _stpcpy.
I wonder if I can make gcc-4.9 do what I want. Trying different versions on godbolt, I haven't been able to get gcc to generate code similar to what clang produces.