2

At the Neovim project, we use some functions that are standard but not implemented on all target platforms, notably stpcpy and soon also mempcpy. Currently we solve that by supplying and using our own x-prefixed variants of these functions.

An example:

char *xstpcpy(char *restrict dst, const char *restrict src)
  FUNC_ATTR_NONNULL_RET FUNC_ATTR_WARN_UNUSED_RESULT FUNC_ATTR_NONNULL_ALL
{
  const size_t len = strlen(src);
  return (char *)memcpy(dst, src, len + 1) + len;
}

Yet this still isn't optimal: some compilers, like gcc, know what the standard versions of these functions do and can produce better code when given enough context (see the gcc code for the stpcpy builtin).

I've considered putting #ifdef guards around them, so that we supply them only when the platform doesn't define them, and switching to the regular names (stpcpy instead of xstpcpy). But that would be a more invasive change at this point. My question is whether I could otherwise inform gcc that xstpcpy is exactly like stpcpy.

P.S.: a related question: is there a flag, such as -std=c99, which forces gcc/clang to emit a call to the standard function no matter what? I seem to remember such a thing, but can't find a reference to it now. If -std=c99 indeed disables builtin expansion, I would like to know how to enable builtin expansion while keeping -std=c99.

EDIT: Since everything seems to be a bit blurry, I've been trying some things. First of all, the code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    const char str[] = "we have stpcpy";

    printf("static\n");
    {
        char p1[256];
        char p2[256];

        char *end1 = stpcpy(p1, str);
        char *end2 = (stpcpy)(p2, str);

        printf("static using stpcpy?\np1 = %s (end = %p)\np2 = %s (end = %p)\n",
                p1, end1, p2, end2);
    }

    return 0;
}

The results (I'm on OSX, but godbolt indicates it's similar on linux):

Command line: gcc-4.9 -O3 -save-temps -march=native stpcpy.c -o stpcpy

gcc 4.9 seems to issue a call to stpcpy_chk in place of the stpcpy() line, and to the regular _stpcpy (the libc call) in place of the (stpcpy)() line. I would have expected gcc to lower this into a mempcpy, as the stpcpy builtin code in the gcc codebase initially made me believe.

Command line: clang -O3 -save-temps -march=native stpcpy.c -o stpcpy (Xcode clang 3.4)

Clang behaves more or less as I expected gcc to. It completely optimizes out the calls to stpcpy, producing asm like this:

leaq    -258(%rbp), %rdx
movabsq $34182044572742432, %rax ## imm = 0x79706370747320
movq    %rax, -265(%rbp)
movabsq $2334402142592329079, %rcx ## imm = 0x2065766168206577
movq    %rcx, -272(%rbp)

Instead of calling _stpcpy.

I wonder if I can make gcc-4.9 do what I want. Trying different versions on godbolt, I haven't been able to get gcc to generate code similar to clang's.

Aktau
  • @technosaurus: I want to know the opposite of `-fno-builtin`. – Aktau Sep 04 '14 at 15:05
  • @technosaurus `-fno-builtin` only works in GCC 4.9.x https://stackoverflow.com/questions/25272576/gcc-with-fno-builtin-does-not-seem-to-work – Z boson Sep 04 '14 at 15:44
  • `-fno-builtin` has been around for way longer than just since 4.9 ... -fbuiltin (if that even exists) is the default behavior unless `-fno-builtin` or `-ffreestanding` is specified. Using -std=*** to change any of those would make no sense, since that behavior is implementation defined and not a standard. As much as I hate to recommend it, autotools was designed for this situation. – technosaurus Sep 04 '14 at 17:28
  • @technosaurus, did you read the link? Did you try using `-fno-builtin` with e.g. GCC 4.8.1 and looking at the assembly? – Z boson Sep 04 '14 at 17:39
  • @Zboson gcc.godbolt.org uses `g++`, not `gcc`, and any version in which -fno-builtin fails has a bug (also note that 4.8 was the first version to use C++, so new bugs were expected). autotools commonly adds workarounds for compiler bugs (which is why the configure script can end up being 100x the size of your actual code) ... if you want to ensure that the library function is used (force it to not use builtins) you can just call it like `(function_name)(parameters,...)` instead of `function_name(parameters,...)` ... note the parentheses around the function name. – technosaurus Sep 04 '14 at 18:13
  • @technosaurus, in the link I pointed you to, you can see in the comments that I tried GCC as well, and (function_name)(parameters,...) as one answer suggested. It does not work. – Z boson Sep 04 '14 at 18:35
  • Can you afford to use `static inline char *xstpcpy(…){…}`? Does that get you enough bang for the buck? The fact that the implementation has to scan the string twice is a pain. – Jonathan Leffler Dec 20 '14 at 07:58
  • @JonathanLeffler it might make a decent compiler optimize out some redundant strlen()'s, but it won't help with the first double-scan. So unfortunately that's only a tiny fraction of the solution. – Aktau Dec 20 '14 at 09:22

3 Answers

3

I don't know how the builtin for stpcpy works but for memcpy it requires that the size be a compile time constant and less than or equal to 8192 bytes. If your code meets those two requirements (and you don't use -fno-builtin) GCC will use the builtin memcpy. I don't know of a way to force it to use builtin for larger sizes.

To disable builtins you can use -fno-builtin. However, -fno-builtin only seems to work for GCC 4.9.x.

Edit: To use the builtins with -std=c99, use __builtin_memcpy. I just tried this and looked at the assembly. Using memcpy calls memcpy. Using __builtin_memcpy builds the memory copy in directly. However, if you make the size larger than 8192, it calls the memcpy function. It's the same as using -std=gnu99.
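As a rough illustration of the difference described above, the sketch below calls __builtin_memcpy with a size that is a compile-time constant. load_u32 is a hypothetical helper name, not something from the question, and __builtin_memcpy is a GCC/Clang extension, so this will not build with MSVC. Whether the copy is actually expanded inline depends on the compiler version and flags, so it is worth checking the assembly with -S.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from a possibly unaligned buffer.
 * The size passed to __builtin_memcpy is a small compile-time constant, which
 * is the case where the compiler is expected to expand the copy inline. */
static uint32_t load_u32(const unsigned char *p)
{
  uint32_t v;
  assert(p != NULL);
  __builtin_memcpy(&v, p, sizeof v);  /* GCC/Clang extension */
  return v;
}
```

Replacing __builtin_memcpy with plain memcpy gives the same result; the only question is whether the compiler emits a call or inline loads and stores.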

Z boson
  • The builtin for `stpcpy` degrades into the builtin for `mempcpy`, so it will be very similar. I'm not asking to force builtins always, I'm asking for our custom functions to be replaced by builtins when appropriate. (e.g.: `xstpcpy` should become the appropriate builtin `stpcpy` if `strlen(s) < 8192`). I've also read somewhere that builtins get disabled in certain standards-modes. _If_ that's the case, I would like to disable that feature, yet retain standards-mode. – Aktau Sep 04 '14 at 15:55
  • No, that would force the builtin to be used in all cases. I'd like it to only be used in cases where the compiler deduces it to be beneficial. Just like for the regular cases (`strlen`, `memcpy`, `stpcpy`, ...), but for "our" versions (`xstpcpy`, ...). – Aktau Sep 04 '14 at 16:10
  • @Aktau, no, `__builtin_memcpy` has the same restrictions as before. – Z boson Sep 04 '14 at 16:15
  • I have no problems with `-fno-builtin` on gcc 4.8.3 – keltar Sep 04 '14 at 16:30
  • @keltar, it does not work for me with GCC 4.8.1 and below. You can check it yourself at http://gcc.godbolt.org/. Make sure the size is a compile time constant and less than 8192. – Z boson Sep 04 '14 at 16:55
2

Builtin functions are, by definition, not user-defined.

You could use e.g.

#ifdef HAVE_STPCPY
#define xstpcpy stpcpy
#else
char *xstpcpy(char *restrict dst, const char *restrict src);
#endif

in your header file, provided you have HAVE_* macros defined by something like configure. This will allow the compiler to use builtins when it is reasonable.
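A slightly fuller sketch of that header pattern, with the fallback body taken from the question. HAVE_STPCPY is assumed to be defined by the build system when the platform provides stpcpy; join_two is a hypothetical helper added only to show a call site. The fallback is written as static inline here for brevity, which is my choice and not necessarily how the project would lay it out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HAVE_STPCPY is assumed to come from the configure step. */
#ifdef HAVE_STPCPY
# define xstpcpy stpcpy  /* the compiler sees the real name and may use its builtin */
#else
/* Fallback for platforms without stpcpy: returns a pointer to the NUL in dst. */
static inline char *xstpcpy(char *restrict dst, const char *restrict src)
{
  const size_t len = strlen(src);
  return (char *)memcpy(dst, src, len + 1) + len;
}
#endif

/* Hypothetical helper: writes a followed by b into out, returns the total length. */
static size_t join_two(char *out, const char *a, const char *b)
{
  assert(out != NULL && a != NULL && b != NULL);
  char *end = xstpcpy(out, a);
  end = xstpcpy(end, b);
  return (size_t)(end - out);
}
```

Call sites keep using xstpcpy either way, so the change stays local to the header.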

As for -std=c99: C99 has no stpcpy; it is a glibc-specific function (later adopted by POSIX.1-2008). You probably tested it with implicit declarations on. gcc can't determine that a function is a candidate for builtin replacement if its prototype is different. This is one of the many problems implicit declarations introduce.
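A minimal sketch of making the prototype visible under -std=c99, assuming a libc that honours the POSIX.1-2008 feature test macro (on glibc, _GNU_SOURCE is the other common choice). copy_and_measure is a hypothetical helper used only for illustration. Whether gcc then treats the call as a builtin still depends on the version and flags.

```c
/* The feature test macro must be defined before any system header is included. */
#ifndef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst and returns the number of
 * characters copied, not counting the terminating NUL. */
static size_t copy_and_measure(char *dst, const char *src)
{
  assert(dst != NULL && src != NULL);
  char *end = stpcpy(dst, src);  /* points at the terminating NUL in dst */
  return (size_t)(end - dst);
}
```

With the prototype in scope there is no implicit declaration, so the compiler at least has the information it needs to recognise the function.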

keltar
  • This was the first solution I was thinking of, and I think we might have to go for it, even if it's a tad bit ugly. I initially wanted to avoid having to define "HAVE_" in the configure step. By the way, we're actually compiling with `-std=gnu99`, but we need MSVC compat later too. But I guess the important thing is: does gcc handle `stpcpy` as a builtin even with `-std=c99` specified? – Aktau Sep 04 '14 at 16:17
  • @Aktau, why don't you add `-S` when you compile and look at the assembly to answer your question? – Z boson Sep 04 '14 at 16:19
  • Quick test says "no, unless -D_GNU_SOURCE is also defined", which makes sense - you can't safely use glibc specific function without this macro up. – keltar Sep 04 '14 at 16:21
  • @keltar If you add parentheses around the stpcpy, then the library function will always be called ... `#define xstpcpy (stpcpy)` see: http://stackoverflow.com/a/25272962/1162141 – technosaurus Sep 04 '14 at 18:18
  • @technosaurus that seems to be mostly a gcc thing though, clang doesn't respect this (see my edit). – Aktau Sep 04 '14 at 21:45
1

This can be done by defining a macro of the same name as your function and will work even when compiling with -fno-builtin or -ffreestanding if you want to avoid builtins on the rest of your functions.

For example:

#define strlen __builtin_strlen
//or
#define strlen(...) __builtin_strlen(__VA_ARGS__)

Note: if you name it my_strlen(), you can add a weak alias to strlen to allow strlen to be overridden by an actual strlen() function if present.

If the value can be computed at compile time, it will be reduced to a constant. If it cannot be reduced to a constant, then it will either:

  • replace the call with its own optimized version if it has one
  • fallback to your version

strlen does have a builtin replacement (the repne scasb variant), but I'm not sure whether there is a way (aside from gutting it out of the compiler) to get the constant check to pass without the code replacement.
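To make the two cases above concrete, here is a small sketch using __builtin_strlen directly (a GCC/Clang extension). literal_len and runtime_len are hypothetical names. With a string literal the compiler can typically fold the call to a constant at -O1 and above; with a runtime pointer it either expands its own code or calls the library strlen. Which of those happens is up to the compiler, so treat this as an illustration and not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Argument is a literal: a candidate for folding to a constant. */
static size_t literal_len(void)
{
  return __builtin_strlen("we have stpcpy");
}

/* Argument is only known at run time: inline expansion or a library call. */
static size_t runtime_len(const char *s)
{
  assert(s != NULL);
  return __builtin_strlen(s);
}
```

Comparing the -S output of the two functions shows which path the compiler picked for each.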

Edit: add macro to check for builtins

#ifdef __clang__
    #define HAS(...) __has_builtin(__VA_ARGS__)
#elif defined __GNUC__ //assume gcc ... (where the list came from)
    #define HAS(...) 1
#else
    #define HAS(...) 0
#endif
#if HAS(__builtin_stpcpy)
    #define stpcpy __builtin_stpcpy
#endif
technosaurus
  • "note: if you name it my_strlen(), you can add a weak alias to strlen to allow strlen to be overridden by an actual strlen() function if present". I've been thinking about this, but can't find if MSVC supports this. That would be a requirement, sadly. – Aktau Dec 20 '14 at 09:25
  • @Aktau It wasn't mentioned in the original question, but apparently it does have something similar: http://stackoverflow.com/a/11529277/1162141 – technosaurus Dec 21 '14 at 00:30