
I would like to compare GCC's builtin memcpy with the one from libc. However, every variation of -fno-builtin or -fno-builtin-memcpy seems to be ignored.

//g++ -O3 foo.cpp -S or
//g++ -O3 -fno-builtin foo.cpp -S
#include <string.h>
int main() {
    volatile int n = 1000;
    //int n = 1000;
    float *x = new float[1000];
    float *y = new float[1000];
    memcpy(y,x,sizeof(float)*n);
    //__builtin_memcpy(y,x,sizeof(float)*n);    
}

What I have found is that if n in the source code above is not volatile, GCC inlines the built-in code. However, when n is made volatile, it calls __memcpy_chk, which is a version of memcpy with buffer overflow checking. If n is volatile and I instead call __builtin_memcpy, it calls memcpy.

So my conclusion so far is that the builtin code is only generated if n is known at compile time and that -fno-builtin is useless. I'm using GCC 4.8.2.

Is -fno-builtin obsolete? Is there a way to make GCC call memcpy from the C library even when n is known at compile time?

Z boson
  • I just noticed that in this question and your comments, you have misspelled `-fno-builtin` several different ways. Check to make sure that's not throwing off your results. The correct spelling is "b u i l t i n". – zwol Aug 13 '14 at 18:52
  • @Zack, thanks for finding the misspellings. I hope I fixed them all. I have indeed been a bit too sloppy (I also used `new` when I should have used malloc in my test example; in my own code I'm using `_mm_malloc` anyway). But GCC complains if I use any of the misspellings. If you want to check for yourself, drop the code above (remove the volatile) into http://gcc.godbolt.org/, change to GCC 4.8 or 4.9, add `-fno-builtin`, and look at the assembly code. – Z boson Aug 13 '14 at 19:36
  • @Zack, make sure you add `-O3` as well. – Z boson Aug 13 '14 at 19:43

3 Answers


-fno-builtin and -fno-builtin-memcpy both have the effect you expected with gcc 4.9.1. This is probably just a bug in gcc 4.8.2; this particular combination of options is not widely used. -ffreestanding is a related switch that may have the effect you want with 4.8.2.

Note that the compiler is within its rights to optimize your program down to

int main() { return 0; }

when invoked without -fno-builtin(-memcpy) or -ffreestanding, even when n is volatile, as it can (in principle) prove that the program as a whole either has no observable side effects, or its behavior is undefined. (When n is not volatile, there cannot be UB; the UB happens if n is outside the range [0, 1000] when read, and volatile tells the compiler it can't assume n has the value written to it by the program.)
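For compilers where the flag has no effect, one possible workaround (not part of the original answer) is to call memcpy through a volatile function pointer. This is a minimal sketch: the names `memcpy_fn`, `indirect_memcpy` and `copy_via_pointer` are hypothetical, and it assumes the compiler cannot expand a call inline when the callee is only known after a volatile load. It costs an indirect call, so it is better suited to benchmarking than to production code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical alias for the signature of memcpy.
using memcpy_fn = void *(*)(void *, const void *, std::size_t);

// Hypothetical helper: copies `bytes` bytes from src to dst by calling
// memcpy through a volatile function pointer.
void *copy_via_pointer(void *dst, const void *src, std::size_t bytes) {
    // Because the pointer is volatile, it must be reloaded from memory on
    // every call, so the compiler cannot assume it still points at memcpy
    // and has no known callee to replace with inline code.
    static memcpy_fn volatile indirect_memcpy = std::memcpy;
    return indirect_memcpy(dst, src, bytes);
}
```

Replacing the `memcpy(y,x,sizeof(float)*n)` call in the question with `copy_via_pointer(y, x, sizeof(float)*n)` and inspecting the `-S` output should show an indirect call rather than inlined moves, though that is worth confirming on the specific compiler version.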

zwol
  • I tried `-ffreestanding`. It made no difference. I'll have to install gcc 4.9.1 to verify your answer. – Z boson Aug 12 '14 at 20:07
  • I looked at the assembly output at `http://gcc.godbolt.org/` and I can confirm that `-fno-builtin` works as expected in GCC 4.9 and not in GCC 4.8.1. – Z boson Aug 12 '14 at 20:23
  • I checked GCC 4.4 to 4.9 and 4.9 is the only one that gets it correct. – Z boson Aug 12 '14 at 20:26
  • ICC does it correctly as well. When I pass `-fno-builtin` it uses `memcpy` and without it it uses `_intel_fast_memcpy`. – Z boson Aug 12 '14 at 20:28
  • Just to be clear, the compiler did not optimize the code away. That's clear from looking at the assembler code. – Z boson Aug 13 '14 at 07:24
  • @Zboson Yes. I was just saying that it *could have*. Moreover, Clang *does*. I filed a bug on GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62112 – zwol Aug 13 '14 at 13:33
  • BTW, I still don't know how to remove the builtin code in GCC 4.8. Is my only option to upgrade to GCC 4.9? – Z boson Aug 13 '14 at 19:42
  • @Zboson It does look like it. I doubt the GCC folks will consider this an important enough bug to fix in the 4.8.x series, although you can certainly try reporting it (http://gcc.gnu.org/bugzilla/). – zwol Aug 13 '14 at 19:54
  • If you want to know why I care about this it's from Agner Fog's [Optimizing C++ manual](http://www.agner.org/optimize/optimizing_cpp.pdf). See section 2.6 "Comparison of function libraries". He claims that memcpy from GCC builtin and Glibc underperform (and that builtin is the worst). Using his asmlib I can see it performs much better than memcpy from glibc. I have not compared to the builtin function yet. He recommends using `-fno-builtin`. – Z boson Aug 13 '14 at 20:33
  • @Zboson To the extent that that document is correct, report *that* as a bug in GCC and/or glibc and it *will* be taken seriously. However, substantial work went into string function optimization in GCC 4.9 and glibc 2.19; I would not be surprised if that document were now inaccurate. – zwol Aug 13 '14 at 22:07
  • I compared the builtin memcpy from GCC 4.8 and 4.9 (which I installed today) and they seem efficient now. However, glibc 2.19 is still quite slow compared to the asmlib. I think the asmlib is using non-temporal stores (`_mm_stream_ps`) for large sizes. But for the builtin code the size must be a compile-time constant and less than or equal to 8192. So I have a way around the `-fno-builtin` bug for GCC 4.8. – Z boson Aug 14 '14 at 19:52
  • Sorry to belabor this point, but I just downloaded glibc 2.19, compiled it and installed it and now it's as fast as the Asmlib as you said. What I compared to before was Eglibc 2.19 which is the default on Ubuntu/Debian (until the next version). It makes a huge difference. – Z boson Aug 23 '14 at 23:05

Note: because you're compiling C++ code, I'm not 100% sure if this applies.

The C standard requires all library functions (unless explicitly indicated otherwise) to have an address and to be usable as the operand of the & address-of operator. This matters because the standard also allows a library function to be implemented additionally as a function-like macro, while an actual function must still exist behind it. To avoid the macro version, you just need something between the memcpy token and the ( token (as @Zack pointed out, whitespace is insufficient):

(memcpy)(y, x, ...)

This forces the use of the actual function, which should avoid any sort of builtin macro definition.
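The macro-suppression rule can be seen in isolation with a small sketch. The names `twice`, `call_with_space` and `call_with_parens` are hypothetical and stand in for a library function that is also provided as a function-like macro; the macro deliberately returns a different value so the two paths can be told apart.

```cpp
// Hypothetical function, defined before the macro so that its own
// definition is not subject to macro expansion.
int twice(int x) { return 2 * x; }

// Hypothetical function-like macro with the same name. It returns a
// sentinel value so a caller can tell which version was used.
#define twice(x) (-1)

// Whitespace is not a preprocessing token, so the next token after
// `twice` is still `(` and the macro expands.
int call_with_space() { return twice (21); }

// Here the next token after `twice` is `)`, so the macro does not
// expand and the real function is called.
int call_with_parens() { return (twice)(21); }
```

Note that this only bypasses preprocessor macros. GCC recognizes builtins inside the compiler proper rather than through a macro, so the parenthesized form by itself would not be expected to stop builtin expansion.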


It's also possible (read: likely) that the -O3 optimization scans for certain function calls (such as memcpy) and replaces them with builtin calls, regardless of -fno-builtin.

Drew McGowen
  • A macro version of `memcpy` will still be expanded with whitespace between `memcpy` and `(`. Putting parentheses around the token `memcpy`, however, does behave as you describe. – zwol Aug 12 '14 at 19:55
  • Also, I can say authoritatively that `-fno-builtin` is *supposed* to do what the OP expects, at all optimization levels. https://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/C-Dialect-Options.html#index-fno_002dbuiltin-107 – zwol Aug 12 '14 at 19:57
  • I tried both your suggestions with GCC and g++ (this code compiles just fine in C so I'm not sure the C++ tag is better) and they made no difference. It's as if `-O3` trumps everything. – Z boson Aug 12 '14 at 20:06
  • @Zboson `new` is C++, not C; this program *should* give you a compile-time error if put in a file with a `.c` extension and compiled with the `gcc` command instead of `g++`. – zwol Aug 12 '14 at 20:08
  • @Zack, yes, you're correct. I should have used malloc to avoid confusion. – Z boson Aug 12 '14 at 20:10
  • @Zack only if it's an object macro, not a function macro. The C standard explicitly states that a functional macro invocation requires *no* space between the macro name and the opening parenthesis – Drew McGowen Aug 12 '14 at 20:28
  • @DrewMcGowen You are thinking of functional macro *definition*, not use. [N1570](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) (closest approximation to C2011 available without paying for it): 6.10.3p3 plus the grammar rules in 6.10.3p{9,10} together require that functional macros are *defined* with no space between the macro-name and the open parenthesis. (1/2) – zwol Aug 12 '14 at 20:39
  • @DrewMcGowen However, the prose in 6.10.3p10 (in particular, the sentence beginning "Each subsequent instance of the function-like macro name followed by a `(` *as the next preprocessing token*" (emphasis mine: whitespace is not a preprocessing token)) clearly (well, if you're accustomed to standardese) indicates that functional macros are *expanded* even if there is any amount of whitespace of any kind between the macro name and the `(`. (2/2) – zwol Aug 12 '14 at 20:41
  • Confession time: I wrote roughly 40% of GCC's current preprocessor. That was more than ten years ago, but I guess I still pretty much have section 6.10 memorized. – zwol Aug 12 '14 at 20:43
  • @Zack impressive! I've been working on my own C compiler - so now I guess I know who to bother when I'm stuck ;) – Drew McGowen Aug 12 '14 at 20:45
  • I changed from new to malloc, changed the file name to .c, and tried your suggestions. They did not make a difference. Strangely, without `-fno-builtin` GCC produces simple assembly without memcpy, and with it it produces a result similar to g++ without `-fno-builtin`. So the switch does have some effect, but it still produces built-in code which does not call memcpy. – Z boson Aug 13 '14 at 19:33

Most likely part of your problem is with glibc, not gcc. You didn't specify, but you are probably using Ubuntu, which defines -D_FORTIFY_SOURCE=2 by default. This prompts the glibc headers to provide an inline definition of memcpy that forwards to __memcpy_chk.
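A quick way to check whether fortification is active for a given build is to inspect the macro directly. This is a minimal sketch; `fortify_level` is a hypothetical helper name, and it assumes `_FORTIFY_SOURCE`, when defined, expands to an integer as it does with `-D_FORTIFY_SOURCE=2`.

```cpp
// Hypothetical helper: reports the _FORTIFY_SOURCE level seen by this
// translation unit, or 0 when the macro is not defined at all.
int fortify_level() {
#ifdef _FORTIFY_SOURCE
    return _FORTIFY_SOURCE;
#else
    return 0;
#endif
}
```

If this returns a non-zero value, compiling with `-U_FORTIFY_SOURCE` (or `-D_FORTIFY_SOURCE=0`) should remove the `__memcpy_chk` forwarding, which makes it easier to separate the glibc header effect from GCC's own builtin handling.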

Marc Glisse