-- snipped from chat.so --

I am stuck with GCC 4.6.2 on a certain project, and after profiling with Intel VTune I noticed that some trivial functions were not being inlined (or at least they showed up under hotspots, which I assumed meant a failed inline).

An example function is a reinterpret cast, two numeric additions, and a ternary expression.

I BELIEVE these are being inlined on Windows, but based on the profiling, I think they are not being inlined on Linux under GCC 4.6.2.

I am attempting to get an ICC build working on Linux (it works on Windows), but that will take a little time.

Until then, does anyone know if GCC 4.6.2 is that different from VS2010 in terms of relatively simple compiler optimizations? I've turned on -O3 in GCC.

What led me to this is that this is a rewrite of a significant section of code: on Windows, the performance is approximately equal or a little slower, while on Linux it is at least 2x as slow.

The most informative answer would help me understand the steps required to verify inlining across platforms and how best to approach this situation as I understand these things are extremely situation-specific.

EDIT: Also, assuming that business-specific reasons force me to stick with GCC 4.6.2, what can I do about this without rewriting the code to make it less maintainable?

Thanks!

im so confused
  • If the function(s) is in one translation unit and the call is in another, the compiler can't do the inlining, it's up to the linker to try and perform such inlining. You might want to check flags for the linker (the `ld` program). – Some programmer dude Apr 21 '14 at 14:31
  • @JoachimPileborg that makes complete sense - never thought of that (i am self-taught so I miss these things). the functions are all indeed meant to be in a separate library, but at least one, specifically, is defined in a header file. This would then not apply, correct? – im so confused Apr 21 '14 at 14:34
  • Check `info gcc Invoking 'Optimize Options'` for options to tune the inlining behavior. `-finline-limit=N` and `--param large-function-growth=N` comes to mind. – Torkel Bjørnson-Langen Apr 21 '14 at 14:34
  • @TorkelBjørnson-Langen thanks, those seem like handy parameters, and I'll study and note them for later, but I would assume that the default should be able to handle 10 lines (at most with expansion or something), correct? – im so confused Apr 21 '14 at 14:39
  • If you have a function defined in a header file (marked as `static` or `inline`) then the compiler may indeed inline it. The keyword here though is *"may"*, it's up to the compiler to decide, even if the function is declared as `inline`. And different compilers will use different heuristics to decide if inlining is worth it or not. You might want to play a little with other optimization options, or maybe go down to "only" `-O2` and add some individual optimization options from `-O3` if you want them. – Some programmer dude Apr 21 '14 at 14:39
  • @JoachimPileborg thanks, yeah that's what I've gleaned so far from reading online. I guess that's the only way. Is there a way to verify the inlining behavior without running my stress test for time comparisons? If i use -Winline I'll have to manually declare every function as static or inline, correct? I suppose the best option i have is to get the ICC build working and to prove the deficiencies in GCC. it's a big company not usually in the software business, so i'm facing a lot of flak trying to fix their old code/compiler problems. – im so confused Apr 21 '14 at 14:49
  • **Consider upgrading your GCC** ([4.9.0](http://gcc.gnu.org/gcc-4.9/) will be released in a few days, or at least use [4.8.2](http://gcc.gnu.org/gcc-4.8/)...) and **enable *link time optimization* by compiling *and linking* with `-flto`** (in addition to optimization flags like `-O2`). [4.6.2](http://gcc.gnu.org/gcc-4.6) is quite old and obsolete. – Basile Starynkevitch Apr 21 '14 at 14:55
  • @BasileStarynkevitch thanks, like I said, it's not so simple as to just upgrade (in the end we'll be upgrading to ICC over GCC also), but I had another question about your comment - you mentioned -O2, and I've seen that more frequently than -O3. Is that a better option to be providing? Why *not* -O3 always if one's program is standards compliant? (this one's not but that's a separate issue...) – im so confused Apr 21 '14 at 15:01
  • You can use `-O3` instead of `-O2`. However, often (but not always) the performance gain is small, but the compilation overcost is significant (even more with `-flto`). BTW, you could compile GCC-4.9 from its source code on your system. – Basile Starynkevitch Apr 21 '14 at 15:06
  • You could give [`__attribute__((always_inline))`](http://gcc.gnu.org/onlinedocs/gcc/Inline.html) a shot. In my opinion, it is *a horrible workaround.* The true solution is to use link time optimization as Basile writes; if that is not possible, you can still mess with the inlining thresholds as [Torkel](http://stackoverflow.com/questions/23199385/gcc-4-6-2-inlining-behavior#comment35485172_23199385) suggests. – Ali Apr 21 '14 at 17:24
  • @Ali thanks for your link, but reading it, "GCC does not inline any functions **when not optimizing** unless you specify the ‘always_inline’ attribute for the function" seems to suggest that this will have no effect as I am already compiling with -O3. Is this an incorrect interpretation? – im so confused Apr 21 '14 at 17:34
  • @Ali further, Basile's -flto suggestion does not apply for this specific case as a function in question is defined in a header file (static inline). Correct? and Torkel's suggestion is something I'll try, but I suspect is inconsequential because the functions are one-liners, though could be maybe 10 lines of verbose, 1-asm instruction lines when expanded, I suppose. – im so confused Apr 21 '14 at 17:37
  • @imsoconfused *"this will have no effect as I am already compiling with -O3. Is this an incorrect interpretation?"* Don't know, sorry. I cannot test it either as I don't have gcc 4.6. I would give it a shot; it doesn't seem to be too complicated to check the effect in the assembly code. *"Basile's -flto suggestion does not apply [...]. Correct?"* Many things happen at link time optimization, impossible to tell. Things that are normally not inlined, are often inlined with link time optimization; there is no other way than trying. – Ali Apr 21 '14 at 20:49
  • @imsoconfused Another thing: You could try the [profile guided optimization](http://stackoverflow.com/q/4365980/341970). It would most likely inline those functions and also improve the performance at other parts of the code. – Ali Apr 21 '14 at 20:53

1 Answer

First the super-obvious for completeness: Are you absolutely sure that all the files doing the probably non-inlined calls were compiled with -O3?

The GCC and VS compilers and toolchains are sufficiently different that it wouldn't surprise me at all if their optimizers behaved rather differently.

Next let me observe that the ternary operator can be very deceiving. A ternary can compile to a branch, and depending on the operand types it can also involve constructor calls, conversions, etc. Don't assume that just because it's a terse operator in C++ the compiler will be able to generate a tiny amount of code for it. This could potentially inhibit the compiler from optimizing it. In fact, you could try reworking the ternary code into a normal if statement and see if that helps your performance at all.
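To make that experiment concrete, here is a minimal sketch, assuming a helper shaped roughly like the one described in the question (a reinterpret cast, two additions, and a ternary). All names here are hypothetical, and I have not run this on GCC 4.6.2, so treat it as a starting point rather than a result.

```cpp
// Hypothetical helper shaped like the one in the question. The caller must
// pass the bytes of a real int, otherwise the cast is not valid.
static inline int adjust_ternary(const char* raw, int bias) {
    const int* p = reinterpret_cast<const int*>(raw);
    const int v = *p + bias + 1;   // two additions
    return v < 0 ? 0 : v;          // ternary clamp
}

// Same logic written with a plain if statement, for comparison.
static inline int adjust_if(const char* raw, int bias) {
    const int* p = reinterpret_cast<const int*>(raw);
    const int v = *p + bias + 1;
    if (v < 0) {
        return 0;
    }
    return v;
}

// Two separate non-static callers, so each variant can be inspected on its
// own in the generated assembly.
int call_ternary(const int* value, int bias) {
    return adjust_ternary(reinterpret_cast<const char*>(value), bias);
}

int call_if(const int* value, int bias) {
    return adjust_if(reinterpret_cast<const char*>(value), bias);
}

// To compare the output, assuming the file is saved as ternary.cpp:
//   g++ -O3 -S ternary.cpp -o ternary.s
// then read the bodies of call_ternary and call_if in ternary.s.
```

If the two callers come out with the same instructions, the ternary is not your problem and you can move on to the other checks.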

Then once you've moved on to further diagnostics, an easy thing to try is to use `strings <binary> | grep function` and see if the function name shows up in the binary at all. If it doesn't, and the binary has not been stripped, the function has most likely been inlined everywhere it is called (although even if it shows up it could be strictly debug information and not actual code). There are other tools such as nm, readelf, elfdump, and dump that can introspect binaries for symbols as well. You would need to see which tools are available on your platform and then try to use them to find the function(s) in question.
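As a rough illustration of the symbol check, here is a small sketch that puts a plain `static inline` helper next to one marked with GCC's `noinline` attribute, so you have a known "not inlined" case to compare against. The function and file names are made up for the example, and what I say you should see is my expectation, not something I have verified on GCC 4.6.2.

```cpp
// Plain header-style helper: the compiler is free to inline it.
static inline int add_bias(int v, int bias) {
    return v + bias;
}

// Same body, but GCC is told not to inline it. This is the reference case:
// its symbol should still be present in the object file.
__attribute__((noinline)) static int add_bias_kept(int v, int bias) {
    return v + bias;
}

// Non-static caller that uses both helpers with runtime arguments.
int sum_with_bias(const int* values, int count, int bias) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += add_bias(values[i], bias);
        total += add_bias_kept(values[i], bias);
    }
    return total;
}

// Assuming the file is saved as symbols.cpp:
//   g++ -O3 -c symbols.cpp -o symbols.o
//   nm -C symbols.o | grep add_bias
// I would expect add_bias_kept (possibly with a suffix added by the
// optimizer) to be listed and add_bias to be missing if it was inlined.
```

Running the same `nm` command on your real object files for the functions VTune flagged should tell you whether they still exist as separate functions.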

Another idea is to load the compiled binary into gdb, and ask it to disassemble the code at the file and line at the point where the function call is made. Then you can read the disassembly code to see what the compiler did. Most of the code should actually be fairly obvious. You will likely see something like a call instruction if an actual function call was made.
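For the disassembly route, a sketch along these lines may help you learn what to look for before digging into the real code. It uses the `always_inline` attribute mentioned in the comments on the question and `noinline` as the opposite case. Names are hypothetical and the gdb invocation is one way to do it among several.

```cpp
// Forced inline via the GCC attribute discussed in the comments.
static inline __attribute__((always_inline)) int clamp_forced(int v) {
    return v < 0 ? 0 : v;
}

// Forced out-of-line, so the caller has to contain a real call to it.
__attribute__((noinline)) static int clamp_called(int v) {
    return v < 0 ? 0 : v;
}

// The function to disassemble: one helper should be folded into its body,
// the other should show up as a call instruction.
int sum_clamped(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += clamp_forced(values[i]);
        total += clamp_called(values[i]);
    }
    return total;
}

// Assuming the file is saved as calls.cpp:
//   g++ -O3 -g -c calls.cpp -o calls.o
//   gdb -batch -ex "disassemble sum_clamped" calls.o
// or, without gdb:
//   objdump -d -C calls.o
// On x86 look for a call instruction targeting clamp_called; there should
// be none for clamp_forced.
```

Once you can recognize the difference here, disassemble the real call sites the same way and check whether a call to your small function appears.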

Mark B