When it comes to procedural programming, functional decomposition is ideal for maintaining complicated code. However, function calls are expensive: growing the call stack, passing parameters, storing return addresses. All of this takes extra time! When speed is crucial, how can I get the best of both worlds? I want a highly decomposed program without any of the overhead introduced by function calls. I'm familiar with the `inline` keyword, but that seems to be only a suggestion to the compiler, and if used incorrectly by the programmer it will yield an even slower program. I'm using g++, so will the -O3 flag optimize away my functions that call functions that call functions? I just want to know whether my concerns are valid and whether there are any methods to combat this issue.
-
possible duplicate of [What can I use to profile C++ code in Linux?](http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux) – Alex Reynolds Mar 04 '12 at 04:00
-
Usually, don't worry about it. A good compiler will likely inline for you whenever it can, even if you don't use the `inline` keyword. – Fred Larson Mar 04 '12 at 04:01
-
Your concerns are **not** valid. There are people doing stuff where it may matter, but (1) somehow I don't believe you're in that situation and (2) those people don't get very far by guessing and hearsay. – Mar 04 '12 at 04:03
-
No idea about C++ (I'm on JS), but you will also have to weigh concerns about maintainable, reusable, and individually testable code. – Joseph Mar 04 '12 at 04:05
-
@delnan: If you post that as an answer, you'll get a +1 from me. :-) – ruakh Mar 04 '12 at 04:05
-
I agree with @delnan as well. – TotoroTotoro Mar 04 '12 at 04:12
-
Why do you think your program is slow? – 01100110 Mar 04 '12 at 04:34
-
Well, that can yield an interesting discussion, but generally I agree with @delnan too: if performance becomes an issue, just profile the heck out of your program and see where you are spending time. Usually it is NOT on this kind of thing. – WDRust Mar 04 '12 at 06:19
-
@delnan why is it so hard to believe? One can easily find themselves in this situation doing any kind of image processing. Just one function call per pixel and your run time will be dominated by overhead. – gordy Mar 04 '12 at 06:58
-
1st rule of optimizing: Your code is always fast enough, until it's not. You will know when that happens. – Xeo Mar 04 '12 at 07:00
-
@gordy Yeah, but how many people do image processing? Not a whole lot as far as I can tell, and many of them should know better than to ask such questions. Also, calling a trivial function on billions of pixels is kind of an edge case; in many otherwise very performance-sensitive areas, the functions are less trivial and thus take enough time to dwarf the function call overhead, or at least ruin the benefit of inlining by filling the cache. And all that assuming the compiler *didn't* inline. – Mar 04 '12 at 07:03
3 Answers
First, as always when dealing with performance issues, you should measure where your bottlenecks are with a profiler. What comes out on top is usually not function calls, and by a large margin. If you have done this, then please read on.
Then, you can anticipate a bit which functions you want inlined by using the `inline` keyword. The compiler is usually smart enough to know what to inline and what not to inline (it can inline functions you forgot, and may not inline some you marked if it thinks that won't help).
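For instance, a minimal sketch of the hint in action (the function here is just an illustration):

```cpp
#include <iostream>

// `inline` is only a hint; at -O2/-O3 the compiler decides on its own,
// and small functions like this are usually inlined either way.
inline int square(int x) { return x * x; }

int main() {
    std::cout << square(21) << '\n'; // likely inlined at -O2/-O3
    return 0;
}
```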
If you really still want to improve the performance of function calls and want to force inlining, some compilers allow you to do so (see this question). Please consider that massive inlining may actually decrease performance: your code will take up more memory and you may get more cache misses on the code than before (which is not good).
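For reference, a sketch of the compiler-specific spellings (GCC/Clang and MSVC shown; check your compiler's documentation before relying on these):

```cpp
// GCC/Clang: combine `inline` with the always_inline attribute.
__attribute__((always_inline)) inline int add(int a, int b) {
    return a + b;
}

// MSVC equivalent:
// __forceinline int add(int a, int b) { return a + b; }
```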
If it's a specific piece of code you're worried about, you can measure the time yourself. Just run it in a loop a large number of times, get the system time before and after, and use the difference to find the average time of each call.
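A minimal sketch of that measurement using C++11 `<chrono>` (the function under test is hypothetical):

```cpp
#include <chrono>
#include <iostream>

// Hypothetical function whose per-call cost we want to estimate.
int work(int x) { return x * x; }

int main() {
    const int iterations = 10000000;
    volatile int sink = 0; // keeps the optimizer from deleting the loop

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = work(i);
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = end - start;
    std::cout << "avg per call: " << elapsed.count() / iterations << " ns\n";
    return 0;
}
```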
As always, the numbers you get are machine- and compiler-specific, since they will vary depending on your system and toolchain. You can compare the times you get from different methods to see which is generally faster, such as replacing the function with a macro. My guess, however, is that you won't notice much difference, or at the very least it will be inconsequential.
If you don't know where the slowdown is, follow J.N.'s advice and use a code profiler, then optimise where it's needed. As a rule of thumb, always pass large objects to functions by reference or by const reference to avoid copying them; a small sketch follows below.
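A brief illustration of that rule of thumb (the names are hypothetical):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// By value: the whole vector (and every string in it) is copied per call.
std::size_t countByValue(std::vector<std::string> v) { return v.size(); }

// By const reference: no copy, and the callee cannot modify the argument.
std::size_t countByRef(const std::vector<std::string>& v) { return v.size(); }
```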

I highly doubt speed is that crucial, but my suggestion would be to use preprocessor macros.
For example
#define max(a, b) (((a) > (b)) ? (a) : (b)) // parenthesize the arguments so the macro survives expressions like max(x + 1, y)
This would seem obvious to me, but I don't consider myself an expert in C++, so I may have misunderstood the question.

-
(1) Macros are not idiomatic in C++; a template is safer in every respect, just as efficient, and will respect scoping. (2) Just inlining lots of stuff, as explained in other answers (here and elsewhere), isn't terribly likely to improve performance. – Mar 04 '12 at 06:22
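For illustration, the template alternative the comment alludes to might look like this (a sketch, not from the original thread; `max_of` is a hypothetical name chosen to avoid colliding with the macro):

```cpp
#include <iostream>

// A function template: type-safe, respects scoping, evaluates each
// argument exactly once, and is just as inlinable as the macro.
template <typename T>
T max_of(const T& a, const T& b) {
    return a > b ? a : b;
}

int main() {
    int i = 3;
    // The macro version would evaluate ++i twice; the template does not.
    std::cout << max_of(++i, 2) << '\n'; // prints 4
    return 0;
}
```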