
gcc optimizes code when I pass it the -O2 flag, but I'm wondering how well it can actually do that if I compile all source files to object files and then link them afterwards.

Here's an example:

// in a.h
int foo(int n);

// in foo.cpp
int foo(int n) {
  return n;
}

// in main.cpp
#include "a.h"
int main(void) {
  return foo(5);
}

// code used to compile it all
gcc -c -O2 foo.cpp -o foo.o
gcc -c -O2 main.cpp -o main.o
gcc -O2 foo.o main.o -o executable

Normally, gcc should inline foo because it's a small function and -O2 enables -finline-small-functions, right? But here, gcc only sees the code of foo and main independently before it creates the object files, so there won't be any optimizations like that, right? So, does compiling like this really make code slower?

However, I could also compile it like this:

gcc -O2 foo.cpp main.cpp -o executable

Would that be faster? If not, would it be faster this way?

// in foo.cpp
int foo(int n) {
  return n;
}

// in main.cpp
#include "foo.cpp"
int main(void) {
  return foo(5);
}

Edit: I looked at objdump, and its disassembled code showed that only the #include "foo.cpp" thing worked.

thejh
  • Put small functions in the .h file, annotated as `inline`. (Though to be perfectly general you also need to define a "hashome" attribute, using a syntax that I forget.) – Hot Licks Apr 21 '12 at 14:25
  • @HotLicks: You mean, put the functions including their bodies, not just the header, into the .h file? – thejh Apr 21 '12 at 14:27
  • Yes, and that's what the "inline" annotation on the method is intended to indicate. – Hot Licks Apr 22 '12 at 00:21
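
A minimal sketch of what the comments above suggest, assuming the header is still named a.h as in the question. The definition of foo moves into the header and is marked inline, so every translation unit that includes it sees the body and the compiler is free to inline the call at -O2. The call_site function is a hypothetical stand-in for main and is not part of the original code.

```cpp
// a.h (sketch): the body lives in the header, so every includer sees it
#ifndef A_H
#define A_H

// `inline` allows the same definition to appear in several translation
// units without breaking the one-definition rule. Whether the call is
// actually expanded inline remains the compiler's decision.
inline int foo(int n) {
  return n;
}

#endif  // A_H

// main.cpp (sketch): would normally begin with #include "a.h"
// call_site is a hypothetical stand-in for main() from the question.
int call_site() {
  return foo(5);
}
```

With this layout there is no foo.cpp left to compile for this function; the trade-off is that every file including a.h has to be rebuilt whenever the body of foo changes.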

3 Answers


It seems that you have rediscovered on your own the main drawback of the separate compilation model that C and C++ use. While it certainly eases memory requirements (which was important at the time of its creation), it does so by exposing only minimal information to the compiler, meaning that some optimizations (like this one) cannot be performed.

Newer languages, with their module systems, can expose as much information as necessary, and we can hope to reap those benefits if modules get into the next version of C++...

In the meantime, the simplest thing to go for is Link-Time Optimization (LTO). The idea is that you still perform as much optimization as possible on each TU (Translation Unit) to obtain an object file, but you also enrich the traditional object file (which contains machine code) with IR (Intermediate Representation, which compilers use to optimize) for some or all functions.

When the linker is invoked, instead of just merging the object files together, it merges the IR, re-executes a number of optimization passes (constant propagation, inlining, ...) and then generates machine code itself. In other words, instead of being just a linker, it acts as a backend optimizer.

Of course, like all optimization passes this has a cost, so it makes for longer builds. Also, both the compiler and the linker must be passed a special option to trigger this behavior; in the case of gcc, it is -flto.
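
As a rough sketch, the question's example could be built with LTO as shown in the comments below. The commands are the ones from the question with -flto added at every step. The call_site function is a hypothetical stand-in for main, and both files are shown in a single listing only so that the snippet compiles on its own.

```cpp
// Build sketch (file names as in the question; -flto goes on the compile
// steps and on the final link step):
//   gcc -c -O2 -flto foo.cpp  -o foo.o
//   gcc -c -O2 -flto main.cpp -o main.o
//   gcc -O2 -flto foo.o main.o -o executable

// foo.cpp
int foo(int n) {
  return n;
}

// main.cpp: only the declaration is visible here, as with a.h in the question
int foo(int n);

// Hypothetical stand-in for main(). With -flto the link step can see the
// body of foo and may inline this call; that outcome is not guaranteed.
int call_site() {
  return foo(5);
}
```

Whether the call actually disappears can be checked the same way as in the question's edit, by disassembling the resulting executable with objdump.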

Matthieu M.

You may be looking for Link-Time Optimization (LTO), aka Whole Program Optimization.

John Zwinck
  • Might be worth a try, but I don't really like having to tell people to use a non-standard gcc for getting fast machine code... – thejh Apr 21 '12 at 15:09
  • 2
    @thejh: Read carefully, the branch has been merged into the trunk and is now part of the gcc everyone knows and uses. – Matthieu M. Apr 21 '12 at 15:29

Since you're using GCC, you can use the C99 inline function specifier mechanism. This is from ISO/IEC 9899:1999.

§ 6.7.4 Function specifiers

Syntax

¶1 function-specifier:

      inline

Constraints

¶2 Function specifiers shall be used only in the declaration of an identifier for a function.

¶3 An inline definition of a function with external linkage shall not contain a definition of a modifiable object with static storage duration, and shall not contain a reference to an identifier with internal linkage.

¶4 In a hosted environment, the inline function specifier shall not appear in a declaration of main.

Semantics

¶5 A function declared with an inline function specifier is an inline function. The function specifier may appear more than once; the behavior is the same as if it appeared only once. Making a function an inline function suggests that calls to the function be as fast as possible.118) The extent to which such suggestions are effective is implementation-defined.119)

¶6 Any function with internal linkage can be an inline function. For a function with external linkage, the following restrictions apply: If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.120)

¶7 EXAMPLE The declaration of an inline function with external linkage can result in either an external definition, or a definition available for use only within the translation unit. A file scope declaration with extern creates an external definition. The following example shows an entire translation unit.

inline double fahr(double t)
{
    return (9.0 * t) / 5.0 + 32.0;
}
inline double cels(double t)
{
    return (5.0 * (t - 32.0)) / 9.0;
}
extern double fahr(double); // creates an external definition
double convert(int is_fahr, double temp)
{
    /* A translator may perform inline substitutions */
    return is_fahr ? cels(temp) : fahr(temp);
}

¶8 Note that the definition of fahr is an external definition because fahr is also declared with extern, but the definition of cels is an inline definition. Because cels has external linkage and is referenced, an external definition has to appear in another translation unit (see 6.9); the inline definition and the external definition are distinct and either may be used for the call.

118) By using, for example, an alternative to the usual function call mechanism, such as "inline substitution". Inline substitution is not textual substitution, nor does it create a new function. Therefore, for example, the expansion of a macro used within the body of the function uses the definition it had at the point the function body appears, and not where the function is called; and identifiers refer to the declarations in scope where the body occurs. Likewise, the function has a single address, regardless of the number of inline definitions that occur in addition to the external definition.

119) For example, an implementation might never perform inline substitution, or might only perform inline substitutions to calls in the scope of an inline declaration.

120) Since an inline definition is distinct from the corresponding external definition and from any other corresponding inline definitions in other translation units, all corresponding objects with static storage duration are also distinct in each of the definitions.


Note that GCC also had inline functions in C before they were standardized. Read the GCC manual for details if you need that notation.

Jonathan Leffler