37

Let's say that I have a function that gets called in multiple parts of a program. Let's also say that I have a particular call to that function that is in an extremely performance-sensitive section of code (e.g., a loop that iterates tens of millions of times and where each microsecond counts). Is there a way that I can force the compiler (gcc in my case) to inline that single, particular function call, without inlining the others?

EDIT: Let me make this completely clear: this question is NOT about forcing gcc (or any other compiler) to inline all calls to a function; rather, it is about requesting that the compiler inline a particular call to a function.

fouric
  • 4
    Do it manually? (by simply inserting the code there.) – librin.so.1 Jan 28 '13 at 21:31
  • Why not use the inline keyword http://gcc.gnu.org/onlinedocs/gcc/Inline.html – Ifthikhan Jan 28 '13 at 21:33
  • 1
    For clarity, you want a specific call to `foo()` to be inline, but other calls to `foo()` are done normally? – Dan F Jan 28 '13 at 21:35
  • 8
    For those voting to close: This is not a duplicate of the question proposed as a duplicate. That one asks how to inline a function (in all calls to the function). This one asks how to inline one specific call to the function. – Eric Postpischil Jan 28 '13 at 21:50
  • Before creating headaches for yourself, you should have a look at what gcc produces. It is usually quite good at noticing places where it should inline a function. Look at the assembler that gcc produces (with `-S`). If it doesn't inline, many times it is the programmer's fault because your interface isn't clean enough. – Jens Gustedt Jan 28 '13 at 22:02
  • 1
    @JensGustedt: Observing that GCC inlines a call in the current compilation does not guarantee that future versions of GCC will inline the call or that GCC will inline the call if compilation switches are changed. – Eric Postpischil Jan 28 '13 at 22:20
  • @EricPostpischil, so what? If today on a recent version of gcc with flags `-O3 -march=native` it makes sense to inline the call, then gcc will do it. If in 10 years from now, it still will make sense, gcc still will do it. Compilers are getting smarter, they accumulate all the knowledge on compiler optimization. People usually don't. – Jens Gustedt Jan 29 '13 at 00:08
  • 1
    @JensGustedt: No, GCC will not inline the call “if it makes sense”. GCC neither has general reasoning capabilities nor does it have all the information needed to make an optimal decision. GCC has heuristics, which may be good, but they are not as good as well-informed, experienced programmers with additional knowledge about the target platform, how the application will be used, how many times and in what circumstances the application will use a particular function, et cetera. Furthermore, all of this is beyond the premise of the question, which clearly asks how to inline a specific call. – Eric Postpischil Jan 29 '13 at 00:55
  • 1
    There's a similar question here: http://stackoverflow.com/questions/7108797/can-i-selectively-force-inline-a-function which talks about stronger hints for the microsoft compiler. – andygavin Sep 17 '15 at 10:39

8 Answers

17

In C (as opposed to C++) there's no standard way to suggest that a function should be inlined. There are only vendor-specific extensions.

However you specify it, as far as I know the compiler will always try to inline every instance, so use that function only once:

original:

   int MyFunc()  { /* do stuff */  }

change to:

   inline int MyFunc_inlined()  { /* do stuff */  }

   int MyFunc()  { return MyFunc_inlined(); }

Now, in the places where you want it inlined, use MyFunc_inlined().

Note: "inline" keyword in the above is just a placeholder for whatever syntax gcc uses to force an inlining. If H2CO3's deleted answer is to be trusted, that would be:

static inline __attribute__((always_inline)) int MyFunc_inlined()  { /* do stuff */  }
James Curran
  • 6
    There is. The `inline` keyword is standard C99. It does give a hint to the compiler about inlining, it just doesn't **force** inlining. –  Jan 28 '13 at 21:34
  • `inline` is a standard suggestion. Its *effectiveness* is implementation-defined. – Carl Norum Jan 28 '13 at 21:34
  • 1
    @CarlNorum, it isn't even a suggestion. It makes inlining possible, since it allows you to place a function definition in a header file without producing "multiple symbol" errors. – Jens Gustedt Jan 28 '13 at 21:59
  • The standard says "Making a function an inline function suggests that calls to the function be as fast as possible." That even contains the *word* "suggest". – Carl Norum Jan 28 '13 at 22:01
  • 1
    "So use that function only once"? Doesn't that defeat the main point of using at function at all, e.g., code reuse? – fouric Jan 28 '13 at 22:05
  • 5
    @InkBlend: The point is to have two versions of the function, one that has the always_inline attribute and is used only where you want to compel inlining, and another that does not have the always_inline attribute and is used where it is not important to inline. To avoid duplicating code, the non-always_inline version is created by simply calling the always_inline version. So the code is reused. – Eric Postpischil Jan 28 '13 at 22:21
  • Another interesting attribute for the `always_inline` function might be `flatten`, which causes all calls in the body to be inlined as far as possible. – Deduplicator Apr 16 '19 at 11:43
  • If `inline int func()` fails to compile, try adding `static` before `inline`. – umnikos Dec 31 '20 at 16:48
16

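It is possible to enable inlining per translation unit (but not per call). Though this does not answer the question and is an ugly trick, it may be interesting as related material. Note that the example relies on the GNU89 meaning of extern inline (GCC's default before version 5; with newer versions compile with -std=gnu89 or -fgnu89-inline). Under C99 semantics extern inline emits an external definition, so the link would fail with a duplicate symbol.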

The trick is to use extern definition where you do not want to inline, and extern inline where you need inlining.

Example:

$ cat func.h 
int func();

$ cat func.c 
int func() { return 10; }

$ cat func_inline.h 
extern inline int func() { return 5; }

$ cat main.c       
#include <stdio.h>

#ifdef USE_INLINE
# include "func_inline.h"
#else
# include "func.h"
#endif

int main() { printf("%d\n", func()); return 0; }

$ gcc main.c func.c && ./a.out
10                                                // non-inlined version

$ gcc main.c func.c -DUSE_INLINE && ./a.out
10                                                // non-inlined version

$ gcc main.c func.c -DUSE_INLINE -O2 && ./a.out
5                                                 // inlined!

You can also use a non-standard attribute (e.g. __attribute__((always_inline)) in GCC) for the extern inline definition, instead of relying on -O2.

BTW, the trick is used in glibc.

gavv
6

The traditional way to force inlining in C was not to use a function at all, but a function-like macro. This always inlines the code, but function-like macros have problems of their own (there is no type checking, and an argument may be evaluated more than once). A simple macro looks like this:

#define ADD(x, y) ((x) + (y))
printf("%d\n", ADD(2, 2));
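ADD is harmless because each argument appears exactly once in the expansion. As a small sketch of the kind of problem macros can cause, here is the classic double-evaluation case. MAX, max_fn, next_value and the two counting helpers are hypothetical names made up for this illustration:

```c
/* A macro whose arguments can appear twice in the expansion. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper with a visible side effect: counts its own calls. */
static int calls = 0;
static int next_value(void) { return ++calls; }

/* The same operation as a real function: arguments are evaluated once. */
static inline int max_fn(int a, int b) { return a > b ? a : b; }

/* Returns how many times next_value() ran when passed through the macro. */
int calls_via_macro(void)
{
    int m;
    calls = 0;
    m = MAX(next_value(), 0);   /* expands to two next_value() calls */
    (void)m;
    return calls;
}

/* Returns how many times next_value() ran when passed to the function. */
int calls_via_function(void)
{
    int m;
    calls = 0;
    m = max_fn(next_value(), 0);
    (void)m;
    return calls;
}
```

Here calls_via_macro() reports two evaluations of the argument while calls_via_function() reports one, which is why an inline function is usually the safer choice when the arguments may have side effects.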

There is also the inline keyword, which was added to C in the C99 standard. Notably, at the time of writing, Microsoft's Visual C compiler doesn't support C99, and thus you can't use inline with that (miserable) compiler. The inline keyword only hints to the compiler that you want the function inlined; it does not guarantee it.

GCC has an extension which requires the compiler to inline the function.

inline __attribute__((always_inline)) int add(int x, int y) {
    return x + y;
}

To make this cleaner, you may want to use a macro:

#define ALWAYS_INLINE inline __attribute__((always_inline))
ALWAYS_INLINE int add(int x, int y) {
    return x + y;
}

I don't know of a direct way of having a function that can be force inlined on certain calls. But you can combine the techniques like this:

#define ALWAYS_INLINE inline __attribute__((always_inline))
#define ADD(x, y) ((x) + (y))
ALWAYS_INLINE int always_inline_add(int x, int y) {
    return ADD(x, y);
}

int normal_add(int x, int y) {
    return ADD(x, y);
}

Or, you could just have this:

#define ADD(x, y) ((x) + (y))
int add(int x, int y) {
    return ADD(x, y);
}

int main() {
    printf("%d\n", ADD(2,2));    // always inline
    printf("%d\n", add(2,2));    // normal function call
    return 0;
}

Also, note that forcing the inline of a function might not make your code faster. Inline functions cause larger code to be generated, which might cause more cache misses to occur. I hope that helps.

abhoriel
  • Like the other answers, this does not answer my question. It is useful to know how to force a function to be inline, but what I want to know is if I can force a _particular call_ to that function to be inline. – fouric Jan 28 '13 at 23:17
  • InkBlend, I didn't understand your question fully to begin with, I have edited my answer to answer your question better. – abhoriel Jan 28 '13 at 23:20
4

The answer is that it depends on what you request and on the nature of your function. Your best bet is to:

  • tell the compiler you want it inlined
  • make the function static (be careful with extern, as its semantics change a little in gcc in some modes)
  • set the compiler options to inform the optimizer you want inlining, and set inline limits appropriately
  • turn on any "could not inline" warnings in the compiler
  • verify the output (you could check the generated assembler) to confirm that the function is inlined.
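A minimal sketch of that checklist for GCC, assuming a file called hot.c and the made-up functions scale and sum_scaled. The -O2, -Winline and -S options are real GCC options; everything else is illustrative:

```c
/* Possible GCC invocation (hot.c is a hypothetical file name):
 *
 *     gcc -O2 -Winline -S hot.c
 *
 * Then look at hot.s: if there is no call to scale inside sum_scaled,
 * the call was inlined. -Winline warns when a function declared inline
 * could not be inlined. */

/* static + inline: the hint, with internal linkage. */
static inline int scale(int x)
{
    return 3 * x + 1;
}

/* The hot loop whose call we want inlined. */
long sum_scaled(int n)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += scale(i);
    return total;
}
```

Whether the call is really inlined still depends on the compiler version and flags, so checking the assembler output remains the only reliable confirmation.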

Compiler hints

The answers here cover just one side of inlining: the language hints to the compiler. The standard says:

Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined

The same can apply to other, stronger hints such as:

  • GNU's __attribute__((always_inline)): Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function even if no optimization level was specified.
  • Microsoft's __forceinline: The __forceinline keyword overrides the cost/benefit analysis and relies on the judgment of the programmer instead. Exercise caution when using __forceinline. Indiscriminate use of __forceinline can result in larger code with only marginal performance gains or, in some cases, even performance losses (due to increased paging of a larger executable, for example).

Even these rely on inlining being possible, and crucially on compiler flags. To work with inlined functions you also need to understand the optimisation settings of your compiler.

It may be worth saying that inlining can also be used to provide replacements for existing functions just for the compilation unit you are in. This can be useful when an approximate answer is good enough for your algorithm, or when a result can be achieved faster with local data structures.

An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.

Some functions cannot be inlined

For example, the GNU compiler documentation lists the constructs that prevent a function from being inlined:

Note that certain usages in a function definition can make it unsuitable for inline substitution. Among these usages are: variadic functions, use of alloca, use of variable-length data types (see Variable Length), use of computed goto (see Labels as Values), use of nonlocal goto, and nested functions (see Nested Functions). Using -Winline warns when a function marked inline could not be substituted, and gives the reason for the failure.

So even always_inline may not do what you expect.

Compiler Options

Using C99's inline hints relies on you telling the compiler what inlining behaviour you are looking for.

GCC for instance has:

-fno-inline, -finline-small-functions, -findirect-inlining, -finline-functions, -finline-functions-called-once, -fearly-inlining, -finline-limit=n

The Microsoft compiler also has options that affect the effectiveness of inline. Some compilers can also take a runtime profile into account when optimizing.

I do think it's worth seeing inlining in the broader context of program optimization.

Preventing Inlining

You mention that you don't want certain functions inlined. One way is to mark only the function you do want inlined with __attribute__((always_inline)) and leave the optimizer off, since GCC generally does not inline other functions without optimization. However, you would probably want the optimizer. Another option is to say explicitly that you don't want a function inlined: __attribute__ ((noinline)). But why would this be the case?
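As a rough sketch of how the two attributes can be combined (GCC/Clang attribute syntax; the names mix_inlined, mix, hot_sum and cold_value are made up for illustration): the body lives in an always_inline function used on the hot path, and a noinline wrapper provides the ordinary out-of-line version for everything else.

```c
/* Body of the function, forced inline wherever it is called directly. */
static inline __attribute__((always_inline)) int mix_inlined(int x)
{
    return x * 31 + 7;
}

/* Ordinary version: never inlined into its callers, reuses the same body. */
__attribute__((noinline)) int mix(int x)
{
    return mix_inlined(x);
}

/* Performance-sensitive caller: uses the always_inline name. */
long hot_sum(int n)
{
    long total = 0;
    int i;
    for (i = 0; i < n; i++)
        total += mix_inlined(i);
    return total;
}

/* Non-critical caller: goes through the real function call. */
int cold_value(int x)
{
    return mix(x);
}
```

Both names compute the same result, so the only difference between the two call sites is whether a call instruction is emitted.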

Other forms of optimization

You may also consider how you might restructure your loop and avoid branches. Branch prediction can have a dramatic effect. For an interesting discussion on this see: Why is it faster to process a sorted array than an unsorted array?

You might also want smaller inner loops to be unrolled, and look for loop invariants that can be moved out of the loop.
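To illustrate what moving an invariant and a branch out of a loop can look like, here is a small sketch. The functions sum_naive and sum_hoisted and their parameters are hypothetical, and an optimizing compiler will often perform this transformation by itself, so measure before restructuring by hand:

```c
#include <stddef.h>

/* Straightforward version: the branch on use_scale and the product
 * scale * scale are both written inside the loop body. */
long sum_naive(const int *v, size_t n, int scale, int use_scale)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (use_scale)
            total += (long)v[i] * (scale * scale);
        else
            total += v[i];
    }
    return total;
}

/* Restructured version: the loop-invariant factor is computed once and
 * the branch is decided before the loop starts. */
long sum_hoisted(const int *v, size_t n, int scale, int use_scale)
{
    long factor = use_scale ? (long)scale * scale : 1;
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += v[i] * factor;
    return total;
}
```

The two functions return the same value for small inputs; the second simply leaves less work and no branch in the loop body.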

andygavin
3

There's a kernel source file (I forgot which one it was...) that uses #defines in a very interesting way to define several differently named functions with the same body. This solves the problem of having two different functions to maintain. My idea is based on this same principle.

The way to use the defines is that you define the inline function in the compilation unit where you need it. To demonstrate the method I'll use a simple function:

int add(int a, int b);

It works like this: you make a function generator #define in a header file and declare the function prototype of the normal version of the function (the one not inlined).

Then you declare two separate function generators, one for the normal function and one for the inline function. The inline function you declare as static __inline__. When you need to call the inline function in one of your files, you use the generator define to get the source for it. In all other files where you need the normal function, you just include the header with the prototype.

The code was tested on:

Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz
Kernel Version: 3.16.0-49-generic
GCC 4.8.4

Code is worth more than a thousand words, so:

File Hierarchy

+
| Makefile
| add.h
| add.c
| loop.c
| loop2.c
| loop3.c
| loops.h
| main.c

add.h

#define GENERATE_ADD(type, prefix)  \
    type int prefix##add(int a, int b) { return a + b; }

#define DEFINE_ADD()            GENERATE_ADD(,)
#define DEFINE_INLINE_ADD()     GENERATE_ADD(static __inline__, inline_)

int add(int, int);

This doesn't look nice, but cuts the work of maintaining two different functions. The function is fully defined within the GENERATE_ADD(type,prefix) macro, so if you ever need to change the function, you change this macro and everything else changes.

Next, DEFINE_ADD() will be called from add.c to generate the normal version of add. DEFINE_INLINE_ADD() will give you access to a function called inline_add, which has the same signature as your normal add function, but it has a different name (the inline_ prefix).

Note: I didn't need __attribute__((always_inline)) when using the -O3 flag - the __inline__ did the job. However, if you don't want to use -O3, use:

#define DEFINE_INLINE_ADD()     GENERATE_ADD(static __inline__ __attribute__((always_inline)), inline_)

add.c

#include "add.h"

DEFINE_ADD()

A simple call to the DEFINE_ADD() macro generator. This defines the normal version of the function (the one that won't get inlined).

loop.c

#include <stdio.h>
#include "add.h"

DEFINE_INLINE_ADD()

int loop(void)
{

    register int i;

    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));

    return 0;
}

Here in loop.c you can see the call to DEFINE_INLINE_ADD(). This gives the file access to the inline_add function. When you compile, all calls to inline_add will be inlined.

loop2.c

#include <stdio.h>
#include "add.h"

int loop2(void)
{
    register int i;

    for (i = 0; i < 100000; i++)
        printf("%d\n", add(i + 1, i + 2));

    return 0;
}

This is to show you can use the normal version of add normally from other files.

loop3.c

#include <stdio.h>
#include "add.h"

DEFINE_INLINE_ADD()

int loop3(void)
{

    register int i;

    printf ("add: %d\n", add(2,3));
    printf ("add: %d\n", add(4,5));
    for (i = 0; i < 100000; i++)
        printf("%d\n", inline_add(i + 1, i + 2));

    return 0;
}

This is to show that you can use both functions in the same compilation unit, yet one of them will be inlined and the other won't (see the GDB disassembly below for details).

loops.h

/* prototypes for main */
int loop (void);
int loop2 (void);
int loop3 (void);

main.c

#include <stdio.h>
#include <stdlib.h>
#include "add.h"
#include "loops.h"

int main(void)
{
    printf("%d\n", add(1,2));
    printf("%d\n", add(2,3));

    loop();
    loop2();
    loop3();
    return 0;
}

Makefile

CC=gcc
CFLAGS=-Wall -pedantic --std=c11

main: add.o loop.o loop2.o loop3.o main.o
    ${CC} -o $@ $^ ${CFLAGS}

add.o: add.c 
    ${CC} -c $^ ${CFLAGS}

loop.o: loop.c
    ${CC} -c $^ -O3 ${CFLAGS}

loop2.o: loop2.c 
    ${CC} -c $^ ${CFLAGS}

loop3.o: loop3.c
    ${CC} -c $^ -O3 ${CFLAGS}

If you use the __attribute__((always_inline)) you can change the Makefile to:

CC=gcc
CFLAGS=-Wall -pedantic --std=c11

main: add.o loop.o loop2.o loop3.o main.o
    ${CC} -o $@ $^ ${CFLAGS}

%.o: %.c
    ${CC} -c $^ ${CFLAGS}

Compilation

$ make
gcc -c add.c -Wall -pedantic --std=c11
gcc -c loop.c -O3 -Wall -pedantic --std=c11
gcc -c loop2.c -Wall -pedantic --std=c11
gcc -c loop3.c -O3 -Wall -pedantic --std=c11
gcc -Wall -pedantic --std=c11   -c -o main.o main.c
gcc -o main add.o loop.o loop2.o loop3.o main.o -Wall -pedantic --std=c11

Disassembly

$ gdb main
(gdb) disass add

   0x000000000040059d <+0>: push   %rbp
   0x000000000040059e <+1>: mov    %rsp,%rbp
   0x00000000004005a1 <+4>: mov    %edi,-0x4(%rbp)
   0x00000000004005a4 <+7>: mov    %esi,-0x8(%rbp)
   0x00000000004005a7 <+10>:mov    -0x8(%rbp),%eax
   0x00000000004005aa <+13>:mov    -0x4(%rbp),%edx
   0x00000000004005ad <+16>:add    %edx,%eax
   0x00000000004005af <+18>:pop    %rbp
   0x00000000004005b0 <+19>:retq   

(gdb) disass loop

   0x00000000004005c0 <+0>: push   %rbx
   0x00000000004005c1 <+1>: mov    $0x3,%ebx
   0x00000000004005c6 <+6>: nopw   %cs:0x0(%rax,%rax,1)
   0x00000000004005d0 <+16>:mov    %ebx,%edx
   0x00000000004005d2 <+18>:xor    %eax,%eax
   0x00000000004005d4 <+20>:mov    $0x40079d,%esi
   0x00000000004005d9 <+25>:mov    $0x1,%edi
   0x00000000004005de <+30>:add    $0x2,%ebx
   0x00000000004005e1 <+33>:callq  0x4004a0 <__printf_chk@plt>
   0x00000000004005e6 <+38>:cmp    $0x30d43,%ebx
   0x00000000004005ec <+44>:jne    0x4005d0 <loop+16>
   0x00000000004005ee <+46>:xor    %eax,%eax
   0x00000000004005f0 <+48>:pop    %rbx
   0x00000000004005f1 <+49>:retq   

(gdb) disass loop2

   0x00000000004005f2 <+0>: push   %rbp
   0x00000000004005f3 <+1>: mov    %rsp,%rbp
   0x00000000004005f6 <+4>: push   %rbx
   0x00000000004005f7 <+5>: sub    $0x8,%rsp
   0x00000000004005fb <+9>: mov    $0x0,%ebx
   0x0000000000400600 <+14>:jmp    0x400625 <loop2+51>
   0x0000000000400602 <+16>:lea    0x2(%rbx),%edx
   0x0000000000400605 <+19>:lea    0x1(%rbx),%eax
   0x0000000000400608 <+22>:mov    %edx,%esi
   0x000000000040060a <+24>:mov    %eax,%edi
   0x000000000040060c <+26>:callq  0x40059d <add>
   0x0000000000400611 <+31>:mov    %eax,%esi
   0x0000000000400613 <+33>:mov    $0x400794,%edi
   0x0000000000400618 <+38>:mov    $0x0,%eax
   0x000000000040061d <+43>:callq  0x400470 <printf@plt>
   0x0000000000400622 <+48>:add    $0x1,%ebx
   0x0000000000400625 <+51>:cmp    $0x1869f,%ebx
   0x000000000040062b <+57>:jle    0x400602 <loop2+16>
   0x000000000040062d <+59>:mov    $0x0,%eax
   0x0000000000400632 <+64>:add    $0x8,%rsp
   0x0000000000400636 <+68>:pop    %rbx
   0x0000000000400637 <+69>:pop    %rbp
   0x0000000000400638 <+70>:retq   

(gdb) disass loop3

   0x0000000000400640 <+0>: push   %rbx
   0x0000000000400641 <+1>: mov    $0x3,%esi
   0x0000000000400646 <+6>: mov    $0x2,%edi
   0x000000000040064b <+11>:mov    $0x3,%ebx
   0x0000000000400650 <+16>:callq  0x40059d <add>
   0x0000000000400655 <+21>:mov    $0x400798,%esi
   0x000000000040065a <+26>:mov    %eax,%edx
   0x000000000040065c <+28>:mov    $0x1,%edi
   0x0000000000400661 <+33>:xor    %eax,%eax
   0x0000000000400663 <+35>:callq  0x4004a0 <__printf_chk@plt>
   0x0000000000400668 <+40>:mov    $0x5,%esi
   0x000000000040066d <+45>:mov    $0x4,%edi
   0x0000000000400672 <+50>:callq  0x40059d <add>
   0x0000000000400677 <+55>:mov    $0x400798,%esi
   0x000000000040067c <+60>:mov    %eax,%edx
   0x000000000040067e <+62>:mov    $0x1,%edi
   0x0000000000400683 <+67>:xor    %eax,%eax
   0x0000000000400685 <+69>:callq  0x4004a0 <__printf_chk@plt>
   0x000000000040068a <+74>:nopw   0x0(%rax,%rax,1)
   0x0000000000400690 <+80>:mov    %ebx,%edx
   0x0000000000400692 <+82>:xor    %eax,%eax
   0x0000000000400694 <+84>:mov    $0x40079d,%esi
   0x0000000000400699 <+89>:mov    $0x1,%edi
   0x000000000040069e <+94>:add    $0x2,%ebx
   0x00000000004006a1 <+97>:callq  0x4004a0 <__printf_chk@plt>
   0x00000000004006a6 <+102>:cmp    $0x30d43,%ebx
   0x00000000004006ac <+108>:jne    0x400690 <loop3+80>
   0x00000000004006ae <+110>:xor    %eax,%eax
   0x00000000004006b0 <+112>:pop    %rbx
   0x00000000004006b1 <+113>:retq   

Symbol table

$ objdump -t main | grep add
0000000000000000 l    df *ABS*  0000000000000000              add.c
000000000040059d g     F .text  0000000000000014              add

$ objdump -t main | grep loop
0000000000000000 l    df *ABS*  0000000000000000              loop.c
0000000000000000 l    df *ABS*  0000000000000000              loop2.c
0000000000000000 l    df *ABS*  0000000000000000              loop3.c
00000000004005c0 g     F .text  0000000000000032              loop
00000000004005f2 g     F .text  0000000000000047              loop2
0000000000400640 g     F .text  0000000000000072              loop3

$ objdump -t main | grep main
main:     file format elf64-x86-64
0000000000000000 l    df *ABS*  0000000000000000              main.c
0000000000000000       F *UND*  0000000000000000              __libc_start_main@@GLIBC_2.2.5
00000000004006b2 g     F .text  000000000000005a              main

$ objdump -t main | grep inline
$

Well, that's it. After 3 hours of banging my head on the keyboard trying to figure it out, this was the best I could come up with. Feel free to point out any errors, I'll really appreciate it. I got really interested in this problem of inlining one particular function call.

Enzo Ferber
  • 1
    This is...fascinating. I asked this question a while ago, before I really understood much about C programming (and now it's fairly difficult for me to find cases in which I really need to inline only certain function calls). Even so, this technique is very cool, and now I actually want to find a use case just so that I can utilize this. Thank you for your effort! – fouric Sep 18 '15 at 18:08
  • @InkBlend Glad I could help! – Enzo Ferber Sep 18 '15 at 18:10
  • Why `gcc -c loop2.c -Wall -pedantic --std=c11`? I suspect if you turn on -O3 then it will inline your add anyway. I don't understand what question this is answering. What is the problem of having two functions to maintain? In this case if you turn the optimiser up it'll inline both, won't it, so why bother with the macro? – andygavin Sep 18 '15 at 23:40
  • 1
    @andygavin Why the -1? The question was: "how to inline **one** function call". That was answered. If you turn on `O3` you'll pass most of your control to the compiler, and this was not what he wanted. Of course, O3 will inline most functions.... This is a hack. **Understanding before standing** is a good approach... – Enzo Ferber Sep 19 '15 at 02:59
  • 1
    @andygavin If you turn `O3` with a simple function like this, it'll always get inlined. The purpose was showing how it can be done. If you never want to use `O3`, just use `__attribute__((always_inline))` as I said in the answer. – Enzo Ferber Sep 19 '15 at 03:02
  • Yes, I realise that's the case, but it's not what you've said in your answer. I think your answer is misleading and confused. In fact the comment above is a clearer answer: do not turn on the optimiser, and mark the inline function with always_inline. Why all the noise about an inline macro that does nothing? Why not just inline the procedure with the macro in the first place, if you are going to use a macro in this way? – andygavin Sep 19 '15 at 13:52
  • @andygavin I thought of making a macro with the function body, but I dropped the idea for two reasons; 1) that's macro substitution, a **preprocessor** thing, and **inlining** is a compiler thing - I wanted to be congruent with the question and 2) that approach also has the downside of variable names - with larger functions you need more variables than only function parameters. – Enzo Ferber Sep 19 '15 at 21:04
  • @InkBlend Did you find any application for this yet? – Enzo Ferber Sep 22 '15 at 19:00
  • 1
    @EnzoFerber, (un)fortunately, I recently discovered the joy of Lisp, and have had little time to code in C ever since. If I ever return to C and find a use for this, I'll let you know, but for now all I can think of are `let`s and `lambda`s... – fouric Sep 24 '15 at 16:01
3

If you do not mind having two names for the same function, you could create a small wrapper around your function to "block" the always_inline attribute from affecting every call. In my example, loop_inlined would be the name you would use in performance-critical sections, while the plain loop would be used everywhere else.

inline.h

#include <stdlib.h>

static inline int loop_inlined() __attribute__((always_inline));
int loop();

static inline int loop_inlined() {
    int n = 0, i;
    for(i = 0; i < 10000; i++) 
        n += rand();
    return n;
}

inline.c

#include "inline.h"

int loop() {
    return loop_inlined();
}

main.c

#include "inline.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("%d\n", loop_inlined());
    printf("%d\n", loop());
    return 0;
}

This works regardless of the optimization level. Compiling with gcc inline.c main.c on Intel gives:

4011e6:       c7 44 24 18 00 00 00    movl   $0x0,0x18(%esp)
4011ed:       00
4011ee:       eb 0e                   jmp    4011fe <_main+0x2e>
4011f0:       e8 5b 00 00 00          call   401250 <_rand>
4011f5:       01 44 24 1c             add    %eax,0x1c(%esp)
4011f9:       83 44 24 18 01          addl   $0x1,0x18(%esp)
4011fe:       81 7c 24 18 0f 27 00    cmpl   $0x270f,0x18(%esp)
401205:       00
401206:       7e e8                   jle    4011f0 <_main+0x20>
401208:       8b 44 24 1c             mov    0x1c(%esp),%eax
40120c:       89 44 24 04             mov    %eax,0x4(%esp)
401210:       c7 04 24 60 30 40 00    movl   $0x403060,(%esp)
401217:       e8 2c 00 00 00          call   401248 <_printf>
40121c:       e8 7f ff ff ff          call   4011a0 <_loop>
401221:       89 44 24 04             mov    %eax,0x4(%esp)
401225:       c7 04 24 60 30 40 00    movl   $0x403060,(%esp)
40122c:       e8 17 00 00 00          call   401248 <_printf>

The first 7 instructions are the inlined call, and the regular call happens 5 instructions later.

blakeh
1

Here's a suggestion: write the body of the code in a separate header file. Include the header file at the place where the code has to be inline, and also inside the body of a function in a C file for the other calls.

void demo(void)
{
#include "myBody.h"
}

void importantloop(void)
{
    // code
#include "myBody.h"
    // code
}
Ant
-2

I assume that your function is a small one, since you want to inline it. If so, why don't you write it in asm?

As for inlining only a specific call to a function, I don't think anything exists that does this for you. Once a function is declared inline, and if the compiler decides to inline it, it will do so everywhere it sees a call to that function.