Why can forced inline functions lead to bad performance?

Question

If I inline a function, the function body is copied into each call site instead of a call being issued to it. Why can that lead to bad performance?

Edit: And what about cache misses caused by functions that are too big? Why does the rule of thumb "only inline functions of at most 3 lines" exist?

If you inline lots of functions, it may lead to excessively large object code. But not sure when will this affect the performance. — vsoftco, Jan 14 '15 at 17:31
http://stackoverflow.com/questions/145838/benefits-of-inline-functions-in-c — Cory Kramer, Jan 14 '15 at 17:31
boo, hiss re: adding more subquestions to a question after it already has answers. — Charles Duffy, Jan 14 '15 at 17:50

score 7 · Accepted Answer · answered Jan 14 '15 at 17:39

There may be an edge case where inlining a function can increase the program size or move bits of the program around so that cache misses occur where they didn't before. It wouldn't be common, since caches are designed to handle most common situations and are quite large compared to most hotspots.

Mark Ransom

Is that such an edge case for game consoles too? – simonides Jan 14 '15 at 17:41
@fridolin69 it all depends on the processor used in the console. All processors have had quite similar cache systems for many years now, so I can't say it would make a big difference. Do you have a specific situation that you're wondering about? – Mark Ransom Jan 14 '15 at 17:44
No specific console; I just stumbled upon it while preparing for an exam. – simonides Jan 14 '15 at 17:45

mbgda · Answer 2 · 2015-01-14T17:48:23.360

4

There's no standard way to force a function to be inlined in modern C++ compilers, so this is kind of a moot point. However, assuming you are using compiler-specific functionality to force inlining (and the compiler doesn't ignore it), it wouldn't lead to bad performance, but it would lead to increased executable size because there are more copies of the same code.

Edit: Per the comment below it should be mentioned that a very unlikely edge case does exist where your code could be executing different copies of the same inlined function in close proximity, reducing the efficiency of the instruction cache. The likelihood that this will measurably affect performance is extremely small, but in certain edge cases it could.
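To make the "compiler-specific functionality" concrete, here is a minimal sketch. The macro name `FORCE_INLINE` and the functions `square` and `sum_of_squares` are hypothetical, chosen only for illustration. `__forceinline` is the MSVC keyword mentioned in the comments, and `__attribute__((always_inline))` is the GCC/Clang counterpart. As the comments note, a compiler may still decline to inline in some situations.

```cpp
// Hypothetical portability macro; the name FORCE_INLINE is an assumption.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to the ordinary inline hint
#endif

// Small leaf function: the typical candidate for inlining.
FORCE_INLINE int square(int x) {
    return x * x;
}

// If inlining happens, each of the two call sites below gets its own
// copy of the body of square(), which is where the code growth comes from.
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

With a body this small the copies cost next to nothing. The size concern only starts to matter when a large body is forced into many call sites.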

edited Jan 14 '15 at 17:48

answered Jan 14 '15 at 17:31

mbgda

You can't with standard cpp but you can with compiler specific keywords like `__forceinline` in VS – simonides Jan 14 '15 at 17:33
@fridolin69 Good point on the compiler specific inlines. I'll edit my answer to explicitly state there's no *standard* way, which is what I intended. – mbgda Jan 14 '15 at 17:34
@fridolin69 The compiler may still ignore the `__forceinline` keyword if it needs to (["You cannot force the compiler to inline a particular function, even with the __forceinline keyword"](http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx)). – ssube Jan 14 '15 at 17:35
not sure why so many downvotes, the answer seems reasonable – vsoftco Jan 14 '15 at 17:35
@vsoftco I'm not totally sure either. If I said something that's completely wrong I wish someone would comment so that I (and other readers) can learn something. – mbgda Jan 14 '15 at 17:40
Your answer is almost there but you didn't answer how it can lead to bad performance, which it can. Because the function is copied it makes code bigger, which hurts the CPU's instruction cache. The i-cache is not all that big and the more code it has to fetch, the slower it goes. – Zan Lynx Jan 14 '15 at 17:41
@ZanLynx I suppose it *can* lead to bad performance, but it would be a very specific edge case of calling different versions of the same inlined function in extremely close proximity. And I would think in that case it's likely the compiler would just ignore the inline suggestion. – mbgda Jan 14 '15 at 17:42

score 3 · Answer 3 · answered Jan 14 '15 at 18:04

We should take a step back and look at how CPUs work. They usually have separate caches: one for code, which holds the instructions the CPU is about to execute, and one for data, which holds the values those instructions operate on.

Data cache misses are "easy" to solve: use the smallest data structures you can, keep the members you access most frequently close together, and so on.
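A minimal sketch of the "smallest data structures" advice, assuming a made-up particle record (the type and field names are hypothetical). Ordering members from largest to smallest alignment removes interior padding, so more objects fit in each cache line. On a typical 64-bit ABI the two layouts below would come out at 32 and 24 bytes, but the exact sizes are platform-dependent.

```cpp
#include <cstddef>

// Hypothetical record with members in a padding-heavy order.
struct ParticleLoose {
    bool   alive;    // followed by padding so that x is aligned
    double x;
    bool   visible;  // followed by padding so that y is aligned
    double y;
};

// Same members, ordered from largest to smallest alignment.
struct ParticlePacked {
    double x;
    double y;
    bool   alive;
    bool   visible;  // only tail padding remains
};

// Reordering should never make the struct larger.
static_assert(sizeof(ParticlePacked) <= sizeof(ParticleLoose),
              "reordered layout should not grow");
```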

Instruction cache misses are harder to understand and to fix, and they are also one reason why polymorphic behavior in C++ is commonly regarded as slower than normal function calls. Basically, the CPU prefetches into its cache the instructions stored close to the current execution point. If everything is inlined, there is simply more code, and the CPU can't prefetch all of it, which leads to cache misses. Please note that this is a simplified picture. In my experience, I have had problems with template instantiations that generated a lot of code and ended up slower than simple virtual calls with a not-too-deep object hierarchy.

As Alexandrescu always points out, you should always time your code.
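A minimal sketch of what "time your code" could look like with `std::chrono`. The helper `time_ns` and the workload `sum_to` are hypothetical names used only for illustration. A real comparison of inlined versus non-inlined builds would need optimizations enabled, repeated runs, and care that the compiler doesn't optimize the workload away.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` `iterations` times and returns the
// total elapsed time in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t time_ns(F&& work, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Hypothetical workload: sums the integers 1..n.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Usage would be something like `time_ns([&] { sink = sum_to(1000); }, 100000)`, where `sink` is a `volatile` variable so the result counts as used. Build once with and once without the forced inline, then compare the numbers.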

Source: What Every Programmer Should Know About Memory
