3

We have a very large program here which mixes C++ and FORTRAN (sorry). One of my check-ins has resulted in a dramatic slowdown of the complete application (i.e. by a factor of two or more) - even in areas of the code which are not affected by my changes.

The facts:

  1. Almost every module has slowed by a similar amount - even ones which use none of my code.

  2. The executable is about 6% bigger.

  3. The metadata has not been changed between check-ins.

  4. The IDE/compiler is VS2010 in Release mode.

  5. Some .lib files have doubled or tripled in size.

I looked at one of the .lib files which has tripled in size, and there are only two changes:

a) I have included a large-ish header file which in turn includes many others - some of which contain moderately complicated inline code. The 'Additional Include Directories' has gone from none-or-one to about 7, as each header file #includes one or more others.

b) I have called 4 functions from out of this header file, but these are not called during the run that has slowed down (i.e. their execution cannot be slowing the code down, but their inclusion might conceivably be).

In spite of searching the forums as to whether including header files slows down execution (as opposed to compilation), I can't find a single relevant article. My questions are:

  • Does the #inclusion of any form of header (declaration or inline) slow down the code execution?

  • Is there a qualitative or quantitative difference in the inclusion of inline code w.r.t. execution speed (I know that 'inline' is only advice to the compiler)?

  • What are the correlations between .lib size, .exe size and execution speed (I'm expecting lots of different and contradictory correlations here)?

  • Will refactoring some of the header files so that they don't need to include others (by putting those includes into a .cpp file, and thus reducing my 'Additional Include Directories') improve my situation, do you think?

I guess the last question is the meat of the issue, as it will take a lot of effort...

Mike Sadler
  • [Profiling...](http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29) – Some programmer dude Oct 03 '12 at 13:28
  • find a good profiler, execute the code, collect the information on the timing of each call, analyse and then decide what to do. it sounds like you are shooting in the dark. – Les Oct 03 '12 at 13:32
  • Profiling is not the most efficient path if this is a build-time issue. I'd suggest focusing on the content of the checkin itself if it's really the case that it and nothing else caused this change. – Steve Townsend Oct 03 '12 at 13:41
  • Sorry: should have said we've profiled the before and after cases. This is how we know it is not one of my new functions, as these are barely called. All of the existing units of code just run slower... – Mike Sadler Oct 03 '12 at 14:02

6 Answers

5

Does the #inclusion of any form of header (declaration or inline) slow down the code execution?

Neither adding unused declarations nor adding unused inline definitions slows down execution. I can, however, imagine several things that can slow down execution:

  • Some #define that prevents optimized inline or macro variants of commonly used functions from being provided by another header further down the line.
  • An overload of some commonly used operation, possibly one from the standard library, that is less efficient than the default (sketched just below).
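
Here is a minimal, self-contained sketch of the second case (all names are hypothetical, nothing here is from your project): a newly visible header supplies a `swap` overload that argument-dependent lookup then prefers over the cheap default (with a modern standard library, `std::swap` would just move pointers), so every file that merely includes the header picks up the slower version, whether or not it calls any new function.

```cpp
#include <utility>
#include <vector>

namespace legacy {
    struct Blob { std::vector<double> data; };

    // Deep-copying swap: correct, but O(n) copies instead of an O(1)
    // pointer exchange. Simply being visible changes which swap callers
    // pick up - nobody has to call a "new" function explicitly.
    inline void swap(Blob& a, Blob& b) {
        Blob tmp = a;   // copies the whole vector
        a = b;          // copies again
        b = tmp;        // and again
    }
}

int main() {
    legacy::Blob x, y;
    x.data.assign(1000000, 1.0);
    y.data.assign(1000000, 2.0);

    using std::swap;
    swap(x, y);   // ADL finds legacy::swap, the expensive overload
    return 0;
}
```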

Is there a qualitative or quantitative difference in the inclusion of inline code w.r.t. execution speed (I know that 'inline' is only advice to the compiler)?

Well, if the code is not available, it can't be inlined; if it is, it can be. Usually the compiler can estimate how much inlining will save and will not inline if it won't help, but occasionally it guesses wrong. In such a case the result can be wildly different from the usual case, where inlining helps slightly.
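
A tiny sketch of that "available vs. not available" distinction, squeezed into one file for brevity (the comments mark where each piece would normally live; the names are made up):

```cpp
#include <cstdio>

// --- widget.h: declaration only. The body lives in another .cpp, so
// without link-time code generation the compiler cannot inline calls
// made from other translation units - it must emit a real call.
int scale(int x);

// --- widget_inline.h: the definition is visible to every file that
// includes it, so the compiler *may* inline the call - or may not,
// if it judges inlining unprofitable.
inline int scale_inline(int x) { return x * 3; }

// --- caller.cpp
int use(int v) {
    return scale(v)          // out-of-line call
         + scale_inline(v);  // candidate for inlining
}

// --- widget.cpp
int scale(int x) { return x * 3; }

int main() { std::printf("%d\n", use(7)); return 0; }
```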

What are the correlations between .lib size, .exe size and execution speed (I'm expecting lots of different and contradictory correlations here)?

Completely different case by case. Inlining inflates the code size, but can save a lot of work on each call site. But larger code takes up more cache, which slows it down.

Will refactoring some of the header files such that they don't need to include others (by putting these includes into a .cpp file, and thus reducing my 'Additional Include Directories') improve my situation, do you think?

It may or may not. Depends on the actual cause.

I propose you really try to find the cause. It is almost certainly caused by some particular bit of code, not by the amount of code included. So go back to the revision before the change and add the new includes bit by bit. First include the innermost headers alone, then add the headers that use them, and so on, one by one. When you get to the particular header that makes things worse, try commenting out bits of it until you narrow it down to a particular declaration or a few of them.

Also extract just some function where you observe the performance degradation. Then, if you narrow it down and still don't see what could be wrong, you'll have something small with which others can reproduce the issue, and you can post it as a new question.

Jan Hudec
  • Thanks to all who are helping - I've flagged this as the answer, not necessarily because it has solved my particular problem, but because I think it best addresses the various points for anyone who reads this thread in the future. – Mike Sadler Oct 03 '12 at 14:11
  • One clarification on overloading: we have a large solution with many projects. If one project has included a function "void foobar()" from 'foo.lib' for many years, and then another project starts including "void foobar()" from 'bar.lib', will both projects end up using the same foobar() in the final .exe? – Mike Sadler Oct 03 '12 at 14:14
  • the link order will determine that – Les Oct 03 '12 at 14:16
  • So if the first project to be linked by the .exe project includes foo.lib, and a later one includes bar.lib, foo.lib->foobar() will be used. This seems like quite a risk - how come the linker doesn't complain about this? It complains about everything else... – Mike Sadler Oct 03 '12 at 14:28
  • @MikeSadler: With most unix linkers, the link order will determine which version will get picked. It will be the _first_ one the linker comes across. However MSVC++ should error out and if you tell it to pick the first with `/FORCE` option, it should still give a warning. But if the other project starts defining `void foobar()` inline in a header, then you'll end up with a deadly mixture. – Jan Hudec Oct 03 '12 at 17:41
  • Thanks, @JanHudec - it's good to get these things straight in my head. – Mike Sadler Oct 04 '12 at 07:39
1

Changing the header files cannot change execution time unless by accident you include something that builds DEBUG or other diagnostic code into the resulting binaries.

That would be my guess esp. given the change in output file size.
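
A made-up illustration of how that can happen: if anything in the newly included header chain defines a diagnostics flag (ENABLE_TRACE here is purely hypothetical), a macro that used to compile away becomes real work in every hot loop that sees it.

```cpp
#include <cstdio>

// Imagine this lives in a shared "diagnostics.h". Whether TRACE costs
// anything depends entirely on what was #defined earlier in the
// include chain - something a new header can change by accident.
//#define ENABLE_TRACE   // uncomment to simulate the accidental define

#ifdef ENABLE_TRACE
  #define TRACE(msg) std::fprintf(stderr, "trace: %s\n", msg)
#else
  #define TRACE(msg) ((void)0)   // compiles away to nothing
#endif

int main() {
    long long sum = 0;
    for (int i = 0; i < 10000000; ++i) {
        TRACE("inner loop");   // free when disabled, a real call per iteration when enabled
        sum += i;
    }
    std::printf("%lld\n", sum);
    return 0;
}
```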

Steve Townsend
  • OK, so if I went crazy and linked in (via chained #includes) 1000 header files which contained vast amounts of complicated inline code which I didn't actually call, would this a) increase the .exe size but have no impact on the speed, or b) not increase the .exe size *or* increase the speed (because the compiler realises it can ignore it all)? – Mike Sadler Oct 03 '12 at 13:46
  • #pragma comment(lib...) could pull in lots of code; with 1000 header files, that could be going on. Size would increase, and even if you don't call these functions, other parts of the program may now call them (instead of calling them from a different, faster lib - just a guess) – Les Oct 03 '12 at 13:52
  • you could compile to a preprocessed file (-P, I think) and then wade through the huge resulting file(s) to see exactly what the preprocessor is doing – Les Oct 03 '12 at 13:54
  • Now that just doesn't sound like fun! – Mike Sadler Oct 03 '12 at 13:59
  • We're not using #pragma comment(lib...) statements in this code - but how would this lead to more code being pulled in? Doesn't this effectively just add to the normal linking? – Mike Sadler Oct 03 '12 at 14:00
  • The linker will only pull in code that is used somewhere in your binary. You do note that you make calls to code but not in the test you are running now. However I'd be very surprised if the inclusion of uncalled code is resulting in a factor of two slowdown. Are you sure there's not just some bug in your checkin that is slowing things down? I still think some inadvertent change in the build settings is the most likely cause. – Steve Townsend Oct 03 '12 at 14:01
  • This is another possibility we're working on. One of my first checks was on the VS metadata, to see if any of the library include paths had changed or anything (which they haven't). My colleague has just found a new lead to suggest that the wrong pre-built library is being used - but none of the project data which refers to these has changed... – Mike Sadler Oct 03 '12 at 14:05
1

Are you using COM? Did your include file change STA to MTA or vice-versa? Do your include files now pull in a library where before you had dynamic linking (lib pragma)? Does the include not pull in a lib anymore, so your code is no longer dynamically linking? I repeat Steve's point: is a debug lib being included?
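
For reference, this is what the lib-pragma mechanism looks like in practice; the example below is self-contained and uses a standard Windows library rather than anything from your project. Any file that includes a header carrying such a pragma links the named .lib without any visible change to the project settings, which is easy to miss when diffing check-ins.

```cpp
// Any header in the include chain can carry a linker directive like
// this; user32.lib is used here only because it ships with Windows.
#include <windows.h>
#pragma comment(lib, "user32.lib")

int main() {
    // Resolves against user32.lib purely because of the pragma above -
    // the project's "Additional Dependencies" never mentions it.
    MessageBoxA(NULL, "Linked via #pragma comment(lib, ...)", "Demo", MB_OK);
    return 0;
}
```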

DUMPBIN might provide you with additional insight about what has actually gotten built. Compare the results to the old version to see if anything major stands out.

ADDITIONAL EDIT: Check the memory usage on the test machine, check for paging activity, on the off chance that your larger exe has crossed a threshold.

Les
  • We had already checked the Debug build, and we're fairly sure none of them are building in Debug. We are not using COM or anything exotic ;-) - just VC++ and Intel FORTRAN. I *am* delay-loading a dll, though - although not using it in this case, and it is not associated with most of the code that is running slow. I will investigate DUMPBIN... – Mike Sadler Oct 03 '12 at 13:54
1

One blind shot :

It could be a cache issue. Inlining functions and adding "dead" code to a library will result in bigger code, and can increase the number of cache misses during the execution of your program.

You can see if this is the right path by simply monitoring the number of cache misses during the execution of your process.


About your comment:

How much is 6%?

If you overflow your L1 cache (as far as I know, its size is around 32K even on modern processors), you trade L1 accesses for L2 accesses, which are ~2x slower. If you overflow your L2 cache (which can range from 256K to 2M) and start accessing L3, you have another 5x slowdown in fetching the data (you can check this question, which gives figures for a Core i7).

Here are general explanations about cache misses on Wikipedia.

Once again, to see if this is really the issue, you should monitor the number of cache misses your process incurs during its execution (I think Process Explorer shows you this if you are using Windows, or perf if you are using Linux).
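
If you want to get a rough feel for those thresholds on your own machine, a crude sketch like this - just a sequential sum over buffers of increasing size - usually shows the cost per element stepping up as the working set falls out of L1 and then L2 (the sizes and pass counts are arbitrary):

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const unsigned sizes_kb[] = {16, 64, 256, 1024, 8192, 32768};
    for (unsigned kb : sizes_kb) {
        std::vector<int> buf(kb * 1024 / sizeof(int), 1);
        long long sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (int pass = 0; pass < 50; ++pass)
            for (std::size_t i = 0; i < buf.size(); ++i)
                sum += buf[i];                      // simple sequential walk
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        std::printf("%6u KiB: %.3f ns/element (checksum %lld)\n",
                    kb, ns / (50.0 * buf.size()), sum);
    }
    return 0;
}
```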

LeGEC
  • I guess two questions: 1) Could a 6% increase in .exe size lead to a 50% decrease in speed? (I might guess that a larger .exe might *slightly* reduce speed...). 2) Are there defined .exe size thresholds which might lead to a step-change in performance? (e.g. 32MB, 64MB, etc.) – Mike Sadler Oct 03 '12 at 14:26
0
  • Does the #inclusion of any form of header (declaration or inline) slow down the code execution?

    That only changes the place where the code appears, so I believe it doesn't change anything. (Your code will not become faster or slower if you just move a function three lines up, will it?)

  • Is there a qualitative or quantitative difference in the inclusion of inline code w.r.t. execution speed (I know that 'inline' is only advice to the compiler)?

    Maybe. If you compare an inline and a not-inline function, the inline one may be faster, because its code will just be copy-pasted into the appropriate place, while the normal one will waste some time on the function call.

  • What are the correlations between .lib size, .exe size and execution speed (I'm expecting lots of different and contradictory correlations here)?

    While I can imagine a hypothetical situation where a larger file will slow things down, I'll risk saying that most of the time there's no correlation.

Your executable is probably larger because you may have overridden some macros that affect execution (like undefining a define that was meant to exclude some code). That may also lead to a performance decrease (i.e. you didn't want some code to execute, but due to the accidental macro redefinition it did).

SingerOfTheFall
  • I think that this is one of the reasons I don't like having all of the code in header files - it means you can inadvertently include unpleasant #defines, like the dreaded <windows.h>, which redefines "min" and "max". However, is this the only risk, do you think? – Mike Sadler Oct 03 '12 at 13:49
0

Guesswork is unlikely to find the problem. Diagnosis will find it. If the slowdown is a factor of two, that means the slower code is spending 50% of its time doing something unnecessary that it was not doing before. This will be very easy to find.

This is the method I use, because it does find the problem(s). Here's an example, and here's a bunch of reasons why.

I suggest you first diagnose the problem in un-optimized code. Then when you clean out the problem(s), turn on the optimizer and let it do its thing. Chances are very good that the code contains significant problems that you can fix and the optimizer can't. No matter what the optimizer does, it does not make it any easier to find the problems only you can fix.

Mike Dunlavey