
I have some code that I'm running for a project. It is O(N^2), where N is 200 for my case. There is an algorithm that turns this O(N^2) into O(N log N). This means that, with the new algorithm, it should be ~100 times faster. However, I'm only getting about a 2x speedup.

I'm trying to narrow things down to see whether I messed something up or whether it's inherent to the way I coded this program. For starters, I have a lot of function-call overhead within nested classes. For example, I have a lot of this (within many loops):

energy = globals->pair_style->LJ->energy();

Since I'm getting the right results when it comes to the actual data, just not the expected speed increase, I'm wondering whether function overhead can really cause that much of a slowdown, by as much as 50-fold.

Thanks!

Amit
  • Try running both under a profiler, and see what is taking the time in each. – Mankarse Oct 12 '11 at 13:00
  • Big-O notation shows how something scales, not necessarily how it performs. For small values of N you might not see any difference at all. – Joe Oct 12 '11 at 13:00
  • "It is O(N^2), where N is 200 for my case". There you have it. You're not using big-Oh notation correctly. If "N is 200 for [your] case", then it is O(1). – R. Martinho Fernandes Oct 12 '11 at 13:00
  • Oh man. Firstly, I apologize for confusing performance with the Big-O notation. Secondly, @Mankarse: Would you mind suggesting a profiler? Thirdly, I really hope it's not function overhead, else I would have to jam everything into one function, making it very ugly and very work-intensive. – Amit Oct 12 '11 at 13:05
  • @Amit: On OSX I like to use the profiler that is built into `Activity Monitor` (this is the only profiler that I have used extensively). On Windows I have had good results with [Very Sleepy](http://www.codersnotes.com/sleepy). I have not done much profiling on other systems, but have a look [here](http://stackoverflow.com/q/375913/485561) for some more suggestions. – Mankarse Oct 12 '11 at 13:11
  • Thanks everyone. I'll probably resort to `valgrind` for now, and most likely mess around with the free-trial of vTune a bit later. Learned a lot this morning :) – Amit Oct 12 '11 at 13:27

5 Answers


Firstly, your interpretation that O(N log N) should be ~100 times faster than O(N^2) for N=200 is incorrect. Big-O notation deals with upper bounds and behaviour in the limit, and doesn't account for any multiplicative constants in the complexity.

Secondly, yes, on modern hardware function calls tend to be relatively expensive due to pipeline disruption. To find out how big a factor this is in your case, you'd have to come up with some microbenchmarks.
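
For instance, a minimal microbenchmark sketch along those lines might look like the following. The `Globals`/`PairStyle`/`LJ` structs and the pair-energy formula are hypothetical stand-ins for the question's nested classes, and `__attribute__((noinline))` is a GCC/Clang/ICC extension used here only to keep the compiler from optimising the call away:

```cpp
// Microbenchmark sketch (not the project's real code): the structs, names and
// the pair-energy formula are hypothetical stand-ins for the question's classes.
#include <chrono>
#include <cstdio>

struct LJ {
    double sigma, epsilon;
    // noinline keeps the compiler from inlining this, so the cost of the
    // call itself is part of what gets measured.
    __attribute__((noinline)) double energy(double r2) const {
        double sr2 = sigma * sigma / r2;
        double sr6 = sr2 * sr2 * sr2;
        return 4.0 * epsilon * (sr6 * sr6 - sr6);
    }
};

struct PairStyle { LJ* lj; };
struct Globals   { PairStyle* pair_style; };

int main() {
    LJ lj = {1.0, 1.0};
    PairStyle ps = {&lj};
    Globals g = {&ps};
    Globals* globals = &g;

    const int N = 1024;
    double r2s[N];
    for (int i = 0; i < N; ++i) r2s[i] = 1.0 + 0.001 * i;  // varying inputs so nothing gets hoisted

    const long iters = 100000000L;  // enough repetitions for a stable timing

    // Version 1: out-of-line call reached through the pointer chain each time.
    auto t0 = std::chrono::steady_clock::now();
    double sum1 = 0.0;
    for (long i = 0; i < iters; ++i)
        sum1 += globals->pair_style->lj->energy(r2s[i & (N - 1)]);
    auto t1 = std::chrono::steady_clock::now();

    // Version 2: pointer chase hoisted out of the loop, maths inlined by hand.
    double sum2 = 0.0;
    const LJ* local = globals->pair_style->lj;
    for (long i = 0; i < iters; ++i) {
        double sr2 = local->sigma * local->sigma / r2s[i & (N - 1)];
        double sr6 = sr2 * sr2 * sr2;
        sum2 += 4.0 * local->epsilon * (sr6 * sr6 - sr6);
    }
    auto t2 = std::chrono::steady_clock::now();

    std::printf("chained call: %.3fs  hand-inlined: %.3fs  (checksums: %g %g)\n",
                std::chrono::duration<double>(t1 - t0).count(),
                std::chrono::duration<double>(t2 - t1).count(), sum1, sum2);
    return 0;
}
```

The printed checksums are only there to stop the compiler from discarding the loops; the interesting output is the ratio of the two timings on your machine.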

NPE

The absolute biggest hit is cache misses. An L1 cache miss is relatively cheap, but when you miss in L2 (or L3, if you have it) you may be losing hundreds or even thousands of cycles to the resulting stall.

The thing is, though, this may only be part of the problem. Do not optimise your code until you have profiled it. Identify the slow areas and then figure out WHY they are slow. Once you understand why it's running slowly, you have a good chance of optimising it.

As an aside, O notation is very handy, but it is not the be-all and end-all. I've seen O(n^2) algorithms work significantly faster than O(n log n) ones on small amounts of data (and small may mean less than several thousand) because they cache far more effectively.
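
To see this effect for yourself, a small benchmark sketch along these lines can help. The sizes, iteration counts and the choice of heapsort as the O(n log n) contender are arbitrary assumptions, and which side wins depends on your machine and compiler; the point is to measure rather than assume:

```cpp
// Benchmark sketch (hypothetical sizes and inputs): insertion sort, O(n^2),
// against heapsort, O(n log n), on small arrays.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

static void insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}

static void heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());   // strictly O(n log n), no small-n fallback
    std::sort_heap(v.begin(), v.end());
}

// Time many repetitions of sorting a fresh copy of the same small array.
static double time_sort(void (*sorter)(std::vector<int>&),
                        const std::vector<int>& base, int rounds) {
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        std::vector<int> v = base;        // fresh unsorted copy each round
        sorter(v);
    }
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main() {
    const std::size_t n = 64;             // "small" input size
    const int rounds = 200000;            // repeat to get measurable times
    std::vector<int> base(n);
    for (std::size_t i = 0; i < n; ++i) base[i] = std::rand();

    std::printf("insertion sort (O(n^2)): %.3fs   heapsort (O(n log n)): %.3fs\n",
                time_sort(insertion_sort, base, rounds),
                time_sort(heap_sort, base, rounds));
    return 0;
}
```

(This is also why many `std::sort` implementations switch to insertion sort below a small cutoff.)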

Goz
  • @Amit: vTune is probably the best there is for Intel x86 chips, but it's not cheap. The 30-day free trial, however, is pretty cheap ;) – Goz Oct 12 '11 at 13:19

The important thing about Big-O notation is that it only describes how the execution time scales as the data set size increases; any constants are thrown away. While O(N^2) is indeed asymptotically slower than O(N log N), the actual run times might be N^2 vs. 1000·N log N, which means an O(N^2) algorithm can be faster than an O(N log N) one on some data sets.
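
For a rough sense of scale with the question's numbers, ignoring constants entirely: 200^2 = 40,000, while 200 × log2(200) ≈ 200 × 7.6 ≈ 1,500. So even a perfect switch would buy a factor of roughly 26, not 100, before any difference in constants is taken into account.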

Without more details, it's hard to say more. Yes, function calls do have a fair amount of overhead, and that might be why you're not seeing a bigger increase in performance; or it might simply be that your O(N log N) implementation doesn't perform that much better on a data set of your size.

Michael Madsen
  • This is really interesting. I was not aware of this! Are there compile flags that can somewhat reduce function overhead? I'm currently compiling with -O3 and using Intel's `icpc` compiler. – Amit Oct 12 '11 at 13:09
  • @Amit: The only way I'm aware of to reduce function overhead is to avoid it entirely through inlining - and that *can* indeed make a huge difference. Again, it's impossible to know without more details; so run it through a profiler if necessary. – Michael Madsen Oct 12 '11 at 13:21

I've worked on image processing algorithms, and calling a function per pixel (i.e. 307,200 calls for a 640x480 image) can significantly reduce performance. Try declaring your function inline, or making it a macro; this can quickly show you whether the function calls are the cause. Also try some profiling tools: VS 2010 comes with some nice ones, and there are also Intel VTune and GlowCode. They can help show where you are spending your time.

IMHO, I don't think that ~1,600 function calls (roughly 200 log 200) should reduce performance much at all.
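
If you do want to try the inline route on the question's call pattern, a rough sketch might look like this. The type names are hypothetical stand-ins for the question's classes, and it assumes the function body is visible in a header so the compiler can actually inline it at -O3:

```cpp
// Hypothetical stand-ins for the question's nested classes.
struct LJ {
    double sigma, epsilon;
    // Defined in the class body (implicitly inline), so an optimising
    // compiler can expand it at the call site instead of emitting a call.
    double energy(double r2) const {
        double sr2 = sigma * sigma / r2;
        double sr6 = sr2 * sr2 * sr2;
        return 4.0 * epsilon * (sr6 * sr6 - sr6);
    }
};

struct PairStyle { LJ* lj; };
struct Globals   { PairStyle* pair_style; };

double total_energy(const Globals* globals, const double* r2, int n) {
    const LJ* lj = globals->pair_style->lj;  // hoist the pointer chase out of the loop
    double e = 0.0;
    for (int i = 0; i < n; ++i)
        e += lj->energy(r2[i]);              // candidate for inlining
    return e;
}
```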

Michael

I suggest profiling it using:

  • `gprof` (requires compile-time instrumentation)
  • `valgrind --tool=callgrind` and `kcachegrind`; an excellent tool with excellent visualization

The big FAQ topic on profiling is here: How can I profile C++ code running in Linux?
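
To get started with the callgrind route, the workflow is roughly: build with debug symbols (e.g. `g++ -O2 -g ...`, or the equivalent `icpc` flags), run `valgrind --tool=callgrind ./your_binary` (the binary name here is just a placeholder), then open the generated `callgrind.out.*` file in `kcachegrind` to browse the call graph and per-function costs.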


sehe