C function code length VS processor cache

Question

As C is a procedure-oriented language, I always end up with sequential code that runs from top to bottom in one or a few C functions.

Sometimes I write functions that are 1000 lines long, because I think function calls have overhead. This doesn't duplicate much code: I would say less than 5% of the code in my long functions is duplicated.

So, what effect do long functions have on the processor cache? Do long functions prevent good CPU cache usage? Does the CPU cache work by caching a whole C function? If the processor cache doesn't like long functions, would it be more efficient to use function calls?

Yes, function calls have some overhead. However, that overhead has been so small that it has been barely measurable for the last 25 years or so. And the effect of long functions on the cache is probably only going to be noticeable if you have loops with lots of code in them. My tip? Readability FTW! Write code that others will easily read and understand, and when I say "others" I mean you too in a few months' time. – Some programmer dude, Nov 07 '14 at 14:19
But do functions affect the processor cache? If long functions are not better for the processor cache, are short function calls better for the CPU cache? – brb tea, Nov 07 '14 at 14:22
Yes, correct, I end up with a lot of code inside loops, like 100 lines inside a loop. – brb tea, Nov 07 '14 at 14:24
I'm not going to say that it's unusual to have functions that long, but for readability (and maintainability!) reasons most people avoid writing such long functions. Rule of thumb: if it can fit in a "page" in your editor, it's long enough. – Some programmer dude, Nov 07 '14 at 14:24
This doesn't hurt readability, because I always try not to nest more than three loops. And most of the time, the code is sequential. Actually, that helps me read the code better in IDEs. – brb tea, Nov 07 '14 at 14:25
But having long functions *does* hinder readability! Think about when you're looking at one part of a function and you need to see something in another part (like a variable declaration, or what type an argument was): you have to scroll back and forth trying to find what you're looking for, which makes you lose pace and concentration in your coding. If you can see everything in a single "page", then it's easy to just move your eyes, keep your hands on the keyboard, and keep on writing. – Some programmer dude, Nov 07 '14 at 14:27
LOL, OK, that hurts readability and maintainability, agreed. But my question is: do long functions affect the CPU cache? – brb tea, Nov 07 '14 at 14:30
LMAO, I developed an application consisting of 3 major threads, each having more than 1000 lines of code as its thread execution function. That is, I have 3 functions totalling more than 3000 lines, with each thread running one function. :P – brb tea, Nov 07 '14 at 14:37
Read this (it's labelled C++ but almost all of it applies to C as well): http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code?rq=1 – Klas Lindbäck, Nov 07 '14 at 14:47

Blagovest Buyukliev · Accepted Answer · 2014-11-07T15:43:21.670

1

Readability should, in general, always come first, and you can pretty much regard avoiding function calls as a "last resort" kind of optimisation which will not buy you a significant performance gain.

Today's CPUs cache instructions as well as data. In general, you should optimise the layout of the data and the memory access patterns first, but the way in which instructions are arranged also matters for the utilisation of the instruction cache.
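
To make "memory access patterns" concrete, here is a minimal sketch (the function names and the flat `rows * cols` layout are my own assumptions, not something from the question). Both functions compute the same sum; the first walks the array in the order it sits in memory, while the second strides across it, which tends to use each fetched cache line less effectively on large inputs.

```c
#include <stddef.h>

/* Walks the matrix in the order it is laid out in memory (row by row),
   so consecutive iterations touch adjacent addresses. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Same result, but consecutive iterations are cols * sizeof(int) bytes
   apart, so each access may land on a different cache line. */
long sum_col_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

How large the difference is depends on the matrix size and the machine, so measure before restructuring anything.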

Calling a non-inlined function is, in effect, an unconditional jump, much like a jmp instruction (plus saving a return address). This jump makes the CPU start fetching instructions from another (possibly far) location in memory. If this new location isn't found in the instruction cache, the CPU will stall until the corresponding memory is brought there. In theory, if the code contains no jumps and branches, the CPU could prefetch instructions as aggressively as possible.
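
As a rough illustration of the "non-inlined" distinction, a long loop body can be split into small `static inline` helpers. This is only a sketch with hypothetical names (`scale`, `clamp_to_byte`, `scale_samples`); `inline` is a hint, and whether the compiler actually inlines the helpers depends on the compiler and optimisation level. When it does, there is no call or jump left, so the readable version costs nothing extra.

```c
#include <stddef.h>

/* Hypothetical step that would otherwise be pasted into the loop body. */
static inline int scale(int v, int num, int den)
{
    return v * num / den;
}

/* Hypothetical second step: limit the value to the range 0..255. */
static inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* The loop reads as a sequence of named steps; the compiler is free to
   inline both helpers here and emit straight-line code. */
void scale_samples(int *samples, size_t n, int num, int den)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = clamp_to_byte(scale(samples[i], num, den));
}
```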

Also, you never really know how far is "too far". Jumping a few kilobytes forwards or backwards might well be a cache hit, since a typical L1 instruction cache today is about 32 kilobytes.

It's a very tricky optimisation to do right, and I would advise you to look at your data layout and memory access patterns first.

The other concern is the overhead of passing the arguments on the stack or in registers. With today's CPUs this is less of a problem, since the whole stack is usually "hot" in the data cache, and register renaming can even reduce register-to-register moves to a no-op.
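
One common way to keep argument passing cheap when a long function is split up is to gather the shared local variables into a struct and pass a single pointer to the helpers. The sketch below uses made-up names (`struct stats`, `stats_init`, `stats_add`, `summarize`) and is just one possible arrangement; on common calling conventions such as x86-64 System V, a single pointer argument travels in a register.

```c
#include <limits.h>
#include <stddef.h>

/* State that would otherwise be separate locals in one long function. */
struct stats {
    long sum;
    int min;
    int max;
    size_t count;
};

static void stats_init(struct stats *s)
{
    s->sum = 0;
    s->min = INT_MAX;
    s->max = INT_MIN;
    s->count = 0;
}

/* Each helper receives one pointer instead of four separate arguments. */
static void stats_add(struct stats *s, int v)
{
    s->sum += v;
    if (v < s->min)
        s->min = v;
    if (v > s->max)
        s->max = v;
    s->count++;
}

struct stats summarize(const int *values, size_t n)
{
    struct stats s;
    stats_init(&s);
    for (size_t i = 0; i < n; i++)
        stats_add(&s, values[i]);
    return s;
}
```

The struct lives on the caller's stack, so it is likely to be in the data cache already when the helpers touch it.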

edited Nov 07 '14 at 15:43

answered Nov 07 '14 at 15:05

Blagovest Buyukliev


Thanks for the reply. OK, so what about conditional jumps like if, for, and while loops? Will the CPU stop prefetching instructions once it reaches a conditional jump? – brb tea Nov 07 '14 at 15:11
That's what branch prediction tries to solve. The branch predictor would tell the rest of the CPU what is the most likely outcome, and it can start fetching instructions from that location. It can even execute a certain number of instructions speculatively, reversing their effect when the condition of the branch is known. – Blagovest Buyukliev Nov 07 '14 at 15:14
OK, so if I use long functions, then I will have a lot of local variables on the stack. If I use a lot of functions, then I will have to either pass variables as parameters to the functions, or use a struct object which holds all the variables needed and pass a pointer to this struct across functions. Which is better? – brb tea Nov 07 '14 at 15:21
What's the meaning of **far** in instruction cache terms? The instruction cache can deal with short functions (not inlined) making, for example, a recursive call, and no memory access needs to be done, as everything is in the cache. Also, "far" has no meaning here, as the CPU can cache instructions that are far apart. It doesn't have to cache the code in between. – Luis Colorado Nov 09 '14 at 10:18
@LuisColorado: It is true that the instruction cache can hold blocks of instructions which can be very far apart and everything will be fine as long as all the blocks fit in the cache, but the *cumulative effect* of many far jumps (in whatever way we define "far") is that it reduces the overall efficiency of the cache. In simple terms, it would mean that yet another *far jump* would eventually need to evict something else from the cache once it gets full. – Blagovest Buyukliev Nov 09 '14 at 11:08
