23

So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.

`register`, from what I know, doesn't actually state anything about CPU registers, just that a given variable's address can't be taken. I'll hazard a guess that it's often called obsolete because compilers can detect the absence of address-taking automatically, making such optimizations transparent.

But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use `inline` and C99's `restrict`, for example?

I suppose that some things, like aliasing, make deducing certain optimizations hard or even impossible, so where is the line drawn before we venture into Sufficiently Smart Compiler territory?

Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and trusting it to know what it's doing?

EDIT: Jens Gustedt pointed out that my conflation of C and C++ isn't right, since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about `register` in C++ which I'll add if I find it...

Louis Jackman
  • 1,001
  • 6
  • 16
  • 1
    I don't know much about C, but in C++, `inline` is rather used to circumvent the ODR. – dyp Apr 12 '13 at 11:47
  • 1
    "Where should the line should be drawn" - I think it should always be benchmarked disassembled. Choose the one which produces the better result. –  Apr 12 '13 at 11:47
  • 2
    You really should distinguish C and C++, here, they are not the same with all three keywords that you mention: `inline` and `register` have different semantics and `restrict` doesn't even exist in C++. – Jens Gustedt Apr 12 '13 at 11:51
  • @JensGustedt Thanks, I'll add it as an edit. – Louis Jackman Apr 12 '13 at 11:56
  • 3
    The fact `register` was deprecated in C++11 is fairly telling. There's no benefit with modern optimizing compilers. GCC, however, uses it in a non-standard [extension](http://gcc.gnu.org/onlinedocs/gcc/Explicit-Reg-Vars.html#Explicit-Reg-Vars). – Brett Hale Apr 12 '13 at 12:20

8 Answers

9

I would agree that `register` and `inline` are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision about inlining. The use of the `inline` keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.

`restrict`, however, is different. When compiling a function, the compiler has no idea what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • I would argue that `register` and `restrict` are similar in that they both tell the compiler what you allow it to assume. `register` is better in that it also allows the compiler to enforce it, but they're of the same kind. Oh, wait a minute... this also applies to `inline`. – Cheers and hth. - Alf Apr 12 '13 at 11:53
  • Compilers today can normally see all of the call sites of a function, at least if you use the maximum optimization settings. It's true, however, that the locality affected by `register` is much smaller, and more importantly, that the algorithms for optimizing register allocation are now well known and universally used, which is not the case for the optimizations related to `inline` or `restrict`. – James Kanze Apr 12 '13 at 12:01
  • 2
    @JamesKanze: Out of interest, what are the mechanics of the compiler being able to see all the call sites? Say I build a library containing a function named `foo()`, and give the `.a` file to you. You write a function called `bar()` which invokes `foo()`. How can the compiler know anything about `bar()` when compiling `foo()`? – NPE Apr 12 '13 at 12:06
  • @NPE The usual solution, I think, is for the compiler to generate annotated byte code, similar to what a Java compiler generates, and then only compile it to machine code in the link phase. See `-flto` for g++, `/GL` for VC++. – James Kanze Apr 12 '13 at 12:42
  • @NPE can the compiler assume `restrict` semantics on unique_ptr? – soandos Apr 12 '13 at 14:19
  • @JamesKanze: `-flto` or `/GL` allow optimization at compilation-unit granularity - not true whole-program optimization. So yes - static libraries would be considered IF they were compiled with the appropriate flags, but dynamic libraries are also possible (plugins, shared libraries, etc.), in which case even if they were compiled that way they might be replaced in the future with, say, a bugfix. – Maciej Piechotka Apr 12 '13 at 19:58
  • @MaciejPiechotka That's not what the documentation for them says. Obviously, you can't optimize across DLL boundaries, but then, why are you using DLLs? There are very few cases where a DLL is justified. – James Kanze Apr 13 '13 at 11:24
  • @JamesKanze: I was talking about DLLs. I'm using Linux, where the situation is reversed - globally installed shared libraries are the default. And there are a few reasons - for example, pushing a DLL with a bug fixed is a small update done in one place. If the libraries are distributed with the program then you need to update all of them (update every program). If you link statically then you need to recompile every program or it will not be patched. They also save space (you need only one copy of, say, zlib, gtk+ or qt per system), which is not so important for disk but allows more efficient caching in memory. – Maciej Piechotka Apr 13 '13 at 15:23
  • In addition, on Windows DLLs pose problems for C++ - as MSVC has no ABI guarantee, either you need to be extra careful about the methods you use (no STL etc.) or you need to use the same compiler for all libraries. I misspoke about compilation unit - I meant assembly (dynamic library or program). – Maciej Piechotka Apr 13 '13 at 15:25
  • @MaciejPiechotka For system functions, shared libraries should always be the default. But system functions aren't subject to optimization anyway. For anything that is not system or pseudo-system (e.g. something like a database), you should avoid shared libraries. For anything else, shared libraries are simply a way of ensuring that your clients run versions you haven't tested, which might not work with your software. – James Kanze Apr 13 '13 at 20:06
  • @MaciejPiechotka Regarding your last comment: I agree. For a suitable definition of assembly. But that is, or should be, more or less irrelevant. Different assemblies only interface at higher levels; the performance issues are in the individual assemblies. – James Kanze Apr 13 '13 at 20:08
  • @JamesKanze: If a change of version breaks software then there is a bug somewhere - on the other hand, on Linux there is a central place (the distribution) whose goal is to check that everything works. I'm afraid we have to agree to disagree (different philosophies regarding the management of libraries). I disagree about the optimization of system libraries - `memcpy` etc. are usually highly optimized routines (often manually), so there is a point in using them. And if pointers *do* come across a boundary, then even if the performance issues are in the assembly you won't be able to perform alias analysis. – Maciej Piechotka Apr 13 '13 at 20:53
5

`inline` is used in the scenario where you implement a non-templated function within a header and then include it from multiple compilation units.

This ensures that the compiler creates just one instance of the function, as though it had been inlined, so you do not get a link error for a multiply-defined symbol. It does not, however, require the compiler to actually inline it.

There are GNU flags I think force-inline or similar but that is a language extension.

CashCow
  • 30,981
  • 5
  • 61
  • 92
  • `inline` _can_ be used to make a library header-only. Being header-only is a serious disadvantage in general, however; it allows people to use your library on machines you've never heard of, much less tested it on. – James Kanze Apr 12 '13 at 11:58
  • 2
    The gnu way of forcing inlining is actually the function attribute "always_inline". – Étienne Apr 12 '13 at 12:03
5

`register` doesn't even say that you can't take the variable's address (at least in C++). It meant that in the original C, and C still forbids taking the address of a `register` variable, but C++ dropped that restriction.

Whether trying to out-optimize the compiler is a fool's errand depends on the optimization. Not many compilers, for example, will convert `sin(x) * sin(x) + cos(x) * cos(x)` into `1`.

Today, most compilers ignore `register`, and no one uses it, because compilers have become good enough at register allocation to do a better job than you can with `register`. In fact, respecting `register` would typically make the generated code slower. This is not the case for `inline` or `restrict`: in both cases there exist techniques, at least theoretically, which could let the compiler do a better job than you can. Such techniques are not widespread, however, and (as far as I know, at least) have a very high compile-time overhead, in some cases with compile times that grow exponentially with the size of the program (which makes them more or less unusable on most real programs; compile times measured in years really aren't acceptable).

As to where to draw the line... it changes over time. When I first started programming in C, `register` made a significant difference and was widely used. Today, it doesn't. I imagine that in time the same may happen with `inline` or `restrict`; some experimental compilers are very close with `inline` already.

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • When you started programming in C, you were possibly writing code very close to the system, in a single-process environment where your program was the only thing running on your DOS PC. – CashCow Apr 12 '13 at 11:59
  • +1, but I think what we need are types with added semantic restrictions in order to express "I want standard functionality, don't optimize if that changes the semantics". Mostly for g++, though: a special floating point type that g++ can't foul up, where the numeric limits info is always reliable, and special integer types that it can't foul up (on the assumption of no wrapping), regardless of `-fwrapv` or not. – Cheers and hth. - Alf Apr 12 '13 at 11:59
  • @CashCow When I started programming in C, the compiler only had 64 KB of RAM to play around in, and the disk it would spill to was a floppy. Optimization strategies which take maybe 10 ms today could take minutes. And not all of them were known: Sethi-Ullman was state of the art, and register coloring had just been published. – James Kanze Apr 12 '13 at 12:11
  • Regarding the `register` keyword, it depends on the compiler. GCC will honor it if you don't turn on any optimizations - which can in fact make the code much faster. – rm5248 Apr 12 '13 at 18:58
  • 2
    As a side note, `sin(x) * sin(x) + cos(x) * cos(x)` CANNOT be optimized into `1`: while the identity holds for real numbers, it does not hold for IEEE 754 floating point numbers. For example, `sin(nan) * sin(nan) + cos(nan) * cos(nan) = nan` (similarly, for `inf` the result is `nan`). The original formula might also have rounding errors (for 32-bit float, the result of `(sin(x) * sin(x) + cos(x) * cos(x)) - 1` can be `-5.9604645e-8`). – Maciej Piechotka Apr 12 '13 at 19:26
5

This is a flame-bait question but I will dive in anyway.

Compilers are a lot better at optimising than your average programmer. There was a time when I programmed on a 25MHz 68030 and got some advantage from using `register` because the compiler's optimizer was so poor. But that was back in 1990.

I see `inline` as just as bad as `register`.

In general, measure before you modify. If you find that your code performs so poorly that you want to use `register` or `inline`, take a deep breath, step back and look for a better algorithm first.

In recent times (i.e. the last 5 years) I have gone through code bases and removed `inline` functions galore with no perceptible change in performance. Code size, however, always benefits from the removal of `inline` methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age, but it does matter if you work in the embedded space.

  • Measuring before you modify (and again after) is a given. Still, with most current compilers, judicious use of `inline` can yield significant improvement (once you've determined that improvement is needed), with very little effort. And I've seen a couple of cases where `restrict` could have brought an order-of-magnitude improvement in a small, critical function (which could take up to 20 minutes to execute). – James Kanze Apr 12 '13 at 12:03
  • With regards to your last paragraph: there does seem to be a tendency to abuse `inline`. Premature optimization is premature optimization, and the original version of a program should not contain any `inline` functions. But that doesn't mean that when optimization is necessary, `inline` is a bad tool. It's one of the cheapest tools (in terms of human effort and cost to code readability) you can use. The key is to use it only when necessary. – James Kanze Apr 12 '13 at 12:06
  • 1
    `inline` semantics are useful in C99, since an implementation of the function can be instantiated in a translation unit. If the compiler decides *not* to inline the code, it doesn't have to implement statically scoped implementations in multiple translation units. This is useful for function pointers too. – Brett Hale Apr 12 '13 at 12:08
  • Isn't the point of `inline` that it does not actually force the compiler to do anything? It is merely a hint, unlike what `register` was, and so, just like the `register` keyword, it may become useless. Regarding the point about size, try using `-Os` rather than just removing all the `inline` keywords. – soandos Apr 12 '13 at 14:23
2

It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)

Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.

So a rule about using various optimization techniques is to ask: Do I know the compiler will not do the best job here? Am I willing to commit to that for a while; will the compiler remain stable while this code is in use, and am I willing to rewrite the code when the compiler changes the situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.

This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.

Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler, and in which it is worth the expense of paying people to do so.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
2

One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C compilers are.

For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.

For these cases, I've found optimization hints like `register`, `inline`, and `#pragma unroll` to be extremely useful.

Community
  • 1
  • 1
BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
2

The difference is as follows:

  • `register` is a very local optimization (i.e. inside one function). Register allocation is a relatively solved problem, thanks both to smarter compilers and to the larger number of registers (mostly the former, but x86-64 has more registers than x86, and both have more than, say, an 8-bit processor).
  • `inline` is harder, as it is an inter-procedural optimization. However, as it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big there is no sense in inlining it), it may be safely left to the compiler.
  • `restrict` is much harder. To fully know that two pointers don't alias, you would need to analyse the whole program (including libraries, the system, plug-ins etc.) - and even then you run into problems. However, the information is clearer to the programmer AND it is part of the specification.

Consider very simple code:

void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}

Is there a benefit to making this code efficient? Yes - `memcpy` tends to be very useful (say, for copying during GC). Can this code be vectorized (here - moved by words - say 128 bits at a time instead of 8)? The compiler would have to deduce that `dst` and `src` do not alias in any way and that the regions they point to are independent. `size` may depend on user input, runtime behaviour or other factors, which makes the analysis practically impossible - problems similar to the Halting Problem - in general we cannot analyse everything without running it. Or the function might be part of the C library (I assume shared libraries) and be called by any program, hence not all call sites are even known at compile time. Without such analysis, the program would exhibit different behaviour with optimization on. On the other hand, the programmer might ensure that they are different objects simply by knowing the (even higher-level) design, instead of needing bottom-up analysis.

`restrict` can also be part of the documentation, as the programmer might have written the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between aliasing locations, the above code is incorrect.

So to sum up: a Sufficiently Smart Compiler would not be able to deduce `restrict` without knowing the whole program (unless we move to compilers that understand the meaning of the code). Even then, it would be close to undecidable. However, for local optimization the compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nevertheless be able to deduce it in many interesting cases.

PS. By local I mean a single function. So a local optimization cannot assume anything about arguments, global variables, etc.

Maciej Piechotka
  • 7,028
  • 6
  • 39
  • 61
0

From what I saw back in the days when I was more involved with C/C++, these are merely orders given directly to the compiler. The compiler may try to inline a function even if it is not given a direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization which correspond to different levels of compiler intelligence. I have read that all class member functions defined in-class are implicitly inline, to give the compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while with intelligent ones the optimizations may be obvious to the compiler anyway.

Also, rest assured that these keywords are guaranteed to be safe. Some compiler optimizations may not work with some libraries, such as OpenGL (as I have seen myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want it to be.

Compilers such as g++ optimize code very well these days. You might as well look for optimization elsewhere: maybe in the methods and algorithms you use, or by using TBB or CUDA to make your code parallel.

Amir Zadeh
  • 3,481
  • 2
  • 26
  • 47