Example of compiler optimizations that can be 'easily' done on C++ code but not C code

Question

This question talks of an optimization of the sort function that cannot be readily achieved in C: Performance of qsort vs std::sort?

Are there more examples of compiler optimizations which would be impossible or at least difficult to achieve in C when compared to C++?

It's not about the optimizations. It is about the abstractions that can easily be expressed **while** still emitting optimal code. — sehe, Apr 09 '12 at 21:22
There's nothing (AFAIK) stopping compiler writers from inlining through function pointers (assuming the target is in the same translation unit), they just choose not to, for whatever reason. — Oliver Charlesworth, Apr 09 '12 at 21:24
@Oli Only if they can prove that the function pointer is never changed, which is probably most often not easily doable at compile time (and is almost impossible for qsort). But sure they could - JVMs are doing basically the same thing when inlining virtual valls, they just have some advantages there. — Voo, Apr 09 '12 at 22:02
@Oli: at least gcc and clang *do* inline through function pointers - you just need to take some care how you factor the code — Christoph, Apr 09 '12 at 23:07

Evan Teran · Accepted Answer · 2012-04-10T00:50:08.157

As @sehe mentioned in a comment. It's about the abstractions more than anything else. In other words, if the language allows the coder to express intent better, then it can emit code which implements that intent in a more optimal fashion.

A simple example is std::fill. Sure for basic types, you could use memset, but, let's say it's an array of 32-bit unsigned longs. std::fill knows that the array size is a multiple of 32-bits. And depending on the compiler, it might even be able to make the assumption that the array is properly aligned on a 32-bit boundary as well.

All of this combined may allow the compiler to emit code which sets the value 32-bit at a time, with no run-time checks to make sure that it is valid to do so. If we are lucky, the compiler will recognize this and replace it with a particularly efficient architecture specific version of the code.

(in reality gcc and probably the other mainstream compilers do in fact do this for just about anything that could be considered equivalent to a memset already, including std::fill).

often, memset is implemented in a way that has run-time checks for these types of things in order to choose the optimal code path. While this difference is probably negligible, the idea is that we have better expressed the intent of "filling" an array with a specific value, so the compiler is able to make slightly better choices.

Other more complicated language features do a good job of using the expression of intent to get larger gains, but this is the simplest example.

To be clear, my point is not that std::fill is "better" than memset, instead this is an example of how c++ allows better expression of intent to the compiler, allowing it to have more information during compile time, resulting in some optimizations being easier to implement.

The same can be accomplished with an inline definition of `memset` in `string.h`... — R.. GitHub STOP HELPING ICE, Apr 09 '12 at 21:57
Are there any compilers that actually *do* optimize `std::fill` better than memset? — Crashworks, Apr 09 '12 at 22:27
@R..: You are of course correct, I was not implying that this cannot be done in C. My intent was to show that it is **easier** to give the compiler expressive information about types and intent. Which in turn makes it easier for the compiler to emit good code. `std::fill` is illustrative of that point, not of something that "c++ can do, but c can't". — Evan Teran, Apr 10 '12 at 00:44

score 3 · Answer 2 · answered Apr 09 '12 at 22:16

It depends a bit on what you think of as the optimization here. If you're thinking of it purely as "std::sort vs. qsort", then there are thousands of other similar optimizations. Using a C++ template can supports inlining in situations where essentially the only reasonable alternative in C is to use a pointer to a function and nearly no known compiler will inline the code being called. Depending on your viewpoint, this is either a single optimization, or an entire (open-ended) family of them.

Another possibility is using template meta-programming to turn something into a compile-time constant that would normally have to be computed at run-time with C. In theory, you could usually do this by embedding a magic number. This is possible via a #define into C, but can lose context, flexibility or both (e.g., in C++ you can define a constant at compile time, carry out an arbitrary calculation from that input, and produce a compile-time constant used by the rest of the code. Given the much more limited calculations you can carry out in a #define, that's not possible nearly as often.

Yet another possibility is function overloading and template specialization. These are separate, but give the same basic result: using code that's specialized to a particular type. In C, to keep the number of functions you deal with halfway reasonable, you frequently end up writing code that (for example) converts all integers to a long, then does math on that. Templates, template specialization, and overloading make it relatively easy to use code that keeps the smaller types their native sizes, which can give a substantial speed increase (especially when it can enable vectorizing the math).

One last obvious possibility stems from simply providing quite a few pre-built data structures and algorithms, and allowing such things to be packaged for relatively easy, efficient re-use. I doubt I could even count the number of times I wrote code in C using what I knew were relatively inefficient data structures and/or algorithms, simply because it wasn't worth the time to find (or adapt) a more efficient one to the task at hand. Yes, if it really became a major bottleneck, I'd go to the trouble of finding or writing something better -- but doing a bit of comparing, it's still fairly common to see speed double when written in C++.

I should add, however, that all of these are undoubtedly possible with C, at least in theory. If you approach this from a viewpoint of something like language complexity theory and theoretical models of computation (e.g., Turing machines) there's no question that C and C++ are equivalent. With enough work writing specialized versions of each function, you can/could theoretically do all of those same things with C as you can with C++.

From a viewpoint of what code you can plan on really writing in a practical project, the story changes very quickly -- the limit on what you can do mostly comes down to what you can reasonably manage, not anything like the theoretical model of computation represented by the language. Levels of optimization that are almost entirely theoretical in C are not only practical, but quite routine in C++.

*nearly no known compiler will inline the code being called* is misleading -- gcc does it, clang does it, and I'd really be surprised if other high-performance compilers couldn't do it; of course such optimizations are only possible if the code resides in the same translation unit, but that's equally true for templates (in fact, that's where most of their performance benefit comes from) — Christoph, Apr 09 '12 at 22:59
@Christoph: True in theory. In reality, however, `qsort` (or `bsearch`, for example) won't have their implementation in a header where it'll be seen in the same TU as the comparison function. By contrast, (as you've noted) `std::sort`, `std::lower_bound`, etc., *will* always be in a header so the code is directly visible in the target TU -- and almost anything else you do with templates *will* do the same. C, by contrast, just isn't written this way. Other than on an extremely localized basis, it's only theoretical. — Jerry Coffin, Apr 09 '12 at 23:11
the STL is superior to the libc from a batteries-included point of view, but that's a separate issue; while it's true that the GNU libc doesn't provide an inline definition for `qsort()` and `bsearch()`, that's arguably a library defect as C99 inline semantics explicitly support the use case of multiple definitions so you can both have a generic version in a shared library and create an optimized one from the inline definition provided by a header; it's actually a reasonably elegant solution which avoids the code duplication inherent in template usage... — Christoph, Apr 09 '12 at 23:34
@Christoph: Yup, as I said before, in theory this is absolutely true. The reality is, even though C added `inline` 13+ years ago, to do it, you'd still have to write your own standard library. — Jerry Coffin, Apr 09 '12 at 23:44
@Christoph with the advent of LTO, residing in the same TU is pretty much removed as a requirement for optimisation. (library code is still 'firewalled' obviously) — underscore_d, Apr 13 '16 at 07:49

score -1 · Answer 3 · answered Apr 09 '12 at 21:56

-1

Even the qsort vs std::sort example is invalid. If a C implementation wanted, it could put an inline version of qsort in stdlib.h, and any decent C compiler could handle inlining the comparison function. The reason this usually isn't done is that it's massively bloated and of dubious performance benefit -- issues C++ folks tend not to care about...

answered Apr 09 '12 at 21:56

R.. GitHub STOP HELPING ICE

208,859
35
376
711

1

So how would I achieve the same performance with qsort as with std::sort with my own function/types (and strings, longs, etc - because my compiler doesn't do whatever optimization you think of)? And the claim that std::sort offers only "dubious performance benefits" over standard qsort is trivial to refute - as was actually shown in the linked benchmark example (except if about 10times slower falls under "dubious improvement") – Voo Apr 09 '12 at 22:06
1

@Voo, what R.. is saying that this is not a principal thing of C versus C++ but that the provider of the C library chooses not to provide an inlinable version of `qsort`. If you want to do such a thing yourself, it should be relatively straight forward to extract the code of some public domain implementation of `qsort` and provide that as an inline version. – Jens Gustedt Apr 09 '12 at 22:26
@Jens I assume I misunderstand you, but do you mean we should write a `qsortString`, `qsortStringIgnoreCases`, etc. functions? Because that's not really useful. Or is there a way to get the c compiler to inline the actual calls to the function pointer? (I just don't see how that'd be done?) – Voo Apr 09 '12 at 22:34
1

@Voo, if the function definition is visible at the call point and the function pointer is properly `const`ed, there is no principal reason that the call can't be inlined. In particular the compiler will not lose the type information that would be otherwise lost by casting to `void*`. I remember having seen this, I think with gcc. – Jens Gustedt Apr 09 '12 at 22:42
2

Your statement _"massively bloated and of dubious performance benefit -- issues C++ folks tend not to care about..."_ is a large and inaccurate generalization. There are many cases where C++ developers worry about footprint and performance, particularly in the embedded space where C++ is used a great deal. – Void - Othman Apr 09 '12 at 22:46
aside from having to re-parse the function definition each time `stdlib.h` is included, there *is* no bloat in providing an additional inline definition of `qsort()`; I consider the fact that there is no such definition a defect with the GNU libc, which is probably due to the different semantics of GNU-C and ISO-C where `inline` is concerned – Christoph Apr 09 '12 at 23:44
@Christoph: I suspect a large portion of users of `qsort` would rather their programs not grow by tens of kb per call to `qsort`, and not lose the ability for performance to improve without recompiling when libc.so is upgraded, than get the marginal performance benefits of inlining... – R.. GitHub STOP HELPING ICE Apr 10 '12 at 00:51
@R.. The compiler is not required to inline a function just because it appears in a header. – David Stone Jan 06 '14 at 14:36
@DavidStone: And a C++ compiler isn't required to refrain from inserting extraneous 1000000000-iteration NOP loops between every pair of instructions output... – R.. GitHub STOP HELPING ICE Jan 06 '14 at 17:12
minus for the blatantly false partisan trollbait about C++, and for the same reason I'd downvote the latest comment if I could. – underscore_d Apr 13 '16 at 07:52

Example of compiler optimizations that can be 'easily' done on C++ code but not C code

3 Answers3