
Suppose I crafted a set of classes to abstract something and now I worry whether my C++ compiler will be able to peel off those wrappings and emit really clean, concise and fast code. How do I find out what the compiler decided to do?

The only way I know is to inspect the disassembly. This works well for simple code, but there are two drawbacks: the compiler might do it differently when it compiles the same code again, and machine-code analysis is not trivial, so it takes effort.

How else can I find how the compiler decided to implement what I coded in C++?

sharptooth
  • I want to know the answer to this too, but I really can't see how the answer can be anything other than "Hope the compiler docs mention it somewhere". – j_random_hacker Dec 02 '10 at 06:28
  • +1. however, i don't see how one can know. besides disassembly, or knowing the compiler source (and simulating it in one's head). – lijie Dec 02 '10 at 06:33

8 Answers


I'm afraid you're out of luck on this one. You're trying to find out "what the compiler did". What the compiler did is to produce machine code. The disassembly is simply a more readable form of the machine code, but it can't add information that isn't there. You can't figure out how a meat grinder works by looking at a hamburger.

Karl Knechtel

I was actually wondering about that.

I have been quite interested, for the last few months, in the Clang project.

One of Clang's particular strengths, with respect to optimization, is that you can emit the optimized LLVM IR instead of machine code. The IR is a high-level assembly language with notions of structure and type.

Most of the optimization passes in the Clang compiler suite are performed on the IR (the last round is of course architecture-specific and performed by the backend, depending on the available operations). This means that you can actually see, right in the IR, whether the object creation (as in your linked question) was optimized out or not.

I know it is still assembly (though of higher level), but it does seem more readable to me:

  • far fewer opcodes
  • typed objects / pointers
  • no "register" things or "magic" knowledge required

Would that suit you :) ?

Matthieu M.
  • So, this IR high level assembly language... it sounds a lot like "C" from your description! Certainly sounds interesting... got a link to some examples ? – timday Dec 02 '10 at 13:52
  • @timday: the reference is there http://llvm.org/docs/LangRef.html a quick tutorial can be found here http://llvm.org/releases/2.6/docs/tutorial/JITTutorial1.html It is a high level assembly and not C-like at all (notably because it's expressed in SSA form). – Matthieu M. Dec 02 '10 at 14:03
  • Interesting. Any idea how to get clang to output IR? – dhardy Aug 17 '11 at 15:09
  • @dhardy: If you have a tiny example, you can always use the online LLVM Try Out page :) Otherwise, there is the `-emit-llvm` flag. I think the result is a binary file, but if it is, it can be converted back to text using `llvm-dis`. – Matthieu M. Aug 17 '11 at 16:59

Timing the code directly measures its speed and can avoid looking at the disassembly entirely. It will detect when compiler upgrades, code modifications, or subtle configuration changes have affected performance (for better or worse). In that way it's better than the disassembly, which is only an indirect measure.

Things like code size can also serve as possible indicators of problems. At the very least they suggest that something has changed. They can also point out unexpected code bloat when the compiler should have boiled down a bunch of templates (or whatever) into a concise series of instructions.

Of course, looking at the disassembly is an excellent technique for developing the code and helping decide if the compiler is doing a sufficiently good translation. You can see if you're getting your money's worth, as it were.

In other words, measure what you expect and then dive in if you think the compiler is "cheating" you.

George Phillips
  • More generally, first answer the question "why do you care?"; then check whether what you care about is good enough. If it is, move on; if not, profile to find what is taking up the most resources. Often, the answers to all three will surprise you. – BCS Dec 09 '10 at 03:22

You might find a compiler that has an option to dump a post-optimisation AST/representation; how readable it would be is another matter. If you're using GCC, there's a chance it wouldn't be too hard, and someone might have already done it - GCCXML does something vaguely similar. It's of little use, though, if the compiler you want to build your production code with can't do it.

After that, some compilers (e.g. gcc with -S) can output assembly language, which might be usefully clearer than reading a disassembly: for example, some compilers interleave the high-level source as comments with the corresponding assembly.
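As a sketch (assuming g++ and a POSIX shell; the file and function names are made up), asking the compiler for annotated assembly looks like:

```shell
# Write a tiny translation unit to inspect.
cat > demo.cpp <<'EOF'
int twice(int x) { return 2 * x; }
EOF
# -S stops after assembly generation; -fverbose-asm adds
# source-level operand names as comments in the output.
g++ -S -O2 -fverbose-asm demo.cpp -o demo.s
# demo.s now contains the generated assembly with comments.
grep twice demo.s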

As for the drawbacks you mentioned:

the compiler might do it different when it compiles the same code again

Absolutely - only the compiler docs and/or source code can tell you the chance of that, though you can put some performance checks into nightly test runs so you'll be alerted if performance suddenly changes.

and also machine code analysis is not trivial, so it takes effort.

Which raises the question: what would be better? I can imagine some process where you run the compiler over your code and it records when variables are cached in registers at points of use, which function calls are inlined, even the maximum number of CPU cycles an instruction might take (where knowable at compile time), etc., and produces some record thereof, then a source viewer/editor that colour-codes and annotates the source correspondingly.

Is that the kind of thing you have in mind? Would it be useful? Perhaps some of it more than the rest - e.g. black-and-white info on register usage ignores the utility of the various levels of CPU cache (and utilisation at run-time); the compiler probably doesn't even try to model that anyway. Knowing where inlining was really being done would give me a warm fuzzy feeling. But profiling seems more promising and useful generally. I fear the benefits are more intuitively appealing than actual, and compiler writers are better off pursuing C++0x features, run-time instrumentation, introspection, or writing D "on the side" ;-).

Tony Delroy

You want to know if the compiler produced "clean, concise and fast code".

"Clean" has little meaning here. Clean code is code which promotes readability and maintainability -- by human beings. Thus, this property relates to what the programmer sees, i.e. the source code. There is no notion of cleanliness for binary code produced by a compiler that will be looked at by the CPU only. If you wrote a nice set of classes to abstract your problem, then your code is as clean as it can get.

"Concise code" has two meanings. For source code, this is about saving the scarce programmer eye and brain resources, but, as I pointed out above, this does not apply to compiler output, since there is no human involved at that point. The other meaning is about code which is compact, thus having lower storage cost. This can have an impact on execution speed, because RAM is slow, and thus you really want the innermost loops of your code to fit in the CPU level 1 cache. The size of the functions produced by the compiler can be obtained with some developer tools; on systems which use GNU binutils, you can use the size command to get the total code and data sizes in an object file (a compiled .o), and objdump to get more information. In particular, objdump -x will give the size of each individual function.

"Fast" is something to be measured. If you want to know whether your code is fast or not, then benchmark it. If the code turns out to be too slow for your problem at hand (this does not happen often) and you have some compelling theoretical reason to believe that the hardware could do much better (e.g. because you estimated the number of involved operations, delved into the CPU manuals, and mastered all the memory bandwidth and cache issues), then (and only then) is it time to have a look at what the compiler did with your code. Barring these conditions, cleanliness of source code is a much more important issue.

All that being said, it can help quite a lot if you have a priori notions of what a compiler can do. This requires some training. I suggest that you have a look at the classic Dragon Book; but otherwise you will have to spend some time compiling some example code and looking at the assembly output. C++ is not the easiest language for that; you may want to begin with plain C. Ideally, once you know enough to be able to write your own compiler, you know what a compiler can do, and you can guess what it will do with a given piece of code.

Thomas Pornin

The answer to your question was pretty much nailed by Karl. If you want to see what the compiler did, you have to start going through the assembly code it produced - elbow grease is required. As to discovering the "why" behind the "how" of how it implemented your code: every compiler (and potentially every build), as you mentioned, is different. There are different approaches, different optimizations, etc.

However, I wouldn't worry about whether it's emitting clean, concise machine code - cleanliness and concision should be left to the source code. Speed, on the other hand, is pretty much the programmer's responsibility (profiling ftw). More interesting concerns are correctness, maintainability, readability, etc.

If you want to see if it made a specific optimization, the compiler docs might help (if they're available for your compiler). You can also try searching to see if the compiler implements a known technique for optimizing whatever you're interested in. If those approaches fail, though, you're right back to reading assembly code. Keep in mind that the code you're checking out might have little to no impact on performance or executable size - grab some hard data before diving into any of this stuff.

Gemini14
  • So now not only should the developers emit readable code, the compilers should too? You can get that, if you want, by using -O0 - almost a 1:1 mapping everywhere. I think the OP meant by "clean": no unnecessary register spills, no calls to constructors which don't do anything, removing a wrapper function by calling the inner function directly, etc. – Gunther Piez Mar 09 '11 at 09:33
  • @drhirsch Nice necro comment :) But seriously, I understood what he was getting at, I didn't mean that the compiler had to emit clean (human clean) code. I was talking about it, however, because the OP wanted to understand what was going on in the compiler by looking at the disassembly. – Gemini14 Mar 12 '11 at 00:49

Actually, there is a way to get what you want, if you can get your compiler to produce DWARF debugging information. There will be a DWARF description for each out-of-line function and within that description there will (hopefully) be entries for each inlined function. It's not trivial to read DWARF, and sometimes compilers don't produce complete or accurate DWARF, but it can be a useful source of information about what the compiler actually did, that's not tied to any one compiler or CPU. Once you have a DWARF reading library there are all sorts of useful tools you can build around it.

Don't expect to use it with Visual C++ as that uses a different debugging format. (But you might be able to do similar queries through the debug helper library that comes with it.)

Al Grant

If your compiler manages to peel off your wrappings and "emit really clean, concise and fast code", the effort of following up on the emitted code should be reasonable.

Contrary to another answer, I feel that emitted assembly code may well be "clean" if it maps (relatively) easily to the original source code, doesn't consist of calls all over the place, and its system of jumps is not too complex. With code scheduling and re-ordering, though, optimized machine code that is also readable is, alas, a thing of the past.

Olof Forshell