How can I prove or disprove the efficiency of compilation?

Question

This is an unusual question, but I do hope there's a definitive answer.

There's a longstanding debate in our office about how efficiently compilers generate code, specifically number of instructions. We write code for low power embedded systems with virtually no loops. Therefore, the number of instructions emitted is directly proportional to power consumed.

Much of our code looks like this (notice, no dynamic memory allocation, no system calls, very few function calls, very few loops).

foo += 3 * (77 + bar);
if (baz > 18 - qux)
    bar -= 19 + 7 >> spam;

I can compile the above snippet with -O3 and read the assembly, but I couldn't write it myself.
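To inspect the assembly for the snippet above, it helps to wrap it in a compilable translation unit. Below is a minimal sketch. The question shows only the three statements, so the `int` globals and the function name `update` are assumptions. Note that `>>` binds more loosely than `+` in C, so `19 + 7 >> spam` already means `(19 + 7) >> spam`, i.e. `26 >> spam`. Compilers typically fold such constants at compile time.

```c
/* Hypothetical globals: the question does not show the variables' types,
   so plain int is assumed here. */
int foo, bar, baz, qux, spam;

/* The question's three statements, unchanged except for explicit
   parentheses that show how C already parses them. */
void update(void)
{
    foo += 3 * (77 + bar);          /* an optimizer may rewrite as 231 + 3*bar */
    if (baz > 18 - qux)
        bar -= (19 + 7) >> spam;    /* '+' binds tighter than '>>': 26 >> spam;
                                       spam must be in [0, 31] for 32-bit int */
}
```

Compiling this once with `gcc -Os -S` and once with `gcc -O3 -S` produces two `.s` files that can be compared instruction by instruction.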

The claim I would like to prove or disprove is that compilers generate code that is 2-4X "fatter" (and therefore consumes 2-4 times as much power) compared with hand-written assembly code.

I'm interested in any compiler with which you have experience.

From this answer I know that GCC and clang can emit assembly interleaved with the C code with

gcc -g -c -Wa,-alh foo.cc

These answers provide solid basis:

When is assembly faster?

Why do you program in assembly?

How can I measure the efficiency with which a compiler generates code?

I think you would have to prove that "number of instructions" == "power consumed". What does your processor do when it is "not doing anything"? — Floris, Sep 11 '13 at 00:42
Simple answer: benchmarking. You can argue the merits of speed or size all day long. But at the end of the day there is something satisfying in seeing an actual result. Build your exe optimized for speed or for size, then test it on a machine. Build it again, using the methods being debated, then test it against the first method, and so on. You will have more definitive results than you will by simple debate. — ryyker, Sep 11 '13 at 00:43
Also, it would be better to compare gcc with `-Os` not `-O3` if you are really concerned about code size: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html — Drew MacInnis, Sep 11 '13 at 00:45
@ryyker I don't think that really speaks to the OP's question, as he states that he could not write the assembler code by hand, and that's the method he would need to benchmark against. — John Faulkner, Sep 11 '13 at 00:46
@JohnFaulkner Exactly. One could write atrocious assembler and say "look how much better the compiler is", but that's the worst kind of strawman argument :) — Cuadue, Sep 11 '13 at 00:48
@Floris that's a good point. Directly proportional is probably an overstatement, but the work does run at about 1% duty cycle on a processor optimized for standby power consumption. The other 99% it's in hardware low-power mode. — Cuadue, Sep 11 '13 at 00:50
This is the kind of thing that folks at Programmers Stack Exchange enjoy. — old_timer, Sep 11 '13 at 01:13
Especially on modern processors with non-orthogonal instruction sets, pipelining concerns, etc., a decent compiler can do a much better job of instruction choice and scheduling than a mortal human can. There are exceptions when a very small piece of code must be exceptionally "tight", but for normal programming a good compiler, with decent source input, can easily outperform a good programmer. This has been true since 1985 or so. (I've been programming since 1968, so I've seen both sides of this.) — Hot Licks, Sep 11 '13 at 01:27

score 3 · Answer 1 · answered Sep 11 '13 at 01:41

Hand assembly can always at least match if not beat the compiler, because at the very least, you can start with the compiler generated assembly code and tweak it to make it better. To really do a good job, you need to understand the CPU architecture (pipeline, functional units, memory hierarchy, out-of-order dispatch units, etc.) so that you can schedule each instruction for maximum efficiency.

Another thing to consider is that the number of instructions is not necessarily directly proportional to performance, whether it is speed or power (see Hennessy and Patterson's Computer Architecture: A Quantitative Approach). Basically, to know how long the code will take, you have to look at how many clock cycles each instruction takes, in addition to the number of instructions and the clock rate. To know how much energy will be consumed, you also need to know how much energy each instruction takes.
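Hennessy and Patterson summarize the timing side of this as CPU time = instruction count × cycles per instruction / clock rate. The sketch below applies that relation, plus a per-instruction energy sum, to instruction mixes described by a hypothetical `insn_class` struct. The struct, the function names and any numbers fed into them are illustrative assumptions, not data for any real processor.

```c
#include <stddef.h>

/* One class of instructions in a program: how many times it executes,
   how many cycles each takes, and how much energy each costs.
   All values used with this struct are hypothetical. */
struct insn_class {
    unsigned long count;
    unsigned cycles;        /* cycles per instruction of this class */
    double energy_nj;       /* energy per instruction, nanojoules */
};

/* Total cycles = sum of count * cycles over all classes. */
unsigned long total_cycles(const struct insn_class *c, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += c[i].count * c[i].cycles;
    return sum;
}

/* CPU time in microseconds = cycles / clock rate in MHz. */
double cpu_time_us(const struct insn_class *c, size_t n, double clock_mhz)
{
    return (double)total_cycles(c, n) / clock_mhz;
}

/* Energy in nanojoules = sum of count * energy per instruction. */
double total_energy_nj(const struct insn_class *c, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)c[i].count * c[i].energy_nj;
    return sum;
}
```

For example, a mix of 8 single-cycle instructions plus 2 three-cycle instructions (10 instructions) needs 14 cycles. A mix of 12 single-cycle instructions needs only 12 cycles, even though it is "fatter" by instruction count.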

How the CPU implements each instruction affects how many cycles it takes to execute. As an example, your code sequence has a >> operator. The compiler might translate that to a single ASR instruction, but without knowing the architecture, there is no telling how many clock cycles it might take -- some architectures can do an arbitrary shift in a single cycle, while others need one cycle for each bit shift.
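To make the shift example concrete, the sketch below contrasts two simplified cost models. One is a core without a barrel shifter that moves one bit per cycle, so a shift by n costs n cycles. The other is a core with a barrel shifter that does any shift in one cycle. The function names and cycle counts are a hypothetical model, not measurements of any particular processor. Unsigned values are used because right-shifting a negative signed value is implementation-defined in C.

```c
/* Model of a core with only a 1-bit shifter: shift one bit per "cycle"
   and count the cycles. Returns x >> n; *cycles receives the cost. */
unsigned shift_right_serial(unsigned x, unsigned n, unsigned *cycles)
{
    *cycles = 0;
    while (n-- > 0) {
        x >>= 1;
        (*cycles)++;
    }
    return x;
}

/* Model of a core with a barrel shifter: any shift takes one cycle.
   n must be less than the width of unsigned, as C itself requires. */
unsigned shift_right_barrel(unsigned x, unsigned n, unsigned *cycles)
{
    *cycles = 1;
    return x >> n;
}
```

For the question's `26 >> spam` with `spam` equal to 3, both models return 3. The serial model charges 3 cycles, while the barrel model charges 1, even though each would be a single instruction in the listing.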

Memory access contributes to the number of cycles and power consumption, too. When there are too many variables to store in registers, some of them will have to be stored in memory. If you are accessing off-chip memory and have a fairly high CPU clock rate, the memory bus can be pretty power hungry. A longer sequence of instructions that avoids reading from and writing to memory (e.g., by computing the same result twice) can be less expensive.
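The recompute-versus-memory trade-off can be written as a small energy model. The sketch below assumes a hypothetical `energy_costs` struct holding a cost per ALU instruction and a cost per off-chip load or store. Real values depend entirely on the chip and memory system, so these costs stand in for measurements.

```c
/* Hypothetical per-operation energy costs (arbitrary units). Real values
   depend on the chip and memory system and should be measured. */
struct energy_costs {
    double alu;      /* one register-to-register ALU instruction */
    double mem;      /* one load or store to (off-chip) memory */
};

/* Energy if a value that takes alu_ops instructions to compute is computed
   once, stored to memory, and loaded back when it is needed again. */
double energy_spill(struct energy_costs e, unsigned alu_ops)
{
    return alu_ops * e.alu + 2 * e.mem;     /* one store + one load */
}

/* Energy if the same value is simply computed a second time instead. */
double energy_recompute(struct energy_costs e, unsigned alu_ops)
{
    return 2 * alu_ops * e.alu;
}
```

With assumed costs of 1 unit per ALU instruction and 20 per memory access, recomputing a 5-instruction value costs 10 units versus 45 for store-and-reload. For a 50-instruction value, spilling wins (90 versus 100). In this model the break-even point is where `alu_ops * alu == 2 * mem`.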

As several others have suggested, there is no substitute for benchmarking. Assuming you are using a microcontroller-based system with a constant input voltage, your best bet is to measure the current draw of your system with each alternative set of code and see which does best (one way would be with a current probe and a digital storage oscilloscope).
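Once a current probe and storage oscilloscope have captured the supply current while a code variant runs, the trace can be reduced to one energy figure per variant: with a constant supply voltage V, energy is V times the integral of current over time. The sketch below assumes the samples are exported as an array of amperes taken at a fixed interval. The function name and data layout are hypothetical, and the integral is approximated with the trapezoidal rule.

```c
#include <stddef.h>

/* Energy (joules) drawn over a captured current trace, assuming a constant
   supply voltage. samples_a[] holds current in amperes taken every dt_s
   seconds; the charge (integral of current over time) is approximated
   with the trapezoidal rule. */
double trace_energy_j(const double *samples_a, size_t n, double dt_s,
                      double supply_v)
{
    double charge_c = 0.0;                  /* coulombs = ampere-seconds */
    for (size_t i = 1; i < n; i++)
        charge_c += 0.5 * (samples_a[i - 1] + samples_a[i]) * dt_s;
    return supply_v * charge_c;
}
```

Computing this figure for each build (compiler-generated and hand-written) over the same workload gives the running-code comparison the answer recommends.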

Even if you can always write better assembler than the compiler, there is a cost in development time and maintainability. In The Mythical Man-Month, Brooks estimated 3-5x more effort for assembler, at a time when many, if not most, programmers wrote code in assembler. Unless your code is really tiny, you are probably best off coding only the most critical parts in assembly. Even so, the person writing the assembly should be able to prove that their (more expensive) code is worth the cost by comparing running code against running code.

score 2 · Answer 2 · answered Sep 11 '13 at 01:06

If the question is "how can I measure the efficiency with which a compiler generates code" (your actual question), the answer is "that depends". It depends on how you define "efficiency". Mostly, compilers are designed to optimize for speed. As you change the optimization level (-O1, -O2, -O3), the compiler will spend more time looking for "clever things to do to make it just a bit faster". This can involve loop unrolling, order of execution, use of registers, and many other things.

It seems that your "efficiency" criterion is not one that compilers are designed for: you say you want "fewest cycles" because you think that == lowest power. However, I would argue that "fastest execution" == "shortest time before the processor can go into standby mode again". Unless you believe that the power consumption of the processor in "awake" mode changes significantly with the instructions executed, I think it is safe to say that fastest execution == shortest time awake == lowest power consumption.
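The "shortest time awake" argument can be checked with a simple duty-cycle model. Average power over one wake/sleep period is the time-weighted mix of active and sleep power. The sketch below assumes the processor draws a constant power while awake and another constant power in its low-power mode. The function name and all figures are assumptions to be replaced by measurements.

```c
/* Average power (milliwatts) over one wake/sleep period, assuming the
   processor draws a constant p_active_mw while awake and p_sleep_mw in
   its low-power mode. t_active_ms must not exceed period_ms. */
double avg_power_mw(double t_active_ms, double period_ms,
                    double p_active_mw, double p_sleep_mw)
{
    double t_sleep_ms = period_ms - t_active_ms;
    return (t_active_ms * p_active_mw + t_sleep_ms * p_sleep_mw) / period_ms;
}
```

For example, at the 1% duty cycle mentioned in the comments, halving the awake time from 1 ms to 0.5 ms in a 100 ms period roughly halves average power, from about 0.110 mW to about 0.060 mW, with an assumed 10 mW active and 0.01 mW sleep. The awake term dominates. If a faster build also drew more power while awake, the same formula would show whether the trade-off pays off.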

In which case "fat code" doesn't matter - it's back to speed only. Note also that not all instructions take the same number of clock cycles (although to be fair, that depends on the processor).

Answer 3 · answered by old_timer Sep 11 '13 at 03:32

EDIT: okay, that was fun...

Folks who make the blanket statement that compilers outperform humans are the ones who have not actually checked. Anything a compiler can create, a human can create. But a compiler cannot always create the code a human can create. It is that simple. For projects anywhere from a few lines to a few dozen lines or larger, it becomes easier and easier to hand-fix the optimizations made by a compiler. The choice of compiler and target can help close that gap, but there will always be an educated someone who will be able to meet or exceed the compiler's output.

The claim I would like to prove or disprove is that compilers generate code that is 2-4X "fatter" (and therefore consumes 2-4 times as much power) compared with hand-written assembly code.

That only holds if you define "fatter" to mean "uses that much more power". The size of a binary and power consumption are not related. If this whole question/project is about power consumption, the compiler won't take into account the BIOS settings you have chosen (assuming you are talking about PCs), the video card, hard disk, monitor, mouse, keyboard, etc., in addition to the processor, which is only one (relatively small) part of the equation. And even if it did, nobody is going to make a compiler that makes only your code efficient; compiler writers can't and won't tune the compiler for every system on the planet. Ain't gonna happen.

If you are talking about a mobile phone, which is a very controlled environment, the app may get tuned to save power, but the compiler is not the master of that; the user is. The compiler does part of it, and the rest is hand-tuned by the programmer.

I can compile the above snippet with -O3 and read the assembly, but I couldn't write it myself.

If you go into this with that kind of attitude, then you have automatically failed. Yes, you can meet or beat the compiler, period. It is a matter of confidence, willpower, and time/effort. That statement means you have not really studied the problem, which is why you are asking the question you are asking. Take some time, do some more research, ask detailed questions at Stack Overflow (not open-ended ones like this one), and with time you will understand what compilers do and don't do and why, in particular why they are not perfect (by any one or many of the rulers by which that opinion is defined). This question is wholly about opinion, will spark flame wars and such, and will get closed and eventually removed from this site. Instead, write, compile, and publish code segments, and ask questions like "why did the compiler produce this output, why didn't it do [this] instead?" Those kinds of questions have a better chance at real answers and of staying here for others to learn from.

I agree with most of what you say - until you get to "...and eventually eradicate the human race". I think you are getting carried away a little bit there. Just cut the power to the darn things and show them who is boss. Time for a cabin in Montana. — Floris, Sep 11 '13 at 01:10
I would disagree. While you can certainly find isolated cases where a human can do better, on the whole a good optimizing compiler will *vastly* outperform a human assembly programmer. And be *much* more reliable to boot. I'd say it's quite "definitive". (And I've written tens of thousands of lines of assembly code, and likely a million lines of HLL code.) — Hot Licks, Sep 11 '13 at 01:31
I am happy for you... I have written equivalent amounts of code and examined millions of lines of compiled output, from many compilers, free and expensive, for many targets. The right human can meet or exceed the compiler (there are thousands of us out there); the average human won't. And good or bad, it is generally a good idea to let the compiler do the work and only fix it where you really, really need to. — old_timer, Sep 11 '13 at 03:37
After all your edits, your answer definitely gets a vote from me, even though, as you say, this question, being mostly "opinion based", will probably get closed/removed, so you will lose the rep again. Not that you are so rep-starved that you would notice. Fun discussion, though. — Floris, Sep 11 '13 at 12:26
Rep is not important to me, not because of the level I have reached but because that is not what this site is really about, IMO. It is about education. Paying it forward. Pay back your mentors by mentoring someone else... You had my +1 right away. — old_timer, Sep 11 '13 at 13:28
