1

I wrote this basic code for a DSP/audio application I'm making:

double input = 0.0;
for (int i = 0; i < nChannels; i++) {
      input = inputs[i];

and a DSP engineering expert told me: "You should not declare it outside the loop; otherwise it creates a dependency and the compiler can't deal with it as efficiently as possible."

He's talking about the `input` variable, I think. Why is that? Isn't it better to declare it once and overwrite it?

Maybe it has something to do with a different memory location being used? I.e. a register instead of the stack?

markzzz
  • 5
    In this particular code sample `double input = inputs[nChannels -1];` would have been even better. – StoryTeller - Unslander Monica Aug 08 '16 at 11:04
  • 7
    The world is full of people who think they know how compilers worked twenty years ago, and they still work the same today. – Sam Varshavchik Aug 08 '16 at 11:05
  • 1
    I don't know what the DSP expert had in mind, but the idea is to keep variable initializations as close as possible to the place in the code where they are used. And it's even better if you can keep them to the smallest possible scope. – StoryTeller - Unslander Monica Aug 08 '16 at 11:07
  • 1
    You meant to write `input += inputs[i];` ? – ZivS Aug 08 '16 at 11:08
  • 1
    @StoryTeller I think the missing closing brace is a hint of some loop body that isn't relevant here. – Quentin Aug 08 '16 at 11:09
  • 1
    @Quentin, better be explicit than hint. I hoped my little jest would have made that clear. – StoryTeller - Unslander Monica Aug 08 '16 at 11:10
  • 1
    @StoryTeller Well, I fell for it -- my bad :p – Quentin Aug 08 '16 at 11:12
  • 2
    Efficiency may be an issue if you're using an old (>20 years) compiler for a machine with very few registers. If neither of those is true, it's unlikely to cause any inefficiencies. But you should declare the variable in as narrow a scope as possible anyway, in order to protect from bugs. ("Declare once and modify" may make sense if an object is expensive to create and destroy but cheap to modify; see the sketch just after these comments. Creating and destroying a primitive can usually be considered to have no cost at all.) – molbdnilo Aug 08 '16 at 11:26
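
A quick sketch of that parenthetical (hypothetical code, not from the question): "declare once and modify" pays off when the object is expensive to create or destroy but cheap to modify, e.g. a buffer whose allocation can be reused across iterations. `fill_frame` is a made-up helper for illustration only.

#include <vector>

void fill_frame(std::vector<double>& buf, int frame); // hypothetical: appends this frame's samples

void process_all(int nFrames) {
    std::vector<double> scratch;        // created once; its capacity is reused
    for (int frame = 0; frame < nFrames; ++frame) {
        scratch.clear();                // cheap: keeps the existing allocation
        fill_frame(scratch, frame);     // refill instead of constructing a new vector
        // ... process scratch ...
    }
}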

3 Answers

6

Good old K&R C compilers in the early eighties used to produce code as near as possible to what the programmer wrote, and programmers used to do their best to produce optimized source code. Modern optimizing compilers can rework things, provided the resulting code has the same observable effects as the original. So here, assuming the `input` variable is not used outside the loop, an optimizing compiler could optimize out the line `double input = 0.0;`, because there are no observable effects before the next assignment, `input = inputs[i];`. And for the same reason it could factor the variable out of the loop (whether it is written inside the loop or not in the C++ source file).

Short story: unless you want to produce code for one specific compiler with one specific set of parameters (in which case you should thoroughly examine the generated assembly code), you should never worry about those low-level optimizations. Some people say the compiler is smarter than you; others say the compiler will produce its own code whatever way I write mine.

What matters is just readability and variable scoping. Here `input` is functionally local to the loop, so it should be declared inside the loop. Full stop. Any other optimization consideration is just useless, unless you have special requirements for low-level optimization (profiling showing that these lines require special processing).
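
For concreteness, a minimal sketch of the scoping recommended here; the real loop body isn't shown in the question, so `processSample` is a hypothetical placeholder:

void processSample(double sample);              // assumed per-sample DSP routine

void processBlock(const double* inputs, int nChannels) {
    for (int i = 0; i < nChannels; i++) {
        double input = inputs[i];               // declared in the smallest scope that needs it
        processSample(input);
    }
}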

Serge Ballesta
3

It is better to declare the variable inside the loop, but the reason given is wrong.

There is a rule of thumb: declare variables in the smallest scope possible. Your code is more readable and less error prone this way.

As for the performance question, it doesn't matter at all for any modern compiler where exactly you declare your variables. For example, clang eliminates the variable entirely from its own IR at -O1: https://godbolt.org/g/yjs4dA
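
As an illustration (a sketch in the same spirit, not the exact code behind the godbolt link): at -O1 clang should emit identical code for both of these functions, because the `input` declared outside the loop never needs to exist as a separate object.

double sum_outside(const double* inputs, int nChannels) {
    double input = 0.0;                         // declared outside the loop
    double sum = 0.0;
    for (int i = 0; i < nChannels; i++) {
        input = inputs[i];
        sum += input;
    }
    return sum;
}

double sum_inside(const double* inputs, int nChannels) {
    double sum = 0.0;
    for (int i = 0; i < nChannels; i++) {
        double input = inputs[i];               // declared inside the loop
        sum += input;
    }
    return sum;
}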

One corner case, however: if you ever take the address of `input`, the variable can't be eliminated (easily), and you should declare it inside the loop if you care about performance.
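
A hedged sketch of that corner case (not the exact code from the godbolt link in the comments): once `&input` is passed to a function the optimizer can't see into, `input` must live at a real memory address and can no longer be optimized away. Whether the outside-the-loop version actually costs extra loads/stores in the final asm is discussed in the comments below.

void g(double*);                                // opaque to the optimizer

double foo_outside(const double* inputs, int n) {
    double input = 0.0;                         // declared outside; its address escapes
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        input = inputs[i];
        g(&input);                              // `input` must be stored to memory here
        sum += input;                           // and reloaded, since g may have changed it
    }
    return sum;
}

double foo_inside(const double* inputs, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double input = inputs[i];               // a fresh object each iteration
        g(&input);
        sum += input;
    }
    return sum;
}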

  • "and you should declare it inside the loop" why that is better? – markzzz Aug 08 '16 at 12:37
  • @paizza, look at the example [here](https://godbolt.org/g/dZZMTH). In `foo_bad`, under the `.lr.ph` label (the loop body) there are two `load`s, while in `foo_good` there is only one. Taking the address of the variable introduces a tricky dependency and the compiler can't eliminate the variable (and the `load`/`store`) anymore. –  Aug 08 '16 at 12:46
  • 1
    It's not just taking the address that's the issue here; it's letting that address escape what the optimizer can see that defeats optimizations. Also, your good vs. bad functions compile to equivalent x86-64 asm (removing the `-emit-llvm` option from the `-O2` godbolt URL in your comment. I didn't look at the LLVM IR.) Since `g` could modify `inputs[i]`, the compiler has to spill/reload `input_internal` across the call to `g`. You get more [efficient code from passing `inputs[i]` as the arg for both functions](https://godbolt.org/g/Y0sSB3). – Peter Cordes Aug 08 '16 at 19:43
  • 1
    If you were targeting the Windows ABI (which has some call-preserved xmm regs), using a local to tell the compiler to access the input array only once would lead to two reg-reg moves in the loop, instead of a spill/reload. (And save/restoring it in the function prologue/epilogue) – Peter Cordes Aug 08 '16 at 19:44
  • @Peter Cordes, for ARM and PPC, `foo_good` is also a load plus 2 reg-reg moves, and `foo_really_good` is two loads. That makes me think not designating some xmm registers as call-preserved was an oversight. That again proves the good old "do not optimize unless you are sure what you are doing". Thanks for the observation! –  Aug 09 '16 at 00:25
  • Cache-hit loads are cheap. In a lot of cases, two loads would be equal or better to one load + 2 reg-reg moves (esp. on a low power ARM core where total number of instructions and code size is a bigger deal). – Peter Cordes Aug 09 '16 at 00:29
  • I agree that having two or four xmm regs be call-preserved would probably be a win for the x86-64 SysV ABI. The Windows ABI definitely goes too far, and only has 5 arg-passing / scratch XMM regs, IIRC. But note that only the low 128 bits of the reg are call-preserved in Win64: the upper part of the ymm / zmm reg is not preserved. The lack of an extensible / future-proof save/restore method for vector regs is probably part of the reason for the SysV ABI choosing to make them all call-clobbered. (Other than XSAVE/XRSTOR, of course, which is only usable for saving them *all*.) – Peter Cordes Aug 09 '16 at 00:33
  • The history of the ABI design is still visible on the amd64 mailing list archives. I dug up some interesting links [in a recent answer](http://stackoverflow.com/a/35619528/224132). Jan Hubicka tested various ABI ideas by compiling SPECint and SPECfp, and looking at code size and instruction count (IDK if it was static or dynamic (running on a simulator); couldn't benchmark because no AMD64 silicon was released yet). So maybe SPECfp didn't have many loops with FP and functions that couldn't inline (or at least be visible to do inter-procedural register-allocation optimizations). – Peter Cordes Aug 09 '16 at 00:38
3

Many people think that declaring a variable allocates some memory for you to use. It does not work like that. It does not allocate a register either.

It only creates for you a name (and an associated type) that you can use to link consumers of values with their producers.

On a 50-year-old compiler (or one written by students in their 3rd-year Compiler Construction course), that may indeed be implemented by allocating some memory for the variable on the stack and using that every time the variable is referenced. It's simple, it works, and it's horribly inefficient. A good step up is putting local variables in registers when possible, but that uses registers inefficiently, and it's not where we're at today (and haven't been for some time).

Linking consumers with producers creates a data flow graph. In most modern compilers, it's the edges of that graph that receive registers. This is completely divorced from the variables as you declared them; they no longer exist. You can see this in action if you use -emit-llvm in clang.
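
A small example you can feed to clang to see this yourself (the file name and exact flags are just one way to do it, e.g. `clang++ -O1 -S -emit-llvm sum.cpp -o -`): in the emitted IR the local `x` has no stack slot at all; it is just a name for the value flowing into the addition.

double sum(const double* in, int n) {
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        double x = in[i];       // a name for a value, not a storage location
        acc += x;
    }
    return acc;
}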

So variables aren't real, they're just labels. Use them as you want.

harold
  • I think declaring variables in the smallest possible scope is considered good style, though. It improves human-readability and can defend against bugs when code is modified later. But yes, it's only a style issue for modern compilers with optimization enabled. I've heard that vendor-supplied compilers for some embedded processors are not very good, and one vendor provides a non-optimizing compiler for free, but you have to pay for a version that lets you enable optimization. So people with a DSP background might be used to bad compilers? – Peter Cordes Aug 08 '16 at 19:27