Checking a conditional vs setting a variable multiple times; low-level optimization

Question

If I'm searching through a collection of values and running code for each one, and I want to turn a boolean on when I find a certain quality and then back off again when I've run the code for that object, is it faster to run a conditional to check whether the boolean needs to be turned off or is it faster to simple turn it off on every loop?

For example (psuedo-code):

bool found = false;
for(particle in literallyAHaystack) {
    bool isNeedle = particle == "needle";

    if(isNeedle) {
        found = true;
    }

    // [some code that uses the 'found' variable]

    if(isNeedle) {
        found = false;
    }
}

vs

bool found = false;
for(particle in literallyAHaystack) {
    bool isNeedle = particle == "needle";

    if(isNeedle) {
        found = true;
    }

    // [some code that uses the 'found' variable]

    found = false; // a conditional no longer surrounds this statement
}

I understand this is very low-level and usually-pointless optimization, but I'm still interested in the truth of it. I hope not to offend anyone with the triviality of the question.

So, uh... why do you have both `isNeedle` and `found`? They seem to be redundant. — user2357112, Jul 24 '16 at 03:52
@user2357112: I think the idea is to imagine that `found` might be set by something inside an `if`, so writing the whole thing as nested `if()` clauses without booleans to record the results of previous checks would require code duplication. — Peter Cordes, Jul 25 '16 at 01:01

score 1 · Accepted Answer · edited May 23 '17 at 12:22

If bool found is a local variable, unconditionally setting it to false is almost certainly better in every case, on pretty much every CPU architecture. If it exists at all in the compiler output (rather than just turning into part of the branch logic), it will probably only ever be in a register. Writing registers one of the cheapest possible operations, much cheaper than branching.

Even if it ever hits memory, write-back caches are common, so repeatedly storing to the same location just hits in L1 without generating traffic to the larger shared caches or main memory.

If the compiler emits code that actually stores the flag in memory at the end of every loop, checking the flag next iteration will incur a store-forwarding delay of ~5 cycles (for example on Intel Haswell).

But if a problem, it's the compiler's fault for not optimizing your code well. This is why we use compilers instead of writing in asm directly: the compiler might well optimize away the found variable entirely. That's fine, and is not an argument for restructuring your C.

For more about this kind of stuff on x86, see http://agner.org/optimize/ and other links in the x86 tag wiki.

To see how your code compiles, put it up on http://gcc.godbolt.org/ (and use O3 -march=haswell -ffast-math or something.)

If found is global (and might have been last modified by another thread), it could maybe make sense to only read it first, so the core running your code doesn't need to invalidate other core's copy of the cache line.

I'm imagining a flag that's part of a shared state struct (used by code in a critical section protected by a lock) that different threads might use. (This would be terrible design in this case, though, since found is always left false after use, so there's no persistent state, so it should be local.)

Still, it's probably not worth using a conditional branch just to avoid that store. If the flag is modified at all in any of the loop iterations (i.e. if the needle ever matches), then you might as well modify it as often as you like.

Avoiding stores is mostly only useful if you can go from some stores to the same location to zero, not just to fewer. Caches work.

e.g. when vectorizing, it's sometimes useful to do overlapping stores to a destination, if you want to write 3 elements of a vector instead of all 4, and it's ok to write beyond the end of what you're storing. e.g. in this code.

Thank you so, so much. I know the example code doesn't make complete sense but you answered the question exactly. — c.., Jul 24 '16 at 17:49

score 0 · Answer 2 · answered Jul 24 '16 at 03:26

The second will be "faster" because there's no condition check and thus, no jumping required.

Will it be noticeably faster? Almost definitely not. Regardless, the compiler might make this optimization already, though don't quote me on it – I'm not sure if it does.

Checking a conditional vs setting a variable multiple times; low-level optimization

2 Answers2