If bool found
is a local variable, unconditionally setting it to false
is almost certainly better in every case, on pretty much every CPU architecture. If it exists at all in the compiler output (rather than just turning into part of the branch logic), it will probably only ever be in a register. Writing registers one of the cheapest possible operations, much cheaper than branching.
Even if it ever hits memory, write-back caches are common, so repeatedly storing to the same location just hits in L1 without generating traffic to the larger shared caches or main memory.
If the compiler emits code that actually stores the flag in memory at the end of every loop, checking the flag next iteration will incur a store-forwarding delay of ~5 cycles (for example on Intel Haswell).
But if a problem, it's the compiler's fault for not optimizing your code well. This is why we use compilers instead of writing in asm directly: the compiler might well optimize away the found
variable entirely. That's fine, and is not an argument for restructuring your C.
For more about this kind of stuff on x86, see http://agner.org/optimize/ and other links in the x86 tag wiki.
To see how your code compiles, put it up on http://gcc.godbolt.org/ (and use O3 -march=haswell -ffast-math
or something.)
If found
is global (and might have been last modified by another thread), it could maybe make sense to only read it first, so the core running your code doesn't need to invalidate other core's copy of the cache line.
I'm imagining a flag that's part of a shared state struct (used by code in a critical section protected by a lock) that different threads might use. (This would be terrible design in this case, though, since found
is always left false
after use, so there's no persistent state, so it should be local.)
Still, it's probably not worth using a conditional branch just to avoid that store. If the flag is modified at all in any of the loop iterations (i.e. if the needle ever matches), then you might as well modify it as often as you like.
Avoiding stores is mostly only useful if you can go from some stores to the same location to zero, not just to fewer. Caches work.
e.g. when vectorizing, it's sometimes useful to do overlapping stores to a destination, if you want to write 3 elements of a vector instead of all 4, and it's ok to write beyond the end of what you're storing. e.g. in this code.