
As a follow-up to "How to use bitmask": When I want to test whether a certain flag is set in a given bitmask, I would usually do so with a plain bitwise AND:

if(bitmask & flag) { …

Yet I frequently see something like this:

if((bitmask & flag) == flag) { …

I have observed this in both strongly and weakly typed languages, which rules out some kind of safeguard against type-casting disasters. The only scenario I can come up with in which the two tests are not equivalent is if `flag` actually has more than one bit set and all of those bits are required to be set in `bitmask` for the condition to pass. Is that all, or am I missing something here?
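
To make that concrete, here is a minimal sketch in C (the flag names are made up for illustration):

```c
#include <stdio.h>

#define FLAG_READ  0x1u
#define FLAG_WRITE 0x2u
/* A "flag" with more than one bit set: */
#define FLAG_RW    (FLAG_READ | FLAG_WRITE)

int main(void)
{
    unsigned bitmask = FLAG_READ; /* only one of the two bits is set */

    if (bitmask & FLAG_RW)              /* passes: at least one bit matches */
        printf("any-bit test passes\n");

    if ((bitmask & FLAG_RW) == FLAG_RW) /* fails: not all bits match */
        printf("all-bits test passes\n");

    return 0;
}
```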

Bonus: Do compilers have means to recognize a flag that will have at most one bit set at runtime and optimize the (possibly bogus) comparison away?

DaSourcerer
  • Your latter reason (i.e. multiple mask bits) is definitely a key here. And for consistency, you may well want to use the same pattern even if you only have one mask bit. IMO, another reason is to be more explicit about intent (for example, I *always* write `if (foo != 0)` rather than `if (foo)`). – Oliver Charlesworth Jul 06 '17 at 11:48
  • 1
    Some languages requires a boolean type in a conditional, so simply `if(bitmask & flag)` isn't legal, as it is in e.g. C and C++. – nos Jul 06 '17 at 11:50
  • @OliverCharlesworth If I wanted consistency, I would introduce a macro or a helper function. Would also help with the intent-part, I think ;) – DaSourcerer Jul 06 '17 at 11:55
  • @DaSourcerer - Yup, I have no disagreements there :) – Oliver Charlesworth Jul 06 '17 at 11:56
  • @nos I must admit, that hasn't crossed my mind. Most languages I use accept everything non-zero as true. Thanks for the hint! – DaSourcerer Jul 06 '17 at 11:57

1 Answer


As you point out, the two conditions aren't equivalent. The `(bitmask & flag)` check[1] tests whether *any* bit set in `flag` is also set in `bitmask`, while the `(bitmask & flag) == flag` check tests whether *all* bits set in `flag` are set in `bitmask`.

In cases where there will only ever be one bit set, you might also see the second form rather than the first in languages like Java that don't have an implicit conversion from integral types to boolean. The exact equivalent of the first check in those languages is `(bitmask & flag) != 0`, and that form is probably also more canonical even for checking a single flag.
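
As a sketch of what the explicit forms look like when wrapped in helpers (hypothetical names, along the lines of the macro/helper idea from the comments):

```c
#include <stdbool.h>

/* Hypothetical helpers that make the intent explicit. */
static inline bool has_any_flag(unsigned bitmask, unsigned flags)
{
    return (bitmask & flags) != 0;     /* at least one bit of flags is set */
}

static inline bool has_all_flags(unsigned bitmask, unsigned flags)
{
    return (bitmask & flags) == flags; /* every bit of flags is set */
}
```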

In terms of the actual generated code, it's mostly academic.

For known-at-compile-time[2] flags where the compiler can see that the flag only has one bit set, both forms will generally be compiled to identical code - see the `fixedA` and `fixedB` functions in that link.

For functions with unknown flags, the forms aren't equivalent, as mentioned, so the generated code differs (see the `unknownA` and `unknownB` functions), but the performance difference is very minor.
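
Since the linked code isn't reproduced here, this is a hedged sketch of what those four functions presumably look like (the constant and the signatures are assumptions):

```c
#define FLAG 0x10u  /* assumed single-bit compile-time constant */

/* Flag known at compile time: both forms generally compile to identical code. */
int fixedA(unsigned bitmask) { return (bitmask & FLAG) != 0; }
int fixedB(unsigned bitmask) { return (bitmask & FLAG) == FLAG; }

/* Flag only known at runtime: the two forms ask different questions
 * (any-bit vs. all-bits), so the generated code differs. */
int unknownA(unsigned bitmask, unsigned flag) { return (bitmask & flag) != 0; }
int unknownB(unsigned bitmask, unsigned flag) { return (bitmask & flag) == flag; }
```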


[1] Or, equivalently, `(bitmask & flag) != 0` in languages that don't have an implicit conversion from integer types to boolean.

[2] Importantly, this includes functions that accept variable flags but where the caller is using a constant flag and the function is inlined.
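
For instance (a hypothetical sketch), a check that takes a runtime flag can still be folded down when it is inlined at a call site that passes a constant:

```c
static inline int has_flag(unsigned bitmask, unsigned flag)
{
    return (bitmask & flag) == flag; /* general all-bits form */
}

int caller(unsigned bitmask)
{
    /* 0x4 is a single-bit constant here, so after inlining the compiler
     * can reduce this to the same code as a plain (bitmask & 0x4) test. */
    return has_flag(bitmask, 0x4u);
}
```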

BeeOnRope
  • Outstanding answer, thanks a lot! Wrt generated code being 'mostly all academic': This may hold true for code that is not in the hotpath. In other cases the ability to shave off an opcode can be invaluable. – DaSourcerer Jul 06 '17 at 23:43
  • @das - right, but as the generated code shows, even in that case they are largely equivalent - clang generated one less instruction for the first case, but gcc was the opposite: one less instruction for the second case. Beyond that, knowing which is faster depends on a deep context-sensitive analysis (context being the surrounding code) and may depend on the micro-architecture. After all that, you are talking about a difference of between 0 and 1 cycles, usually. That will usually not even be measurable. – BeeOnRope Jul 06 '17 at 23:54
  • I do realize performance gains cannot be expected to be exceptional. This is also the type of micro-optimisation I would usually avoid, as code legibility tends to suffer. Yet I would argue that, at the right spots, improvements can be measurable and noticeable. Good point on the micro-architecture, btw. – DaSourcerer Jul 07 '17 at 00:33
  • Do you have an example where such an optimization mattered? I'm usually on the other side of the argument, by the way, arguing that small changes can have a measurable impact, but not here. First, the change is effectively a wash, and second, in a real-world scenario where flag checking somehow made up a measurable portion of the runtime you'd almost always have algorithm transformation opportunities that would dominate worrying about the form of the check (and compilers might do this for you, e.g. hoisting the check out of a loop). – BeeOnRope Jul 07 '17 at 00:37
  • It *can* make a difference when crunching enormous amounts of data. Could also come in handy when reducing time spent in callbacks and event loops. Admittedly, performing bitmask manipulations in there is a bit contrived: this was mostly about saving an opcode, not necessarily for this purpose. I agree that this could be achieved (and often even better so) through compiler optimisations. That is not a given in some cases, though. Especially interpreted languages cannot or do not afford the luxury of costly optimisation passes. – DaSourcerer Jul 07 '17 at 01:21
  • gcc's sequence for `fixedA` is interesting, but disappointingly high latency! (5c instead of 2 on Skylake, 6c instead of 3 on Haswell. Also, `sbb eax,eax` has a false dep on `eax` on CPUs other than AMD Bulldozer-family). Even `-march=skylake` doesn't get it to change to `cmov`. The `mov r32,imm32` from the obvious way doing it (which clang uses) are off the critical path. Hmm, maybe gcc's sequence was designed to save registers. That seems to be common when gcc does something that's otherwise less-optimal, like `cmp/setcc/movzx` instead of `xor/cmp/setcc`. – Peter Cordes Jul 15 '17 at 17:31
  • 1
    Oh interesting. gcc and clang uses a single [BMI1 `andn`](http://felixcloutier.com/x86/ANDN.html) for `(bitmask & flag) == flag` when both are runtime-variables, instead of `and` / `cmp`. I'd forgotten that BMI1 instructions even set flags at all, since BMI2 is all no-flags instructions like `mulx` and `shrx`. – Peter Cordes Jul 15 '17 at 18:08
  • Yeah `gcc` really loves its various `sbb` tricks (and also the `sar r32, 31` and `sar r64, 63` variants) for conditional moves rather than `cmov`. I think a lot of those transformations are peephole/local rules applied at some late pass and perhaps haven't been updated. `clang` overall seems happier to use `cmov`. – BeeOnRope Jul 15 '17 at 20:24