Are bitmask checks more efficient than comparing numbers?

Question

I've been told that bitmask-based checks are more efficient than comparing numbers. Is this true?

"Bitmask-based" check:

if (val & IMPORTANT_BIT)
...

Number comparison:

if (val == IMPORTANT_VAL)
...

In both cases I feel like it would have to fetch val out of memory anyways. So why would a bitmask be more efficient? Does the compiler do something fancy for bitmasks?

I guess this probably depends on compiler/CPU, though I'd expect in most cases these are quite similar in terms of performance. The bitmask check with `&` is quite common when several flags (bits) are combined in a single integer (byte, short, long) as this allows to easily check if a single bit is set or not. — pjoe, Jan 06 '20 at 15:51
This is most likely a relic of days when processors could more efficiently check if a value is 0 rather than check two values for equality. — dbush, Jan 06 '20 at 15:51
You can [test this yourself](https://godbolt.org/z/wzYMuW). Here, the first approach requires 3 instructions, but the second - only one — ForceBru, Jan 06 '20 at 15:52
@ForceBru that is so cool! Thank you! I had no idea that existed. — Catsunami, Jan 06 '20 at 16:09
@ForceBru But if you actually turn on compiler optimisations they produce similar sequences: https://godbolt.org/z/AlHldn — Gem Taylor, Jan 06 '20 at 16:12
@GemTaylor, and Clang will just emit `mov eax, ` followed by `ret` — ForceBru, Jan 06 '20 at 16:14
Even assuming `IMPORTANT_BIT` and `IMPORTANT_VAL` are the same, those two tests are radically different. If the important bit has the value `1`, the first test checks whether the number in `val` is odd, while the second checks whether it is equal to `1`. Those are so radically different that the comparison is irrelevant. I chose the simplest case, but the concept applies regardless of the actual value of `IMPORTANT_BIT`. — Jonathan Leffler, Jan 06 '20 at 16:15
What are the values of `IMPORTANT_*` and what are the possible values of `val`? If they are not performing the same operation, the comparison is meaningless. Regardless, you are not going to notice any difference in such a trivial operation in the ALU. — Acorn, Jan 06 '20 at 16:17
@JonathanLeffler, there is a series of flags, I just simplified it here because I was curious about the general concept. I started with having multiple definitions such as `#define A 1`, `#define B 2`,`#define C 3`. Then `val` would only either be `A`, `B` or `C`. However, a change was suggested to make it `#define A (1 << 1)`, `#define B (1 << 2)`, `#define C (1 << 3)`. `val` would only be either A, B or C still. — Catsunami, Jan 06 '20 at 16:23
You have to know what might legitimately be stored in `val`. If only one of a set of mutually exclusive values is present, and the `IMPORTANT_BIT` only contains one set bit, then the two are equivalent. In the general case, if `val` has the value `7` and the `IMPORTANT_VAL` (or `IMPORTANT_BIT`) is `5`, then the first test (with bitwise `&`) evaluates to true but the second test (with `==`) evaluates to false. Different results for the same data — probably only one of those results is correct. The important thing to learn here is that the context matters — efficiency is often a red herring. — Jonathan Leffler, Jan 06 '20 at 16:28
@ForceBru, which is why in my example I made the input value a function argument instead of a simplifiable stack var. This is often the easiest way to force the compiler to generate understandable optimised code :-) — Gem Taylor, Jan 06 '20 at 16:32
@JonathanLeffler, I know exactly what is stored in `val` in this case because I wrote that code. The values are guaranteed to be mutually exclusive (hence my initial approach). My question was more about the actual comparison and whether the compiler does something clever when `&` is used. I don't really care if it ends up evaluating to false or true (that will vary at run-time anyway because `val` will change as time goes on).I just want to know if it fetches the value differently and does something interesting. But it seems that it would do exactly what I thought. — Catsunami, Jan 06 '20 at 16:35
Fine — you knew what you were doing. You didn't explain in your question, so how were we to guess what you knew? Without the context, we can't make helpful suggestsions. With the context, there's no practical difference between the equality or the bitwise operation — they're single instructions. — Jonathan Leffler, Jan 06 '20 at 16:36
@JonathanLeffler I didn't "know what I was doing", but it ended up being exactly what I thought. So in the end, I wasn't wrong. Asking questions to confirm or investigate that I possibly don't know something isn't a crime, is it? — Catsunami, Jan 06 '20 at 16:37
@JonathanLeffler. It was a generic question that asked about the difference between `&` and `==`. Didn't realize the values mattered when I wrote it. — Catsunami, Jan 06 '20 at 16:38
@ForceBru : The optimisation to `mov eax, ` is due to the test code having undefined behaviour in the path without an explicit return. The compiler can either return `1` in the case of `square()` or whatever it likes, so the optimal solution is to return `1` regardless and omit the test. Add a `return 0` when the test is `false` and the result differs. Your use of an uninitialised variable may also cause confounding optimisations. https://godbolt.org/z/-CnW_A — Clifford, Jan 06 '20 at 16:55
They do semantically different things, so it is academic and dependent on the machine's instruction set. It is unlikely to be a valuable optimisation, most compilers are more than capable of providing optimisations that will swamp any possible benefit from such a "micro-optimisation". Concentrate on efficient software design and selection of appropriate data structures and let the compiler deal with this sort of machine dependent stuff. — Clifford, Jan 06 '20 at 16:59

Gem Taylor · Answer 1 · 2023-06-10T22:09:55.813

So the place where bitfields are beneficial is where you have a lot of bool flags that you can pack into a single word. The code for testing one particular flag is comparable, but the code for testing for a particular subpattern of flags can be much shorter, though you might need to write the test yourself:

// using bool
bool a,b,c,d;
if (a && !b && c && !d) ...

// hoping the compiler knows what we are doing
enum { A= 0x01, B= 0x02, C= 0x04, D= 0x08};
if ((flags&A) && !(flags&B) && (flags&C) && !(flags&D)) ...

// Optimise it ourselves:
enum { A= 0x01, B= 0x02, C= 0x04, D= 0x08};
if ((flags & (A|B|C|D)) == (A|!B|C|!D)) ...

In the first case, the compiler must load each of the locations, though it could early-out. In the latter cases, at minimum it can load the value once and do multiple operations in core. A good optimiser could optimise the second pattern to the third, which is 2 instructions, though the optimiser might also realise it can load all the bools as a "long" to reduce bandwidth, and they would at least all be in the same cache line, which is almost as good as being in core these days.

But in any case, the 3rd form will win vs. the first form for brevity, as well as saving a little storage.

Note that your two examples do not test the same thing.

A & 5 tests that either the 4-bit or the 1-bit are set, but ignores the 2-bit, and any higher bits completely.

A == 5 does test that the 1-bit and 4-bit are set, but it also checks that ALL OTHER bits are clear.

score 4 · Answer 2 · answered Jan 06 '20 at 17:33

4

First of all, we restrict the question by assuming IMPORTANT_BIT == IMPORTANT_VAL and all other bits of val are 0, so that the outcome of both tests is the same.

Short answer: Don't worry about it.

Long answer:

It depends on the target architecture, the value of IMPORTANT_BIT and the context in which the code is executed.

x86 has the AND instruction which conveniently sets the Z flag, and so can be used to mask-and-test a value against 0 in a single instruction. Then there's the CMP instruction, which compares values. Both AND and CMP instructions have the same latency of 1 and throughput 4. So on x86 there would be no difference, unless the compiler can manage to re-use some specific value and blend with the code above/below. The performance difference would be marginal though.

Having said that, the semantics of & and == tests is really different.

if (val & ... test is used to mask bits.
if (val == ... test is used to compare values.

I suggest to approach it from a code readability perspective first of all. If val is not a bit field, do yourself a favor and don't use & against it.

answered Jan 06 '20 at 17:33

rustyx

80,671
25
200
267

x86 `test` is to `and` what `sub` is to `cmp`. A compiler would use `test` for this to avoid modifying the value unnecessarily, so it could even use a memory operand like `test byte ptr [rdi+2], 0x4` or something to test a bit in the 3rd byte. TEST and CMP can macro-fuse with a following conditional branch (JCC) into a test-and-branch micro-op, on both Intel and AMD. Only Intel Sandybridge-family can macro-fuse `and`/`jcc` or `sub`/`jcc`. ([What is instruction fusion in contemporary x86 processors?](https://stackoverflow.com/q/56413517)) – Peter Cordes Jun 10 '23 at 22:17
There would be a minor difference in asm that used `and` vs. `cmp`, but you're right that there's no advantage to `&` over `==` in C. Unless you go out of your way to give the compiler a hard time, e.g. a 64-bit `IMPORTANT_VAL` that needs a bulky constant like `0x4000000000` that doesn't work as 32-bit immediate for `cmp` so would need a separate `mov r64, imm64`, vs. being able to `bt rax, 38` / `jc` (which can't macro-fuse so is still 2 uops, but larger code size...) – Peter Cordes Jun 10 '23 at 22:22
The advantage to a bitmap is of course being able to test for either of two possible values, like `val & (BIT_FOO | BIT_BAR)` instead of `(val == VAL_FOO) || (val == VAL_BAR)` – Peter Cordes Jun 10 '23 at 22:22
In many cases the compiler could have generated an `xor` instruction for `==` rather than a `cmp` instruction. The fact that nobody bothers is evidence there's nothing to gain. – Joshua Jun 10 '23 at 23:15

Are bitmask checks more efficient than comparing numbers?

2 Answers2