Bitwise equality

Question

I need to perform a bitwise equality between two bytes. That means that for instance if I have two bytes: 00011011 and 00011110 the result is 11111010 The only fast way I see is to use the following statement

byte a, b;//set input bytes
byte c = ~(a^b);//output bytes

But I wonder if there is a faster solution for this. After these equality operation I want to mask the bits I need. So I need to use an AND-operation. So the code becomes:

byte a, b;//set input bytes
byte m;//mask, intresting bits are set to 1, others to 0
byte c = (~(a^b))&m;//output bytes

aren't there any faster and more simple methods that don't need to use all those bitwise operations because this part of the code will be called very often.

I work in Mono/C#.Net, but the syntax of any language in the C-family is equivalent I think, it's not that important. — Willem Van Onsem, Feb 04 '10 at 15:32
Wait. You want something "faster" than two bitwise operations, two operations which directly implement the semantics you describe? Huh? — Jonathan Feinberg, Feb 04 '10 at 15:32
Why do you think your language's operator == isn't good enough? — Ivan Krechetov, Feb 04 '10 at 15:33
Do you really think a couple bit-wise ops will be your bottleneck? — ChaosPandion, Feb 04 '10 at 15:33
What makes you think these methods are slow or complicated? Typically, CPUs have instructions that do logical operations, so they're as fast as addition or subtraction. In addition, you're using standard operators, so I don't see any unnecessary complication. I'd expect any method other than what you're using to be slower, more complicated, and harder to understand. — David Thornley, Feb 04 '10 at 15:34
Bitwise operations like this are typically very, very fast - what evidence do you have that you need to optimise them? — , Feb 04 '10 at 15:34
I bet there's a loop wrapped around this code, or a function call. That's going to cost a lot more. ALUs have bitwise operations for lunch and dessert, especially at full register width. — Mike Dunlavey, Feb 04 '10 at 17:13
I don't know what platform your code is running on, but on most architectures this will get converted into three machine instructions (XOR, NOT, AND). Possibly two if you have XNOR available. This expression is straightforward enough that you can probably hand-code it in inline assembly if performance is that crucial. — bta, Feb 04 '10 at 17:39

score 6 · Accepted Answer · answered Feb 04 '10 at 15:36

6

I doubt it can be done in fewer operations. That looks optimal. Perhaps you could store ~(a^b) in a lookup table (256*256 entries)? I doubt you would get much benefit and possibly even make things worse, but you could try it.

answered Feb 04 '10 at 15:36

Mark Byers

811,555
193
1,581
1,452

Did you benchmark that? Theoretically ~(a^b) could be faster it probably executes in two cycles and pipelines well, an index lookup to memory can be expected to take several cycles. – Feb 04 '10 at 15:42
5

The OP should note that the operations the compiler will perform to index the table are likely to be at least as expensive (probably more) as the operations to calculate the result directly. And that doesn't take into account the additional memory access. – Michael Burr Feb 04 '10 at 15:49
3

Cache-misses are 2 magnitudes slower than arithmetic ops. – Viktor Klang Feb 04 '10 at 15:53
When the lookup table and actual input byte array fit in the same cache line, this operation should be pretty quick. However, I expect a ~(a^b) to be as fast (and it's perhaps even optimized by the JIT compiler). – Steven Feb 04 '10 at 15:55

score 4 · Answer 2 · answered Feb 04 '10 at 16:28

You are looking in the wrong place for this optimization; you won't end up finding any better bitwise operation here. Even if you did, it's hardly going to speed anything up. The real win will come from processing more than just a byte at a time. The processor is already having to do a bunch of bit shifting and masking operations just so that it can pretend for you that you are working with bytes. Process your arrays of bytes 1 word at a time, or use vector instructions if they are available.

score 2 · Answer 3 · answered Feb 04 '10 at 15:43

These operations seem fast enough to be honest. I think you shouldn't try to optimize them further, but finish your software first, see if you are happy with the overall performance and use a profiler if you are not. I am fairly sure the problem will be elsewhere.

score 2 · Answer 4 · answered Feb 04 '10 at 15:44

2

What you want is an XNOR operation. Unfortunately this is not supported by C#/Mono. I think your solution is optimal.

answered Feb 04 '10 at 15:44

Michael Krauklis

3,914
2
28
28

2

I wouldn't be surprised if the MS JIT compiler is able to optimize a ~(a^b) to an XNOR. CommuSoft should look at the generated assembly code to see if that's the case. – Steven Feb 04 '10 at 15:53
That's a good thought. I would be curious to see if that was optimized as well. – Michael Krauklis Feb 04 '10 at 16:13
2

I opened the Debug/Windows/Disassembly window in VS2008 to see what assembly gets generated and I see an XOR statement and a NOT statement. This would mean it is not optimized. However, we're talking about just 2 assembly instructions: this is blazingly fast. – Steven Feb 04 '10 at 16:27
That's too bad. Thanks for looking into it. – Michael Krauklis Feb 04 '10 at 16:50

Bitwise equality

4 Answers4

Linked