When working with two dimensional arrays with a width that is a power of two, there are multiple ways to calculate the index for a (x, y) pair:
given
size_t width, height, stride_shift;
auto array = std::vector<T>(width*height);
size_t x, y;
assert(width == ((size_t)1 << stride_shift));
assert(x < width)
assert(y < height);
1st way - Addition
size_t index = x + (y << stride_shift);
2nd way - Bitwise OR
size_t index = x | (y << stride_shift);
3rd way - Bitwise XOR
size_t index = x ^ (y << stride_shift);
It is then used like this
auto& ref = array[index];
...
All yield the same result, because a+b
, a|b
and a^b
result in the same value, as long as not both bits are set at the same bit position.
It works because the index is just a concatenation of the bits of x and y:
lowest bits
v v
index = 0b...yyyyyyyyyxxxxxxxxx
\___ ___/
'
stride_shift bits
So the question is, what is potentially faster in x86(-64)? What is recommended for readable code?
- Could there be a speedup by using
+
, because the compiler might use both theADD
andLEA
instruction, such that multiple additions can be performed at the same time on both the address calculation unit and the ALU. - Could OR and XOR be potentially faster? Because the calculation of every new bit is independent of all others, that means no bit is dependent on the carry of lower bits.
- Consider a nested for loop, y is iterated on the outside. Are there compilers that do an optimization in loops where
row_pointer = array.data()+(y << stride_shift)
is calculated for each iteration ofy
. And thenpointer = row_pointer+x
. I think this optimization wouldn't be possible using|
or^
. - Mixing bitwise operations and arithmetic operations is sometimes considered bad code, so should one choose
|
to go with the<<
?
As a bonus: How does this apply in general/to other architectures?