Minimizing Memory Use
Naturally, one very general reason is to cram a lot of data into a small amount of memory. Consider an array of booleans like this:
bool data[64] = {...};
That can take 64 bytes (512 bits) of memory. Meanwhile the same idea can be represented with bits using just 8 bytes (64 bits) of memory:
uint64_t data = ...;
And of course we have a boatload of DRAM these days, so it might not seem like it matters to compact all this data down to the minimum size. But we're still dealing with 64-bit general-purpose registers, 64-byte cache lines, and kilobytes per physically-mapped page, and moving data down the memory hierarchy is expensive. So if you're processing a boatload of data sequentially, for example, and you can reduce it to 1/8th its size, you can often process a whole lot more of it in less time.
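As a minimal sketch of how you'd address individual booleans in that packed representation (the index variable here is just for illustration, assumed to be in the range 0..63), setting, clearing, and testing a bit comes down to shifts and masks:
uint64_t data = 0;
data |= 1ull << index; // set the boolean at position index
data &= ~(1ull << index); // clear it
bool is_set = (data >> index) & 1; // test it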
A common application of this idea of storing a bunch of booleans in a small amount of space is bit flags, like this:
enum Flags
{
    flag_selected = 1 << 0,
    flag_hidden = 1 << 1,
    flag_removed = 1 << 2,
    flag_hovering = 1 << 3,
    flag_minimized = 1 << 4,
    ...
};
uint8_t flags = flag_selected | flag_hovering;
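And as a quick sketch of how those flags typically get manipulated (just illustrating the enum above), setting, clearing, toggling, and testing are each a single bitwise operation:
flags |= flag_minimized; // set a flag
flags &= ~flag_hidden; // clear a flag
flags ^= flag_hovering; // toggle a flag
bool selected = (flags & flag_selected) != 0; // test a flag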
Operating on Multiple Bits at Once
But on top of cramming all this data into a smaller amount of space, you can also do things like test for multiple bits simultaneously:
// Check if the element is hidden or removed.
if (flags & (flag_hidden | flag_removed))
{
    ...
}
And a smart optimizer will typically reduce that down to a single bitwise and if flag_hidden and flag_removed are literal constants known at compile-time.
As another example, let's go back to the boolean array from before:
bool data[64];
Let's say you want to test whether all 64 booleans are set, in which case you do something different. Given this representation, we might have to do this:
bool all_set = true;
for (int j=0; j < 64; ++j)
{
    if (!data[j])
    {
        all_set = false;
        break;
    }
}
if (all_set)
{
    // Do something different when all booleans are set.
    ...
}
And that's pretty expensive, whereas the bitwise representation lets us do this:
uint64_t data = ...;
if (data == 0xffffffffffffffff)
{
    // Do something different when all bits are set.
    ...
}
The above version can perform the check for all 64 bits set in a single instruction on 64-bit machines. With SIMD registers, you can even test for more than 64 bits at a time with a single SIMD instruction.
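For example, here's a rough sketch of testing 128 bits at once with SSE2 intrinsics (x86-specific, and the bits pointer is just an assumed 16-byte block of flag data):
#include <emmintrin.h>
__m128i v = _mm_loadu_si128((const __m128i*)bits); // load 128 bits
__m128i ones = _mm_set1_epi32(-1); // register with every bit set
bool all_set = _mm_movemask_epi8(_mm_cmpeq_epi32(v, ones)) == 0xFFFF; // true only if all 128 bits are set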
As another example, let's say you want to count how many of those booleans are set. Working with the boolean representation, you might have to do this:
int count = 0;
for (int j=0; j < 64; ++j)
    count += data[j];
// do something with count
Meanwhile if you used bitwise operations, you can do this:
uint64_t data = ...;
const int count = __popcnt64(data);
// do something with count
And some hardware can do that very efficiently as a native instruction. Even hardware without a native popcount can still do it a whole lot faster than looping through 64 booleans and counting the ones set to true.
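On GCC and Clang the equivalent intrinsic is __builtin_popcountll, and C++20 added a portable std::popcount in <bit>; a quick sketch of the former:
uint64_t data = ...;
const int count = __builtin_popcountll(data); // usually a single popcount instruction when the target supports one, a small fallback routine otherwise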
Efficient Arithmetic
Another common one is efficient arithmetic. If you have something like:
x = pow(2, n);
Where n is a runtime variable, you can often get a much more efficient result doing:
x = 1 << n;
Of course an optimizing compiler using intrinsics for pow might be able to translate the former into the latter, but at least the C and C++ compilers I've checked as of late cannot perform this optimization when n is not known at compile-time.
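One thing to be careful about with the shift version is that the result's width comes from the left operand's type, and shifting by an amount greater than or equal to that width is undefined behavior. A minimal sketch with an explicitly 64-bit result, assuming n is in the range 0..63:
uint64_t x = 1ull << n; // 64-bit result; n must be in [0, 63]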
Whenever you're working with powers of two, you can often do a lot of things efficiently with bit shifts and other bitwise operations. For example, take this:
x = n % power_of_two;
... where power_of_two is a runtime variable that is always a power of two. In that case you can do:
x = n & (power_of_two - 1);
Which has the same effect as the modulo (only for power-of-two numbers). That's because a power-of-two value is always a single set bit followed by zeros. For example, 16 is 0b10000. Subtract one from that and it becomes 0b1111, and a bitwise and with that mask clears all the upper bits, giving you the equivalent of n % 16.
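As a concrete sketch, this is how it often shows up when wrapping an index into a power-of-two-sized buffer (the names here are just hypothetical, and the index is assumed to be unsigned or non-negative):
const uint32_t capacity = 1024; // must be a power of two
uint32_t wrapped = index & (capacity - 1); // same result as index % capacity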
The same goes for left-shifts to multiply by a power of two, right-shifts to divide by a power of two, and so on. One of the main reasons a lot of hardware favored power-of-two image sizes like 16x16, 32x32, 64x64, 256x256, etc. is the efficient arithmetic they enable through bitwise instructions.
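For example, addressing a pixel in a 256-pixel-wide image can replace the row multiply with a shift, a sketch assuming row-major storage:
int pixel_index = (y << 8) + x; // same as y * 256 + x, since 256 == 1 << 8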
Conclusion
So anyway, this is a brief introduction to what you can do with bitwise operations and instructions: fast arithmetic, reduced memory use, and the ability to operate on potentially many bits at once without looping through them one bit at a time.
And they're still very relevant today in performance-critical fields. For example, the Atomontage voxel rendering engine claims to represent a voxel in about a single bit, and that's important not just to fit huge voxel data in DRAM but also to render it really quickly out of smaller, faster memory like registers. Naturally it couldn't do that if it used 8 bits just to store a true/false kind of value that has to be checked individually.