Efficient execution of boolean expression on bitmap in C or C++

Question

What is the most efficient way of executing a boolean expression on a bitmap in C or C++? For example, let's say that I have a 4-bit bitmap (a, b, c, d). Now, let's say that I have a simple boolean expression like (a AND b) OR (c AND d). How should I represent the boolean expression so that I could efficiently apply it to my bitmap? I am looking for a generic solution that can be applied to any boolean expression, not just the one given as example. In other words, I am looking for some way to "compile" the boolean expression into another data structure that could be used to efficiently reduce my bitmap into a boolean.

The bitmap structure is the result of filtering operations on the records of a database. Every record has its own bitmap, and every bit in a bitmap is the result of an individual filtering rule. The boolean expression is used to combine these filtering rules to decide whether the record should be included in the results of a database query. There can be up to 64 individual filtering rules that can be combined by the boolean operation, hence the bitmap can be represented as an unsigned long long int if necessary.

The solution should be efficient in terms of speed and should not alter the bitmap structure. The conversion of the boolean expression into another structure does not have to be memory-efficient nor fast, because it can be cached (at least in my current use case). The reducing of the bitmap with the transformed boolean expression should be both fast and memory efficient.

Notes:

The boolean expression is only using nested AND and OR operations (no IF statements).
The solution should assume the availability of a 64bit CPU.
The solution should not be CPU-dependent (beside 64bit addressing).
The solution should not assume the availability of any other particular hardware (eg GPU).
All the bitmaps are in memory.
There can be a very large number of bitmaps (billions).
Bitmaps are updated one at a time.

Efficient in terms of speed first and foremost, but also memory-efficient vis-à-vis the bitmap. The bitmap cannot be transformed into another structure. But the boolean expression could be transformed into something else, and this transformation does not have to be memory efficient, because it could be cached. — Ismael Ghalimi, Oct 04 '14 at 15:42
The bitmap structure is the result of filtering operations on the records of a database. Every record has its own bitmap, and every bit in a bitmap is the result of an individual filtering rule. The boolean expression is used to combine these filtering rules to decide whether the record should be included in the results of a query. — Ismael Ghalimi, Oct 04 '14 at 15:48
That's a very general definition. Can we use an `unsigned int` for that? — Galik, Oct 04 '14 at 15:49
We can have up to 64 filtering rules, so we need the `long long`. — Ismael Ghalimi, Oct 04 '14 at 16:15
I haven't had much time to apply to this yet but if you want an efficient way to execute hierarchical expressions then it may be worth "compiling" them into RPN or "postfix notation" using a stack. — Galik, Oct 04 '14 at 20:17

Thomas Matthews · Answer 1 · 2014-10-04T16:42:02.027

The most efficient method of using AND or OR operations on bitmaps is to use hardware assistance. Many graphics processors can perform operations on two bitmaps. There is no C++ standard library operation for this.

You need to perform the operation on each bit, byte, word or double word in the bitmaps.

The next speed efficient method is to unroll the loop. Branch instructions waste execution cycles (that could be used for data instructions) and may clear out the instruction pipeline wasting time reloading it.

You can also gain some efficiency by effectively using the processor's data cache. Load up a bunch of variables, perform the operation, store the result, repeat.

You should also fetch in groups using the processor's word size. A 32-bit processor loves to fetch 32-bits at a time. So this would give you 8 sets of 4-bit pixels that are loaded with one fetch. Otherwise, you would have to fetch 8 bits at a time, which results in 4 fetches of 8 bits as compared to 1 fetch of 32-bits.

Here's the core algorithm:

uint8_t * p_bitmap_a = &Bitmap_A[0];
uint8_t * p_bitmap_b = &Bitmap_B[0];
uint8_t * p_bitmap_c = &Bitmap_C[0];

// C = A AND B

for (unsigned int i = 0; i < bitmap_size / 4; ++i)
{
  uint32_t  a = *((uint32_t*) p_bitmap_a);
  uinte2_t  b = *((uint32_t*) p_bitmap_b);
  uint32_t  c = a & b;
  *((uint32_t *) p_bitmap_c) = c;
  p_bitmap_a += sizeof(uint32_t);
  p_bitmap_b += sizeof(uint32_t);
  p_bitmap_c += sizeof(uint32_t);
}

Edit 1:
Your processor may have instructions that can assist with the operations. For example, the ARM7 processor can load many registers from memory with one instruction. Research your processors instruction set. You may have to use inline assembly language to take advantage of processor specific instructions.

Edit 2: Threading & Parallel processing.

Unless your bitmaps are huge, the overhead of maintaining multiple threads of execution or parallel execution may outweigh the benefit. For example, if the overhead of synchronizing with another CPU core is 200ms and processing of the bitmap without interruption is 1000ms, you've wasted time by using parallel processing on the single bitmap (1200 ms to have another core process the bitmap).

If you have many bitmaps, you may gain some time by using parallel processing or multiple threads:

One thread fetches bitmaps from the database into memory (buffers).
Another thread processes the bitmaps and stores into an outgoing buffer.
A third process writes the buffered bitmaps to the database.

If you are fetching bitmaps from an external source, such as a database, this I/O will be your bottleneck. This is the part you should optimize or spool.

Thanks for the detailed answer. Still digesting it. In the meantime, I've added some notes that further describe my requirements. — Ismael Ghalimi, Oct 04 '14 at 17:05

dgnuff · Answer 2 · 2014-10-04T19:05:55.150

If the bitmaps are GUARANTEED to always be 4 bits, then they will fit in the lower 4 bits of a char, and there will only be 16 possible values for any bitmap.

For a particular boolean expression, you then evaluate it for each of the sixteen possible bit combinations, which gives you a set of sixteen result bits. Assemble them into a sixteen bit int: false, false, false, false in bit zero, false, false, false, true in bit 1, and so on.

Now for an arbitrary bitmap versus an arbitrary boolean, your check becomes:

Treat the bitmap as a 4 bit int, evaluate 1 << (4 bit int).
Take the result of that shift and use the C++ & operator to test against the boolean operation's cached 16 bit int value.

That'll return == 0 for false and != 0 for true.

Reducing it to two instructions: a shift and an and is about the fastest I can see doing it.

This assumes that there are a fairly small number of boolean operation that you are applying over an over, the setup per boolean test will be expensive, but since you're talking billions of bitmaps, I'm assuming that you'll be using the same boolean operation on many many bitmaps.

This solution would work for small bitmaps, but we need to scale up to 64bit bitmaps, in which case the solution would not work anymore. But I think you're solution is practical up to 16bit bitmaps. Thanks! — Ismael Ghalimi, Oct 04 '14 at 20:00
You'd have to investigate the viability of this, but it might be possible to subdivide a 64 bit bitmap into either 4 16 bit sections, or 8 8 bit sections, and then use a shift / and pair for each section. Also in the case of a 64 bit bitmap, how many bits from it are actually used in the boolean expression? When creating the lookup table for the boolean expression, you only need `2^n` terms, where `n` is the number of bits used in the expression. So even with a 64 bit bitmap, if there's only 8 terms in the boolean expression, the lookup table works out to 256 64 bit values. — dgnuff, Oct 04 '14 at 21:09

Adrian May · Answer 3 · 2014-10-07T09:00:01.373

You can represent the expression as a binary tree and might as well use two classes for the two types of node. You could also parameterise each node with the operation but it's hardly worth while. Perhaps you also make a Not node with one input. Inputs to nodes are either places in your bitmap or other nodes, so I'm making a subclass for the former case which takes the index in the bitmap as a parameter. You finish this code by writing the value function for the And node and completing the Or node.

typedef unsigned long long Bitmap;
Bitmap bitmap;

struct Node {
  virtual bool value()=0;
};

struct AbsNode : public Node {
  int bit;
  bool value() {return (bitmap>>bit)&1; }
}

struct AndNode : public Node {
  Node *operandA, *operandB;
  etc.
}

Efficient execution of boolean expression on bitmap in C or C++

3 Answers3