Find out (in C++) if binary number is prefix of another

Question

I need a function with a header like this:

bool is_prefix(int a, int b, int* c) {
    // ...
}

If a is, read as a binary number string, a prefix of b, then set *c to be the rest of b (i.e. "what b has more than a") and return true. Otherwise, return false. Assume that binary strings always start with "1".

Of course - it is easy to do by comparing bit by bit (leftshift b until b==a). But is there a solution which is more efficient, without iterating over the bits?

Example: a=100 (4), b=1001 (9). Now set *c to 1.

Can you give a numeric example? Should the overlap be on the high or the low end of b? — Pixelchemist, Jun 18 '13 at 16:07
Does the first bit of the prefix have to begin with `1`? Because unless the bits in `a` are in reverse order to those in `b`, I don't see how you'd be able to limit the bits in `a` to just the prefix as `a` and `b` are going to be the same size as integer values. — JAB, Jun 18 '13 at 16:08
Shift the low-order bits of b off to the right until a and b are equal. — Hot Licks, Jun 18 '13 at 16:08
@HotLicks that's basically comparing bit by bit, except it makes `a` not being a prefix a worst-case scenario as `b` would have to be shifted and compared `sizeof(int)*8` times before you could be sure `a` wasn't the prefix. — JAB, Jun 18 '13 at 16:08
@JAB - No, it's comparing a max of N times for an N-bit-wide value. — Hot Licks, Jun 18 '13 at 16:10
@JAB - You can, of course, "short-circuit" things a bit by testing for zero leading bytes in the arguments and by giving up when b has become smaller than a. But the basic algorithm is sound, and reasonably fast. — Hot Licks, Jun 18 '13 at 16:16
@n.m. Basically, get log2 of both numbers, then position readers on the first bits of both, then move both readers forward bit by bit. — Johannes, Jun 18 '13 at 16:16
Note that taking the log2 or some such will be slower than iterating over bits in most cases. — Hot Licks, Jun 18 '13 at 16:18
@HotLicks A more consistent case would be shifting up such that the first bits of both `a` and `b` are 1 (could be determined using a basic numeric calculation if I'm not mistaken, but then again maybe I am, but that in itself could possibly be obviated by using a reference array/etc.), then checking if `a` is able to mask the number of upper bits of `b` equal to `sizeof(int) * 8 - ashift`), and if so take that masked value of `b` and shift it down back to normal and store in `c`. — JAB, Jun 18 '13 at 16:20
That's a hard solution. An easy solution would be to find such a number N that a>>N==b. Iff such N exists, b is a prefix of a. You can use binary search if linear search over bits-per-your-integral-type is too slow. — n. m. could be an AI, Jun 18 '13 at 16:23
@JAB - There are a half-dozen different strategies for short-circuiting, but they all risk being more expensive than a straight-forward solution. — Hot Licks, Jun 18 '13 at 16:26
It may or may not give you a performance increase (my bet is on "not" but that's just a guess). — n. m. could be an AI, Jun 18 '13 at 16:32
Looping over a maximum of 64 (or 32) bits is very unlikely to be any kind of performance issue. You should aim for portability and readability instead. — oseiskar, Jun 18 '13 at 16:52
@oseiskar Depends on how often the function will be used, really. If it's called repeatedly, using a basic optimization to reduce the calculation time is probably not a bad idea (assuming the compiler wouldn't be doing it in the first place, which could be a possibility). Going by http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup, it seems using a lookup table would be more efficient than down-shifting `b` when `a` is small and `b` is large, though Johannes' example utilizes a small `b`. — JAB, Jun 18 '13 at 17:09
(And a case with `b` and `a` both large but being only a few bits different in length would similarly have potential to be quicker than the lookup table.) — JAB, Jun 18 '13 at 17:10

score 3 · Accepted Answer · edited May 23 '17 at 11:56

3

You can use your favorite "fast" method to find the highest set bit. Let's call the function msb().

bool is_prefix (int a, int b, int *c) {
    if (a == 0 || b == 0 || c == 0) return false;
    int d = msb(b) - msb(a);
    if (d < 0) return false;
    if ((b >> d) == a) {
        *c = b ^ (a << d);
        return true;
    }
    return false;
}

Shift b so its high order bit aligns with a, and compare that with a. If they are equal, then a is a "prefix" of b.

This algorithm's performance depends on the performance of msb(). If it is constant, then this algorithm is constant. If msb() is expensive, then the "easy approach" may be the fastest approach.

edited May 23 '17 at 11:56

Community

1
1

answered Jun 18 '13 at 16:07

jxh

69,070
8
110
193

Did you assume that the first bits of `a` and `b` are at the same position, so if a is shorter, it is right-padded with zeros? – Johannes Jun 18 '13 at 16:19
Infinite loop if `a` = 0. Edit: fixed now – kotlomoy Jun 18 '13 at 16:47
@kotlomoy: It depends on `msb()`, but I'll disallow `0` input. – jxh Jun 18 '13 at 16:51
If `msb(a) > msb(b)` then `d` is negative and the expression `b >> d` results in undefined behavior. – Andrew Tomazos Jun 19 '13 at 08:57
@jxh: Reducing to a common problem is always a good idea. Thanks! – Johannes Jun 20 '13 at 13:04
@jxh: Why do we check for `b!=0`? The case `a=b=0` should be legal, and the algorithm returns false for it. – Johannes Jun 24 '13 at 14:32
@Johannes: Adjust the function as you see fit. I just wanted to avoid calling `msb()` on input that might not have a sensible return value. – jxh Jun 24 '13 at 16:08

score 0 · Answer 2 · answered Jun 18 '13 at 16:30

0

I'm not too sure, but would something like the following work:

bool
is_prefix( unsigned a, unsigned b, unsigned* c )
{
    unsigned mask = -1;
    while ( mask != 0 && a != (b & mask) ) {
        a <<= 1;
        mask <<= 1;
    }
    c = b & ~mask;
    return mask != 0;
}

(Just off the top of my head, so there could be errors.)

answered Jun 18 '13 at 16:30

James Kanze

150,581
18
184
329

1

This looks like bit-by-bit? (similar to my "easy approach") – Johannes Jun 18 '13 at 16:37

Find out (in C++) if binary number is prefix of another

2 Answers2