
Problem formulation: Given a binary code of length l, for every group of t bits (where l % t = 0), add 1 to the result if the group contains at least one bit of value 1.

My question: how do we efficiently get the final result?

For example, take the binary code 010 110 000 with t = 3. The final result is 2: the group 000 has no bit of value 1, so it contributes nothing; the group 110 has at least one bit of value 1, so we add 1 to the result; the group 010 also has a bit of value 1, so we add 1 to the result. Thus, the final result is 2.

My question is how to solve this problem efficiently without scanning through every group of t bits, which leads to a time complexity linear in the length of the binary code.
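
For reference, the straightforward linear scan described above could look like the following (a minimal Java sketch; the method name countGroupsNaive and its parameters are illustrative, not taken from any library):

public static int countGroupsNaive(int t, int length, long bits) {
    int count = 0;
    boolean seenOne = false;
    for (int i = 0; i < length; i++) {           // examine every bit: O(l)
        if (((bits >>> i) & 1L) != 0)
            seenOne = true;
        if ((i + 1) % t == 0) {                  // end of a group of t bits
            if (seenOne)
                count++;
            seenOne = false;
        }
    }
    return count;
}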

For the bit-counting problem (calculating how many 1's are in a binary code), there are algorithms that take constant time by using a limited number of mask and shift operations, such as the MIT HAKMEM count algorithm.
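
For example, a 64-bit population count can be done with a fixed number of mask, shift, and add operations; here is a sketch of the classic SWAR bit count in that spirit (in Java, Long.bitCount already computes this):

public static int popCount64(long x) {
    x = x - ((x >>> 1) & 0x5555555555555555L);                         // sums of 2-bit fields
    x = (x & 0x3333333333333333L) + ((x >>> 2) & 0x3333333333333333L); // sums of 4-bit fields
    x = (x + (x >>> 4)) & 0x0F0F0F0F0F0F0F0FL;                         // sums of 8-bit fields
    return (int) ((x * 0x0101010101010101L) >>> 56);                   // add all bytes together
}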

However, existing algorithms for the traditional bit-counting problem cannot be applied directly to my problem.

Does anyone know any tricks for my problem? You can assume a maximum length for the input binary code if that makes the problem easier to solve.

Rann Lifshitz

1 Answer


From comment:

Then you can assume that the input is the binary code of a 64-bit integer for this problem.

Here are two different ways to do it.

The first runs in O(l/t), where l is the length of the binary code, and works by testing each set of t bits to see whether it is all 0-bits.

public static int countSets(int setSize, long bits) {
    long mask = (1L << setSize) - 1; // setSize low bits set, e.g. 111 for t = 3
    int count = 0;
    // Shift the value right setSize bits at a time, examining one set per iteration
    for (long b = bits; b != 0; b >>>= setSize)
        if ((b & mask) != 0) // set contains at least one 1-bit
            count++;
    return count;
}
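
For example, with the question's input (using Java's binary literal syntax for readability):

long input = 0b010_110_000L;                // the question's example value
System.out.println(countSets(3, input));    // prints 2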

The second runs in O(t + log l) and works by collapsing the bits of each set onto the right-most bit of the set, then counting the bits that are set.

public static int countSets(int setSize, int bitCount, long bits) {
    long b = bits, mask = 1;
    // OR the bits of each set down onto the set's right-most bit
    for (int i = 1; i < setSize; i++)
        b |= bits >>> i;
    // Build a mask with a 1-bit at the right-most position of every set
    for (int i = setSize; i < bitCount; i <<= 1)
        mask |= mask << i;
    // Count the sets that contained at least one 1-bit
    return Long.bitCount(b & mask);
}
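
Called the same way, with bitCount = 64 for a long:

long input = 0b010_110_000L;                    // the question's example value
System.out.println(countSets(3, 64, input));    // prints 2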

Further explanation

The first method builds a mask for the set, e.g. with t = 3, the mask is 111.

It then shifts the value to the right, t bits at a time, e.g. with input = 010 110 000 and t = 3, you get:

mask = 111
b = 010 110 000  ->  b & mask = 000  ->  don't count
b =     010 110  ->  b & mask = 110  ->  count
b =         010  ->  b & mask = 010  ->  count
result: 2

The second method first merges the bits of each set onto the set's right-most bit, e.g. with input = 010 110 000 and t = 3, you get:

bits      = 010 110 000
bits >> 1 = 001 011 000
bits >> 2 = 000 101 100
b (OR'd)  = 011 111 100

It then builds a mask that keeps only the right-most bit of each set, applies the mask, and counts the bits that are set:

mask     = 001 001 001
b & mask = 001 001 000
result: 2
Andreas
  • Excellent 'greek' bithology! (+1) – Rann Lifshitz May 30 '18 at 07:53
  • Thanks for your reply. I figured out a solution by myself yesterday which is almost the same as your second solution. The only difference is the time complexity analysis. In my opinion, it should be O(t) if the machine is a 64-bit system or wider, which means that any bitwise operation takes constant time. The first 'for' loop takes O(t) time. The second 'for' loop can be avoided by simply specifying the mask as an octal 0111. – Rise of Kingdom May 31 '18 at 03:40
  • After we apply 'b & mask', we can use existing methods like MIT HAKMEM to count the set bits, which takes constant time. Thus the total complexity should be O(t). Correct me if I am wrong. – Rise of Kingdom May 31 '18 at 03:41
  • @EricHuang You can't specify the mask as an octal 0111. Second version requires mask as `...1000100010001` with distance between 1-bits determined by `t`. --- You talk about MIT HAKMEM to count bits. What do you think `Long.bitCount()` does? See javadoc of [`Long`](https://docs.oracle.com/javase/10/docs/api/java/lang/Long.html#bitCount(long)): *The implementations of the "bit twiddling" methods (such as `highestOneBit` and `numberOfTrailingZeros`) are based on material from Henry S. Warren, Jr.'s **Hacker's Delight**, (Addison Wesley, 2002).* – Andreas May 31 '18 at 06:37
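
For illustration, the repeating mask described in the last comment, with a 1-bit every t positions, can also be precomputed directly for a given t (a small sketch, not code from the thread):

public static long lowBitMask(int t) {
    long mask = 0;
    for (int i = 0; i < 64; i += t)  // one 1-bit at the right-most position of each group
        mask |= 1L << i;
    return mask;                     // e.g. t = 3 -> ...001001001, t = 4 -> ...0001000100010001
}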