
Let's say I have some patterns stored as 64-bit integers. Given a 64-bit integer n, how can I efficiently list all patterns matching n?

More precisely, I am interested in this set: { pi in patterns such that (n & pi) == pi }
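
For reference, the brute-force baseline is a single scan over the patterns; a Python sketch (the question has no language tag, so this is just for illustration):

# Naive O(P) scan: keep every pattern that is a submask of n.
def matching_patterns(n, patterns):
    return [p for p in patterns if n & p == p]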

Thanks for your help

4 Answers


One optimized solution. It is not of O(log(P)) complexity, but it does better than a plain O(P) scan in many cases.

LSB => Least Significant Bit

Map each pattern to a set as

m[i] = { p for p in patterns if p's ith LSB is zero }

O(P) time per bit position; this preprocessing is done once.

positions = all bit positions of n that are not set (i.e. are zero)

O(1) time (at most 64 bit tests)

Your answer is the intersection of the sets { patterns, m[i] for i in positions }. This works because a pattern p matches n iff p has a zero at every position where n has a zero, i.e. p is in m[i] for every i in positions.

The intersection of k sorted sets can be computed in O(S), where S is the total number of elements across the sets.

This will not work when all bits are set, more specifically when your input is 2^64 - 1: every bit of n is set, so there is no position with an unset bit. In that case, print all patterns.


Example:

p = {1,2,3,4,5,6,7} <=> {b001, b010, b011, b100, b101, b110, b111}

pattern_map =

{
   1: [010, 100, 110],
   2: [001, 100, 101],
   3: [001, 010, 011],
   4: [0001, 0010, 0011, 0100, 0101, 0110, 0111]
}
  • n = 1 <=> b001

    2nd and 3rd LSB are zero

    => intersection of [001, 010, 011, 100, 101, 110, 111], [001, 100, 101] and [001, 010, 011]

    => [001]

  • n = 2 <=> b010

    1st and 3rd LSB are zero

    => intersection of [001, 010, 011, 100, 101, 110, 111], [010, 100, 110] and [001, 010, 011]

    => [010]

  • n = 3 <=> b011

    3rd LSB is zero

    => intersection of [001, 010, 011, 100, 101, 110, 111] and [001, 010, 011]

    => [001, 010, 011]

  • n = 4 <=> b100

    1st and 2nd LSBs are zero

    => intersection of [001, 010, 011, 100, 101, 110, 111], [010, 100, 110] and [001, 100, 101]

    => [100]

  • n = 5 <=> b101

    2nd LSB is zero

    => intersection of [001, 010, 011, 100, 101, 110, 111] and [001, 100, 101]

    => [001, 100, 101]

  • n = 6 <=> b110

    1st LSB is zero

    => intersection of [001, 010, 011, 100, 101, 110, 111] and [010, 100, 110]

    => [010, 100, 110]

  • n = 7 <=> b111

    4th LSB is zero

    => [0001, 0010, 0011, 0100, 0101, 0110, 0111]
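
For concreteness, here is a small Python sketch of the whole approach (my own illustration: plain Python sets stand in for the sorted sets, and bit positions are 0-indexed while the example above counts LSBs from 1):

def build_map(patterns, bits=64):
    # m[i] = set of patterns whose ith bit is zero
    return [{p for p in patterns if not (p >> i) & 1} for i in range(bits)]

def query(n, patterns, m, bits=64):
    result = set(patterns)
    for i in range(bits):
        if not (n >> i) & 1:   # n's ith bit is zero...
            result &= m[i]     # ...so matching patterns must be zero there too
    return sorted(result)

patterns = [1, 2, 3, 4, 5, 6, 7]
m = build_map(patterns)
print(query(0b011, patterns, m))   # [1, 2, 3], i.e. the n = 3 case above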

– sonus21
  • Splendid idea! However, the formulations are confusing. For instance *`Intersection with an empty set is considered itself`* sounds like `s ∩ Ø = s`. I think you mean the n-ary intersection `⋂_{s ∊ Ø} s = universe`, that is the set of all elements you could ever encounter in any `s` (here `universe = P` the set of patterns). Also, I don't understand what you mean by *`This will not work when all bits are set`*. Do you mean all bits in the query `n` or in any pattern `p1`? I'd really love to see this answer getting polished :) – Socowi Jan 02 '21 at 15:10
  • This can be much slower than the plain brute-force solution for certain distributions of patterns. Consider as an example 32-bit patterns, where the 16 least significant bits are all 0. While matching any number (with trailing zeros, again) with those patterns, this solution finds the intersection of the set of all patterns with itself, 15 times. So instead of comparing the input with 2^16 patterns, we do 15 * 2^16 comparisons between patterns. In fact, OP's problem doesn't have a solution that is "better than O(P)", since we need to output Ω(P) patterns in the worst case. – Cem Jan 02 '21 at 22:08
  • You can drastically speed up the intersection part using dynamic programming. When intersecting two sets, cache the result and reuse it as an intermediate result for further intersections. To maximize reuse sort the queries by their popcount. Start with those `n`s having the most 1-bits = least 0-bits. – Socowi Jan 03 '21 at 01:21
  • @Socowi thanks for the input. You are right, I'm doing an n-ary intersection but I was not aware of the term, so I have updated my answer to include the pattern set as one of the sets. Also, regarding the confusing part: when all bits are set, we need to print the entire pattern set. – sonus21 Jan 03 '21 at 15:23

If I understand the problem correctly (more details in the question would help), you could use NumPy broadcasting with np.bitwise_and for this task.

EDIT: My solution is in Python; I just realized you have not mentioned a language tag.

import numpy as np

p = np.random.randint(0,5000,(1000,),dtype='int64') # 1000 patterns with values below 5000
n = np.random.randint(0,10000,(500,),dtype='int64') # 500 integers with values below 10000

compare = np.bitwise_and(n[:,None],p[None,:]) == p[None,:] 
#True if (n & pi) == pi, else False

n_idx = 0 # look at the 0th integer n in the list
p_idx = np.argwhere(compare[n_idx]).flatten() # indices of the patterns that match it
print('integer ->', n[n_idx])
print('matching patterns ->', p[p_idx])
integer -> 9738
matching patterns -> [1538 1544 1026 1536 1536 1034  520]
– Akshay Sehgal
  • I am fine with any language, but here it looks like the complexity is linear in the count of patterns, i.e. O(P). I am looking for better complexity; O(log(P)) would be the best, where P is the pattern count. – Abdessamad El Kasimi Jan 02 '21 at 08:29
  • @harold the patterns are already generated. I just need to traverse them in a smart way depending on `n` to avoid O(P). – Abdessamad El Kasimi Jan 02 '21 at 09:42

Luckily, your check (n & p1) == p1 is transitive. Therefore you can compute that relation on the pattern space too and use it to skip checks. I call this relation coverage and use the symbol ▷ as an abbreviation. p1 covers p2 iff all set bits in p2 are also set in p1. Formally:

p1 covers p2 iff p1 ▷ p2 iff (p1 & p2) == p2.

Transitivity means from n ▷ p1 and p1 ▷ p2 follows n ▷ p2. Therefore we can omit the check n ▷ p2 if we already know n ▷ p1. To use this as often as possible, we have to make sure that p1 is always checked before p2 because p1 ▷ p2. Computing this order is the first step:

P = { ... } // set of patterns

operator ▷(p1, p2):
  return (p1 & p2) == p2

function initCoverGraph():
   // dummy that covers everything
   root = 0xFFFFFFFFFFFFFFFF
   // direct children of root: covered patterns, none of which covers another
   root.children = {}
   for p in P:
     insertBelow(root, p)

function insertBelow(p1, p2):
   isChild = true
   for c in copyOf(p1.children):   // copy, since we modify p1.children
      if p2 ▷ c:
         // p2 covers an existing child: re-parent c below p2
         p1.children.remove(c)
         p2.children.add(c)
      else if c ▷ p2:
         // an existing child covers p2: insert p2 below it instead
         insertBelow(c, p2)
         isChild = false
         break
   if isChild:
      p1.children.add(p2)
     

You can optimize this further. From p1 ▷ p2 follows p1 > p2 (numerically). To leverage this, use sorted sets for P and for each .children, then split insertBelow's loop in two, iterating only the relevant parts of p1.children.

Now we have built the directed acyclic graph (DAG) of the ▷-relation. For fast queries each node p1 also needs p1.successors, the transitive-reflexive children of p1. That is, p1.successors = {p1} ∪ p1.children ∪ { c.children | c ∈ p1.children } ∪ .... You can compute this using a single DFS from the root node. For brevity we just assume we already did.

Now, let's do some queries.

function query(n, root, coveredByN={}):
    if coveredByN.contains(root):
      return
    for c in root.children:
      if n ▷ c:
        coveredByN.addAll(c.successors)
      else:
        query(n, c, coveredByN)

Note that query correctly ignores the dummy root. If P itself contains p1 = 0xFF...FF (equal to root, but not the same node), then root.children contains p1, and the query still produces the correct result.

This should already be quite fast. If needed you can optimize this further by using sorted sets (as mentioned above) and appropriate data structures.
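
For concreteness, here is a runnable Python translation of the sketch above (my own code, not part of the original pseudocode; it collects each covered subtree on the fly instead of precomputing .successors):

class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

def covers(p1, p2):                    # p1 ▷ p2
    return (p1 & p2) == p2

def insert_below(parent, node):
    is_child = True
    for c in list(parent.children):    # copy: we modify the list
        if covers(node.value, c.value):
            parent.children.remove(c)  # re-parent c below the new node
            node.children.append(c)
        elif covers(c.value, node.value):
            insert_below(c, node)      # descend into the covering child
            is_child = False
            break
    if is_child:
        parent.children.append(node)

def collect(node, out):                # add node's entire subtree
    out.add(node.value)
    for c in node.children:
        collect(c, out)

def query(n, node, out):
    for c in node.children:
        if covers(n, c.value):
            collect(c, out)            # n ▷ c, so n covers all of c's subtree
        else:
            query(n, c, out)           # parts of c's subtree may still match

root = Node((1 << 64) - 1)             # dummy root covers everything
for p in [1, 2, 3, 4, 5, 6, 7]:
    insert_below(root, Node(p))
out = set()
query(0b101, root, out)
print(sorted(out))                     # [1, 4, 5]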

The actual performance heavily depends on the distribution of your patterns. In some cases it could be faster to approach the problem from the other side (from n !▷ p2 and p1 ▷ p2 follows n !▷ p1).

– Socowi

The naive method is to check all numbers, from 0 to 2^64 - 1, against the pattern n. You need 2^64 steps (in C++, have a look at vectorization to reduce the number of steps).

There is no method that gives you the same result in 64 steps (see below), but you can improve the naive method.

First idea: let's say your pattern is:

bit     0 0 ...  0 1 ? ? ? ? ? 1 0 ... 0
pos    63 62 ...   i  ...      j  ...  0

Where i is the position of the first (most significant) 1 and j the position of the last (least significant) 1, counting positions from bit 0 = LSB. You can find i and j in 64 steps (check against 2^k where 0 <= k < 64).

Obviously, no number greater than or equal to 2^(i+1) will match the pattern, because such a number has a 1 above the position i, where the pattern has a 0. Hence you just have to check 2^(i+1) numbers.

Now, let's consider j: if a number p matches n, then all numbers matching the pattern have the form p + t.2^j, because:

  • any candidate p + k can be written (euclidean division) as (p + q.2^j) + r where r < 2^j, and
  • if r != 0, the number has a 1 below the position j, where the pattern only has 0s, so it cannot match.

You recognized the step: you don't have to check every number, just one out of every 2^j numbers.

Conclusion: you can check the numbers against the pattern in 64 + 2^(i-j+1) steps.
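
A minimal Python sketch of this first method (my own illustration; Python's bit_length stands in for the 64-step scans for i and j):

def matches_by_checking(n):
    if n == 0:
        return [0]
    i = n.bit_length() - 1           # position of the most significant 1
    j = (n & -n).bit_length() - 1    # position of the least significant 1
    # 2^(i-j+1) candidates (multiples of 2^j below 2^(i+1)) instead of 2^64
    return [p for p in range(0, 1 << (i + 1), 1 << j) if n & p == p]

print(matches_by_checking(0b101))    # [0, 1, 4, 5]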

Second idea: you can try to generate all the matching numbers instead of just checking candidates. In theory it may be faster, but in practice I don't think this will be efficient. Let's say the pattern is:

bit     0 ... 0 1   0 ... 1  ...  1   ... 1   0 ... 0
pos             p_1       p_2     p_3 ... p_k

Where the k 1s are at positions p_1, p_2, ..., p_k. 2^k numbers will match the pattern (this is why you can't have the solution in 64 steps): the numbers having 1 or 0 at the positions p_i and 0 elsewhere.

To generate those numbers, iterate over all numbers p between 0 and 2^k - 1. For each p:

  • Start with N = 0.
  • Check p against all 2^i, with i between 0 and k-1. If there is a match (a 1 at the position i of p), add 2^(p_i) to N.
  • At the end, you have generated a number N that matches the pattern.

This will generate all the numbers matching the pattern in 2^k * k steps.
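
A matching Python sketch of this second, generating method (again my own illustration):

def matches_by_generating(n):
    pos = [i for i in range(64) if (n >> i) & 1]   # positions of the k 1s
    k = len(pos)
    out = []
    for p in range(1 << k):          # all 2^k selectors
        N = 0
        for i in range(k):
            if (p >> i) & 1:         # a 1 at position i of the selector...
                N |= 1 << pos[i]     # ...contributes 2^(p_i) to N
        out.append(N)
    return out

print(sorted(matches_by_generating(0b101)))   # [0, 1, 4, 5]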


EDIT Some thoughts

  • Whatever method you choose, 0 (64 bits) and n (the pattern itself) will match the pattern.
  • First method: n, the pattern itself, is an upper bound on the matching numbers. Any number greater than n has at least one 1 on a 0 of the pattern (think of the carry). This upper bound is finer than 2^(i+1).
  • First method: you can find i and j by bisection, in a lower number of steps than 64.
  • It's possible to compute p_1 = i, p_2, ..., p_k = j and to choose the most promising method. E.g. don't use the first method for b1000000000000000000000000000000000000000000000000000000000000001: there i = 63 and j = 0, so the first method steps through 2^64 candidates, while the second generates just 2^2 = 4 numbers.
  • The first method seems slower than the second, but it may take advantage of branch prediction.
– jferard