2

This is a code challenge. I came across this question on SO about testing if a number is included in a given set. The typical solution would be:

const givenSet = [9, 91, 20, 18, 17, 37]
function isIncluded(x) {
  return givenSet.includes(x)
}

However, I'm wondering if a solution using bitmask exists to this problem? Can someone fill in CREATE_BITMASK and JUDGEMENT in below?

const givenSet = [9, 91, 20, 18, 17, 37]
const bitmask = CREATE_BITMASK(givenSet)
function isIncluded(x) {
  return JUDGEMENT(x, bitmask)
}

where givenSet consists of arbitrary positive integer that fits into Int32.

My instant observation is that, if x equals n in the given set, then we can apply XOR to obtain x ^ n === 0. And 0 is a special number that we might be able to leverage. But I don't have further idea.

For your reference, here's the Bitwise Operator Laws


Update: Pointed out by @coderLMN, the above form might be misleading.

Basically I'm seeking a one-pass solution, without the need to iterate through each element in the given set.

What I'm really asking is if it's possible to encode the requirement (the given set) into one single data structure (most likely a bitmask).

A valid solution can take any form, as long as it doesn't need to iterate over the given set for each individual test. Don't limit yourself.

hackape
  • 18,643
  • 2
  • 29
  • 57
  • "Basically I'm seeking a one-pass solution, without the need to iterate through each element in the given set." Sounds like you're asking for the Set object. See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set ... – Trentium Apr 16 '21 at 19:40
  • No, this is more like a math problem. Real question being, to what extend can we encode multiple elements (given set of positive integers) into compressed data structure (bitmask) while preserving the necessary information (enough to discriminate if incoming element is included in original set). – hackape Apr 17 '21 at 02:51
  • I think coderLMN is right about it’s impossible to encode them into **one** bitmask. But I guess it’s still possible to compress them into maybe two or three such structs? Still wondering. – hackape Apr 17 '21 at 02:53
  • If your range of integers is 0...n, where n is within reason and you don't have repeated values, then you can make use of a typed array (eg, a Uint32Array) whereby the index of the bit represents the number. Eg, if bit 12 is set, then the number 12 is part of your set. If bit 320 is set (ie, the 0 bit of the 10th element in the Uint32Array), then the number 320 is part of the set. In this way, one can directly determine whether a number is part of a set without having to scan the array. Otherwise, I tend to agree with coderLMN. – Trentium Apr 17 '21 at 11:39

3 Answers3

2

Theoretically, It's impossible.

Suppose you got a set of CREATE_BITMASK, BITWISE_OP and JUDGEMENT that worked for all the integers in givenSet, so you must have:

(9  BITWISE_OP bitmask) === JUDGEMENT
(91 BITWISE_OP bitmask) === JUDGEMENT
(20 BITWISE_OP bitmask) === JUDGEMENT
(18 BITWISE_OP bitmask) === JUDGEMENT
(17 BITWISE_OP bitmask) === JUDGEMENT
(37 BITWISE_OP bitmask) === JUDGEMENT

Given the requirement that the integers fits into Int32, bitmask has to be the same format, which means no matter whatever the BITWISE_OP is, it's not possible to have it operate on different integers to get a same result.

coderLMN
  • 3,076
  • 1
  • 21
  • 26
  • 1
    Thanks for your input. But my description was misleading, doesn't reflect my intention. I've updated the question. Interested to take another try? – hackape Apr 15 '21 at 07:51
  • My view is still the same, because whatever the "one single data structure" (the bitmask) in your revision is, mathematically there is no way to implement a many-to-one ([9, 91, 20, 18...] -> JUDGEMENT) mapping with a "one pass operation" (especially a BITWISE_OP). From the perspective of information theory, the bitmask may be a compressed data structure, which must contain all the information in the list, but the compare operation has to process(iterate) all these information, one way or another. – coderLMN Apr 15 '21 at 08:17
  • Good point. I'll let the question sit for couple of days. If nothing new comes up I'll mark yours accepted. – hackape Apr 15 '21 at 08:19
1

If the range of positive integers is within a reasonable maximum value and repeated values are not a factor, then one can make use of a typed array (eg, a Uint32Array) whereby the index of the bit within this array represents an integer value.

Eg, if bit 12 is set, then the number 12 is part of your set. If bit 320 is set (ie, the 0 bit of the 10th element in the Uint32Array), then the number 320 is part of the set. In this way, one can directly determine whether a number is part of a set by indexing the array rather than scanning the array.

The code below includes a class that, given a maximum integer value, defines a bit array representing a set of integers. Method setBit adds or removes an integer into the set, and method getBit determines whether an integer is in the set of integers.

class BitArray {

  constructor( maxN ) {
    this.max = maxN;
    this.bitArray = new Uint32Array( ( maxN / 32 |0 ) + 1 );
  }
  
  getBit( n ) {
    if ( 0 <= n && n <= this.max ) {
      return 0 < ( this.bitArray[ n / 32 |0 ] & ( 1 << ( n % 32 ) ) );
    }
    return NaN;
  }
  
  setBit( n, val ) {
    if ( 0 <= n && n <= this.max ) {
      if ( val ) {
        this.bitArray[ n / 32 |0] |=  1 << ( n % 32 );
      } else {
        this.bitArray[ n / 32 |0 ] &=  ~( 1 << ( n % 32 ) );
      }
    }    
  }
  
}

let ints = new BitArray( 100 );
[9, 91, 20, 18, 17, 37].forEach( n => ints.setBit( n, true ) );


console.log( `5: ${ ints.getBit( 5 ) }` );
console.log( `37: ${ ints.getBit( 37 ) }` );
console.log( `91: ${ ints.getBit( 91 ) }` );
console.log( `500: ${ ints.getBit( 500 ) }` );

EDIT Sparse Arrays with Large Integers in the Set of Integers

As a follow up to some of the comments regarding memory usage, one solution is to have a two tiered lookup, with the first tier implemented as a sparse array. The first tier is an array that returns the number of integers in a block of integers and a reference to the second tier which is the aforementioned bit map array in the original answer.

By example, let's say the first tier maps integers by blocks of 15 bits, which is 2^15 or 32768 integers per block. This first tier is composed of an array of objects indexed 0...131071, with the elements initialized as emtpy / undefined. Once bits are set within the range of the first tier element, then tier1 becomes an object with a count indicating the number of bits set in the block, and a reference to a second tier bit map array tracking the set of integers in the tier1 range. Since 32768 / 32 (the number of bits per Uint32) is 1024, then the second tier bit map is a Uint32Array(1024), sufficient to represent a block of 32768 integers.

Generally it is expected that one will have a large number of integers to track (otherwise a linear search will likely be faster), but by example, let's say the integers to track are...

[ 0, 1, 2, 1000000, 1000000000 ]

...in which case the representation of the first and second tiers will be...

tier1[ 0 ] = { count: 3, tier2: [ 7, 0, 0, ..., 0 ] }   // Range 0 - 32767
tier1[ 1 ] = undefined                                  // Range 32768 - 65535
tier1[ 2 ] = undefined                                  // Range 65536 - 98304
        o
        o
        o
tier1[ 29 ] = undefined                                           // Range 950272 - 983039
tier1[ 30 ] = { count: 1, tier2: [ 0, 0, 0, ..., 512, ..., 0 ] }  // Range 983040 - 1015807
tier1[ 31 ] = undefined                                           // Range 1015808 - 1048575

        o
        o
        o
tier1[ 30516 ] = undefined                                          // Range 999948288 - 999981055
tier1[ 30517 ] = { count: 1, tier2: [ 0, 0, 0, ..., 32 ..., 0 ] }   // Range 999981056 - 1000013823
tier1[ 30518 ] = undefined                                          // Range 1000013824 - 1000046591
        o
        o
        o
tier1[ 131071 ] = undefined                                         // Range 4294934528 - 4294967295

Now, to determine whether a specific integer is part of the entire set is simply a matter of calculating the tier1 index, and if not undefined, then look into the tier2 bit map.

By example, when seeking 1000000, the tier1 index is calculated by 1000000 / 32768 (as their are 32768 integers per block) which rounds to index 30. Then the associated tier1[ 30 ] bit index of tier2 is sought for 1000000 - 983040 or bit 16960, and that bit value is returned (in this case it will be true).

This still provides direct access to an integer within a set (without having to scan an array) by calculating two tiers of indexes, and accommodates sparse integer sets, particularly since the tier1 array will leverage the JavaScript built in sparse array feature...

Note that the size of the blocks (in this example, 32768 integers) can be adjusted to accommodate the expected spread and density of the data, in order to balance memory utilization.

Trentium
  • 3,419
  • 2
  • 12
  • 19
  • Good idea! Not general enough but for some use case this is definitely a valid solution. – hackape Apr 17 '21 at 13:11
  • A bit observation: if the given set is evenly distributed and dense, this encoding can be viewed as compression. However if given set is sparse in the middle and dense at both end, like `[0, 1, 2, 10000, 100001]`, then this encoding would actually takes up more space. – hackape Apr 17 '21 at 13:16
  • In most low level languages, true/false is implemented as bit data structure. What this method do is literally define a Boolean array storing a true/false value for each number, which is not really the “bit-wise solution” asked. From algorithm point of view, this method is not a good solution because it may consume unnecessary memory space if the range of the numbers are not so small. – coderLMN Apr 17 '21 at 13:57
  • 1
    @hackape concur regarding a sparse set consuming memory. The source of the question is a code challenge, and therefore somewhat theoretical in nature. The specific use case will depend on a real problem being solved, and the factors you brought up will have to be taken into consideration. For instance, it might be worthwhile to consume more memory if there is a need for an extremely fast means of determining whether an integer is in a set. Otherwise, you are correct, the memory consumption might not be worth the performance gain and one falls back to a linear search or Set object. – Trentium Apr 17 '21 at 14:17
  • @coderLMN the question asked "if a solution using bitmask exists to this problem"... I don't agree that the method is not a good solution. It is *a* solution to the problem at hand given the known assumptions, and can have applicability to real world problems even at the cost of memory. Is it the right solution for all problems involving a set of integers? No. But it is a solution to consider when confronted with requiring a highly performant lookup of a finite set of integers, which is central to the question... – Trentium Apr 17 '21 at 14:25
  • The PO asks for a possible solution “...where givenSet consists of arbitrary positive integer that fits into Int32”, so this solution need 4G bit only to store the indicators. I don’t think it can be called a good solution by any standard. – coderLMN Apr 17 '21 at 14:39
  • @coderLMN Blame me for my original wording being misleading. I made the assumption that this could be solved using bitmask. Thanks to your input, now we know it's impossible, at least not with one Int32 bitmask. So I've updated to loosen up the limit, all interesting thoughts that don't use iteration are welcomed. – hackape Apr 17 '21 at 15:10
  • @hackape the answer has been updated to address a solution that breaks down the set of integers into discreet blocks of bit maps, with the idea in mind of minimizing memory utilization. It leverages JS's sparse array feature, and heck, one could simply use this JS feature to manage a sparse array of 32-bit bitmaps, achieving the same result with simpler code... – Trentium Apr 17 '21 at 20:01
  • 1
    Love the updated solution. Thank you for the time and effort! This solution resembles how memory paging work. So we go from real mode to protected mode – hackape Apr 18 '21 at 01:53
  • Quick question, I don’t see `count` put into use. Is it safe to remove it? – hackape Apr 18 '21 at 01:54
  • 1
    @hackape `count` is simply a count of the number of bits set in `tier2`. When clearing bits, if `count` reaches zero, then `teir1` can be set to undefined. Otherwise, every time you clear a bit, the entire tier2 array will have to be checked to see if any bits are set. Of course, maybe this doesn't matter, as you'll still get the same result (ie, false) when checking for integers in that `tier1` range, as the entire `tier2` bitmap array with then be zeros... So yes, safe to remove `count`, if okay with `tier2` possibly having no bits set after bit setting and clearing operations... – Trentium Apr 18 '21 at 02:34
0

This should do it:

const CREATE_BITMASK = x => new Set(x);
const JUDGEMENT = (x,s) => s.has(x);

Quote from MDN:

In particular, it is, on average, faster than the Array.prototype.includes method when an Array object has a length equal to a Set object's size

Mårten Wikström
  • 11,074
  • 5
  • 47
  • 87