If the range of positive integers is within a reasonable maximum value and repeated values are not a factor, then one can make use of a typed array (eg, a Uint32Array) whereby the index of the bit within this array represents an integer value.
Eg, if bit 12 is set, then the number 12 is part of your set. If bit 320 is set (ie, the 0 bit of the 10th element in the Uint32Array), then the number 320 is part of the set. In this way, one can directly determine whether a number is part of a set by indexing the array rather than scanning the array.
The code below includes a class that, given a maximum integer value, defines a bit array representing a set of integers. Method setBit
adds or removes an integer into the set, and method getBit
determines whether an integer is in the set of integers.
class BitArray {
constructor( maxN ) {
this.max = maxN;
this.bitArray = new Uint32Array( ( maxN / 32 |0 ) + 1 );
}
getBit( n ) {
if ( 0 <= n && n <= this.max ) {
return 0 < ( this.bitArray[ n / 32 |0 ] & ( 1 << ( n % 32 ) ) );
}
return NaN;
}
setBit( n, val ) {
if ( 0 <= n && n <= this.max ) {
if ( val ) {
this.bitArray[ n / 32 |0] |= 1 << ( n % 32 );
} else {
this.bitArray[ n / 32 |0 ] &= ~( 1 << ( n % 32 ) );
}
}
}
}
let ints = new BitArray( 100 );
[9, 91, 20, 18, 17, 37].forEach( n => ints.setBit( n, true ) );
console.log( `5: ${ ints.getBit( 5 ) }` );
console.log( `37: ${ ints.getBit( 37 ) }` );
console.log( `91: ${ ints.getBit( 91 ) }` );
console.log( `500: ${ ints.getBit( 500 ) }` );
EDIT Sparse Arrays with Large Integers in the Set of Integers
As a follow up to some of the comments regarding memory usage, one solution is to have a two tiered lookup, with the first tier implemented as a sparse array. The first tier is an array that returns the number of integers in a block of integers and a reference to the second tier which is the aforementioned bit map array in the original answer.
By example, let's say the first tier maps integers by blocks of 15 bits, which is 2^15 or 32768 integers per block. This first tier is composed of an array of objects indexed 0...131071, with the elements initialized as emtpy / undefined. Once bits are set within the range of the first tier element, then tier1 becomes an object with a count
indicating the number of bits set in the block, and a reference to a second tier bit map array tracking the set of integers in the tier1 range. Since 32768 / 32 (the number of bits per Uint32) is 1024, then the second tier bit map is a Uint32Array(1024)
, sufficient to represent a block of 32768 integers.
Generally it is expected that one will have a large number of integers to track (otherwise a linear search will likely be faster), but by example, let's say the integers to track are...
[ 0, 1, 2, 1000000, 1000000000 ]
...in which case the representation of the first and second tiers will be...
tier1[ 0 ] = { count: 3, tier2: [ 7, 0, 0, ..., 0 ] } // Range 0 - 32767
tier1[ 1 ] = undefined // Range 32768 - 65535
tier1[ 2 ] = undefined // Range 65536 - 98304
o
o
o
tier1[ 29 ] = undefined // Range 950272 - 983039
tier1[ 30 ] = { count: 1, tier2: [ 0, 0, 0, ..., 512, ..., 0 ] } // Range 983040 - 1015807
tier1[ 31 ] = undefined // Range 1015808 - 1048575
o
o
o
tier1[ 30516 ] = undefined // Range 999948288 - 999981055
tier1[ 30517 ] = { count: 1, tier2: [ 0, 0, 0, ..., 32 ..., 0 ] } // Range 999981056 - 1000013823
tier1[ 30518 ] = undefined // Range 1000013824 - 1000046591
o
o
o
tier1[ 131071 ] = undefined // Range 4294934528 - 4294967295
Now, to determine whether a specific integer is part of the entire set is simply a matter of calculating the tier1 index, and if not undefined, then look into the tier2 bit map.
By example, when seeking 1000000, the tier1 index is calculated by 1000000 / 32768 (as their are 32768 integers per block) which rounds to index 30. Then the associated tier1[ 30 ]
bit index of tier2
is sought for 1000000 - 983040 or bit 16960, and that bit value is returned (in this case it will be true
).
This still provides direct access to an integer within a set (without having to scan an array) by calculating two tiers of indexes, and accommodates sparse integer sets, particularly since the tier1 array will leverage the JavaScript built in sparse array feature...
Note that the size of the blocks (in this example, 32768 integers) can be adjusted to accommodate the expected spread and density of the data, in order to balance memory utilization.