The size parameter is needed because the original question was to produce subsets of a minimal size (of 2). The parameter allows you to specify that minimal size. This is nice, because you can then use the same code to get all subsets with a minimal size of 3.
The variable mask takes all integer values that can be made with input.length bits. So, represented in binary, mask takes these values:
000000
000001
000010
000011
000100
... etc...
111100
111101
111110
111111
Every 1 bit signals that the corresponding element from the input array should be part of the subset. So the above values of mask represent the following subsets:
[]
['a']
['b']
['a','b']
['c']
... etc ...
['c','d','e','f']
['a','c','d','e','f']
['b','c','d','e','f']
['a','b','c','d','e','f']
It is a bit confusing because we write binary representations with the least-significant bit on the right, while arrays are shown with the 0th element on the left.
From those sub arrays, only those with minimal size elements are retained in the results. So in the case of size = 2, the first 3 are not retained: the first one that stays is ['a','b']
.
For that reason the algorithm was a bit improved, and mask does not start from 0, because it already is sure not to find a large enough subset with mask smaller or equal to size. So that is why the for
loop starts with that value. But it is not that important. It would also work starting with zero:
for (mask = 0; mask < total; mask++)
Then concerning the if
: it tests whether a certain bit is set in the binary representation of mask: for that a single bit (1) is shifted to the left (so it becomes something like 10000 in binary representation) and then it is AND-ed (with &
) with mask. For instance, if mask is at a certain moment 101001 and the shifted bit is 10000 then this &
operation works like this:
mask: 101001
bit: 10000
------------
AND: 000000
Another example:
mask: 101001
bit: 1000
------------
AND: 001000
So, the if
condition tests whether the bit at position i in mask (starting at the right of the binary representation) is set or not. If it is set, the value mask & (1 << i))
will be equal to (1 << i)
(like for instance 1000), if it is not set this expression evaluates to 0. So by doing the !== 0
test you effectively test that the bit is set. Only in that case the corresponding element from the input array is added to the subset.
Example run
Let's say we call the function with these arguments:
(['a','b','c'], 2);
Then total = 8, and mask runs from 2 to 7. Put in binary terms, mask takes the following binary values:
010
011
100
101
110
111
Note that every bit ("column") in the above table corresponds to an element of the array, and its value (0 or 1) decides whether that element should be in the sub array.
Other values of mask are not interesting: smaller values (like binary 001 or 000) represent sub-arrays that have too few elements -- we need at least two. Higher values (like 1000, 1001, ...) have too many bits: the left-most bit does not correspond to any element in the input array, since we only have 3 elements in the array. So that means we have all the interesting values for mask in the above table.
The next thing that happens in the code, is finding the 1-bits in a particular value of mask. We use the variable i to denote the 0-based number of the bit (its position starting from the right of the binary representation). This i will start at 2 and decrease until 0 (so 2, 1, and 0: each pointing to one of the three bits in mask):
i = input.length - 1;
do { ... } while (i--);
For a given bit-number i, the bit's value in mask is extracted with:
mask & (1 << i)
So let's do all of the above with mask = 2 (i.e. 010 in binary). These are the values that are calculated:
result = []
i = 2
(1 << i) == 4 // 100 in binary: a 1-bit shifted twice to the left
mask & (1 << i) == 0 // because mask does not have that bit set
The next iteration for i:
i = 1
(1 << i) == 2 // 010 in binary: a 1-bit shifted once to the left
mask & (1 << i) == 2 // because mask has that bit set
result = ['b'] // because 'b' is the value at index i.
The last iteration for i:
i = 0
(1 << i) == 1 // 001 in binary: a 1-bit not shifted at all
mask & (1 << i) == 0 // because mask does not have that bit set
Now, the length of result is checked, but for this particular value of mask it is too small:
result.length == 1 // see above, so nothing is added to the *results* (with 's')
All has been done for mask == 2, so now the above is repeated for mask == 3 (binary 011):
result = []
i = 2
(1 << i) == 4 // 100 in binary: a 1-bit shifted twice to the left
mask & (1 << i) == 0 // because mask does not have that bit set
The next iteration for i:
i = 1
(1 << i) == 2 // 010 in binary: a 1-bit shifted once to the left
mask & (1 << i) == 2 // because mask has that bit set
result = ['b'] // because 'b' is the value at index i.
The last iteration for i (different result here):
i = 0
(1 << i) == 1 // 001 in binary: a 1-bit not shifted at all
mask & (1 << i) == 1 // because mask has that bit set as well (different!)
result = ['b','a'] // because 'a' is the value at index i.
Now, the length of result is checked again, and this time it is big enough:
result.length == 2 // see above
results = [['b','a']]
...etc.
Note that results (plural) is an array of arrays, so eventually it will grow to:
results = [['b','a'], ['c','a'], ['c','b'], ['c','b','a']]
The sub arrays have their elements in reversed order because the loop of i iterates backward. The sub arrays would be ordered normally if i would increment from 0 onwards (which is possible without any issue).
Limitations
Because bit operations like <<
and &
require their arguments to be 32 bit integers, the code at hand will only work for input arrays that have at the most 32 elements.