This is a solved problem where we have to find a missing integer in the input data, which is in the form of a file containing 4 billion unsorted unsigned integers. The catch is that only 10 MBytes of memory can be used.
The author gives a solution where we do 2 passes. In the first pass we create an array of "blocks" of some size, e.g. block 0 represents 0 to 999, block 1 represents 1000 to 1999, and so on. Thus we know how many values should be present in each block.
Now we scan through the file and count how many values are actually present in each block; a block whose actual count falls short of what it should hold must be missing a value, which leads us to the missing integer.
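
To make sure I have the idea right, here is a minimal sketch of the two passes in Python. The file format (one integer per line), the helper names and the block size of 1000 are my own assumptions, purely for illustration; a real solution would instead pick the block size so the counts array fits in the 10 MByte budget.

```python
# Sketch of the two-pass idea. My assumptions: the input file holds one
# unsigned 32-bit integer per line, and the block size is 1000 as in the
# example above.

BLOCK_SIZE = 1000
RANGE = 2**32                                        # values are 32-bit unsigned
NUM_BLOCKS = (RANGE + BLOCK_SIZE - 1) // BLOCK_SIZE  # blocks cover 0 .. 2^32 - 1

def capacity(block):
    """Number of distinct integers that fall into this block."""
    low = block * BLOCK_SIZE
    return min(low + BLOCK_SIZE, RANGE) - low

def find_missing(path):
    # Pass 1: count how many values from the file land in each block.
    counts = [0] * NUM_BLOCKS
    with open(path) as f:
        for line in f:
            counts[int(line) // BLOCK_SIZE] += 1

    # A block whose count is below its capacity is missing at least one value.
    target = next(b for b in range(NUM_BLOCKS) if counts[b] < capacity(b))
    low = target * BLOCK_SIZE

    # Pass 2: mark which values of that one block actually occur in the file.
    seen = [False] * capacity(target)
    with open(path) as f:
        for line in f:
            v = int(line)
            if low <= v < low + capacity(target):
                seen[v - low] = True

    # Any unmarked slot corresponds to a missing integer.
    return low + seen.index(False)
```
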
I understand the approach. However, the calculation of the appropriate block size starts with this:
The array in the first pass can fit in 10 MBytes, or roughly 2^23 bytes, of memory.
How is 10 MB the same as 2^23 bytes? I am unable to understand where the 2^23 number came from.
So 2^24 is 16 MBytes, so presumably he is taking 2^23, which is the closest power of 2 that is 10 MBytes or less. If that is the case, why can he not use the whole 10 MBytes? i.e. why does he have to use a power of 2 to get the size here?
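
For reference, this is the arithmetic I am doing (assuming 1 MByte = 2^20 bytes):

```python
# My own back-of-the-envelope check of the sizes being compared.
print(10 * 2**20)  # 10 MBytes = 10,485,760 bytes
print(2**23)       #  8 MBytes =  8,388,608 bytes (largest power of 2 below 10 MBytes)
print(2**24)       # 16 MBytes = 16,777,216 bytes (already more than 10 MBytes)
```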