A typical algorithm to sort 232 numbers would be:
- Create an array of 232 numbers and fill them from 0 to 232-1
- Let n = number of items in the array = 232
- Randomly pick a number from 0 to n-1, remove number out of the array, and push it onto a stack
- Now, n is decremented by 1 and stack size is incremented by 1
- goto 3. until array is empty, the final stack is the solution
232 = 4,294,967,296 items
232 * 4 = 17,179,869,184 bytes, if we use 4 byte unsigned ints
Since I don't have that much memory on one machine, using memmap() might be a good candidate (probably the most straight-forward approach).
Out of curiosity, I was wondering if I can use MapReduce to solve this problem? What would the Map and Reduce functions look like?
This idea crossed my mind, because although I don't have that much memory on one machine, I definitely have that much memory in all the boxes I have on the LAN. The distributed nature of the data in MapReduce might help.
Although alternative, equivalent algorithms which fit MapReduce are welcome, it may be difficult to come up with one which doesn't degrade the randomness of the above algorithm.