On this web page, CS302 --- External Sorting, one of the steps is:
Merge the resulting runs together into successively bigger runs, until the file is sorted.
Given that step, how can we merge the resulting runs together? We don't have enough memory to hold them all at once.
Imagine you have the numbers 1 - 9
9 7 2 6 3 4 8 5 1
And let's suppose that only 3 fit in memory at a time.
So you'd break them into chunks of 3 and sort each, storing each result in a separate file:
2 7 9
3 4 6
1 5 8
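The chunking step above could be sketched roughly as follows. This is only an illustration: the names `write_sorted_runs` and `_write_run`, the temporary-file layout, and the one-number-per-line format are all assumptions of mine, not something the CS302 page prescribes.

```python
import tempfile

def _write_run(sorted_chunk):
    """Write one sorted chunk to its own temporary file and return the path."""
    with tempfile.NamedTemporaryFile("w", suffix=".run", delete=False) as f:
        for value in sorted_chunk:
            f.write(f"{value}\n")
        return f.name

def write_sorted_runs(values, chunk_size):
    """Consume values one at a time, keeping at most chunk_size in memory.

    Each full chunk is sorted in memory and written out as a separate run file.
    """
    run_paths = []
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        run_paths.append(_write_run(sorted(chunk)))
    return run_paths

run_paths = write_sorted_runs([9, 7, 2, 6, 3, 4, 8, 5, 1], chunk_size=3)
for path in run_paths:
    with open(path) as f:
        print(f.read().split())
```

Here `values` can be any iterable, so in a real case it would be a stream over the big input file rather than a list.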
Now you'd open each of the three files as streams and read the first value from each:
2 3 1
Output the lowest value, 1, and get the next value from that stream. Now you have:
2 3 5
Output the next lowest value, 2, and continue onwards until you've output the entire sorted list.
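A minimal sketch of that merge loop, assuming each run is a text stream with one integer per line (the function name `merge_runs` is mine, and `io.StringIO` stands in for the three files). It keeps exactly one value per stream in memory and uses a heap to find the lowest one:

```python
import heapq
import io

def merge_runs(streams):
    """Yield every value from several sorted streams in ascending order.

    Only the current front value of each stream is held in memory.
    """
    heap = []
    for index, stream in enumerate(streams):
        line = stream.readline()
        if line:  # skip empty runs
            heap.append((int(line), index))
    heapq.heapify(heap)

    while heap:
        value, index = heapq.heappop(heap)  # lowest front value
        yield value
        line = streams[index].readline()    # refill from the same stream
        if line:
            heapq.heappush(heap, (int(line), index))

runs = [
    io.StringIO("2\n7\n9\n"),
    io.StringIO("3\n4\n6\n"),
    io.StringIO("1\n5\n8\n"),
]
print(list(merge_runs(runs)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With only three streams a plain `min` over the front values would do just as well; the heap only starts to matter when there are many runs. The standard library's `heapq.merge` does the same job for already-sorted iterables.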
If you merge two runs A and B into some larger run C, you can do this line by line, generating progressively larger runs while holding at most 2 lines in memory at a time. Because the process is iterative and because you're working on streams of data rather than whole chunks of it, you don't need to worry about memory usage. On the other hand, disk access might make the whole process slow, but it sure beats not being able to do the work in the first place.
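Something like the following shows the two-run case. It is a sketch under my own assumptions: one integer per line, every line newline-terminated, and `io.StringIO` objects standing in for the files A, B and C; the name `merge_two_runs` is made up for the example.

```python
import io

def merge_two_runs(run_a, run_b, run_c):
    """Merge sorted streams run_a and run_b into run_c, line by line.

    At most one line from each input is held in memory at any time.
    """
    line_a = run_a.readline()
    line_b = run_b.readline()
    while line_a and line_b:
        if int(line_a) <= int(line_b):
            run_c.write(line_a)
            line_a = run_a.readline()
        else:
            run_c.write(line_b)
            line_b = run_b.readline()
    # One input is exhausted; copy whatever is left of the other.
    while line_a:
        run_c.write(line_a)
        line_a = run_a.readline()
    while line_b:
        run_c.write(line_b)
        line_b = run_b.readline()

a = io.StringIO("2\n7\n9\n")
b = io.StringIO("3\n4\n6\n")
c = io.StringIO()
merge_two_runs(a, b, c)
print(c.getvalue().split())  # ['2', '3', '4', '6', '7', '9']
```

Repeating this pairwise over all the runs, then over the merged results, gives the "successively bigger runs" from the question.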