Chunked Sorting/Writing
Suppose you want to keep N numbers in memory at a time and further suppose that you are given a function which writes the N sorted numbers to a file:
val N : Int = ???
val writeToFile : Seq[Int] => Unit = ???
As indicated in your question, an Iterator can be used to only keep N numbers in RAM at a time to sort them and write them to the intermediate files:
val sourceFileName : String = ???
val sortAndWrite : Seq[Int] => Unit =
(_.sorted) andThen writeToFile
Source
.fromFile(sourceFileName)
.getLines
.map(_.toInt)
.grouped(N)
.foreach(sortAndWrite)
Now you have each group of N numbers in a different file. The only thing left to do is merge the files together.
Merging
Given some reading function that returns the Iterators from each of the sub-files:
val subFiles : Iterable[Iterator[String]] = ???
We can write a function that will return a new Iterator that gets the values from each file and sorts them:
val mergeSort : Iterable[Iterator[String]] => Iterator[Int] =
(fileIterators) => {
val nonEmptyFiles = fileIterators filter (_.hasNext)
nonEmptyFiles
.map(_.next)
.map(_.toInt)
.sorted
.toIterator ++ mergeSort(nonEmptyFiles)
}
Note: the above function will keep a single Integer
in memory for each file, therefore the RAM usage is dependent on how many different files that writeToFile
created.
Now just write the values to a file:
val destinationFileName : String = ???
val writer : Writer = new FileWriter(destinationFileName)
mergeSort(subFiles) foreach (i => writer write i.toString)
Imperfect Sorting
One thing to note: if N is small and the source file is not sufficiently random then the solution will not produce a perfect sort. Example: suppose N = 2
and the initial list was [10,11,0,1]
then the algorithm, after one pass, would produce [0,10,1,11]
as the result.