14

In this web page: CS302 --- External Sorting

Merge the resulting runs together into successively bigger runs, until the file is sorted.

As I quoted, how can we merge the resulting runs together??? We don't have that much memory.

phuclv
  • 37,963
  • 15
  • 156
  • 475
Josh Morrison
  • 7,488
  • 25
  • 67
  • 86

2 Answers2

46

Imagine you have the numbers 1 - 9

9  7  2  6  3  4  8  5  1

And let's suppose that only 3 fit in memory at a time.

So you'd break them into chunks of 3 and sort each, storing each result in a separate file:

279
346
158

Now you'd open each of the three files as streams and read the first value from each:

2 3 1

Output the lowest value 1, and get the next value from that stream, now you have:

2 3 5

Output the next lowest value 2, and continue onwards until you've outputted the entire sorted list.

Jesse Cohen
  • 4,010
  • 22
  • 25
  • thank u. But, please that website's example, it says "6 I/O units are used."...how??? If follow your way, it's a lot of IOs – Josh Morrison Feb 24 '11 at 04:17
  • well, it's the same principle, but you can't do it all at once, you'll have to partition, sort part and merge it all, then sort the rest of the partition, and merge that with the initially merge-sorted list. as such you won't use more than 6 i/o units. the algorithms on the page describe the best way to partition your data given your i/o unit constraints. – Jesse Cohen Feb 24 '11 at 04:31
  • That works for that example, but how about you have this list instead? 9 7 8 6 3 4 2 5 1, then you would end up with the following groups: 789, 346, 125. If you sort the first from each group, then the seconds and then the thirds, you would end up with something like this: 127248569, which is incorrect. So in a nutshell, I still don't get how it works... – Gaara Nov 04 '13 at 05:58
  • 1
    @Gaara How did u get 1272xxxx? 789 - 346 - 125.. pick first values from each stream, gives u [731].. print 1.. pick 2 from 3rd-stream, [732].. print 2.. pick 5 from 3rd-stream [735].. print 3 .. its proceeding in right way.. – everlasto Nov 17 '13 at 13:03
  • Got you, everlasto! I missunderstood how the algorithm works, but now I get it with your clear example. Thanks a lot! – Gaara Nov 18 '13 at 19:36
  • Can I do anything if i want to store the final merged list somewhere rather than printing each element? – Abhishek Jaiswal May 03 '21 at 10:01
1

If you process two runs A and B into some larger run C you can do this line-by-line generating progressively larger runs, but still only reading at most 2 lines at a time. Because the process is iterative and because you're working on streams of data rather than full cuts of data you don't need to worry about memory usage. On the other hand, disk access might make the whole process slow -- but it sure beats not being able to do the work in the first place.

Mark Elliot
  • 75,278
  • 22
  • 140
  • 160
  • bro, this is gonna be a little embarrassing, today I posted a question asked by an interviewer, hoping to get some help, but nobody answered it with being viewed 30+ times. I really hope that you can spare a little time helping me about it, here is the link: http://stackoverflow.com/q/7425400/888051 Thank you very much. – Alcott Sep 15 '11 at 05:03