
I have a >50 GB array stored in a file; it is too big for my RAM. The file has a simple binary format: uint64_t, uint64_t, uint64_t per record, so I have random access to the >2,000,000,000 elements.

Now I need to sort them by the first and by the second element (producing two separate output files). Is there any trick to speed up the sorting? I have 16 GB of RAM, so maybe load chunks and sort them separately? But I don't know any algorithm that does this. (See the sketches after the comments below.)

Spide
  • Looks like a good job for a [merge sort](https://en.wikipedia.org/wiki/Merge_sort), I would say. – Holt Mar 21 '16 at 13:40
  • And, given enough swap space, it may even work out of the box if you use memory mapping of the file and try an in-place sort. – marom Mar 21 '16 at 13:43
  • In case you want to learn more, you might want to start here: https://en.wikipedia.org/wiki/External_sorting – pzelasko Mar 21 '16 at 13:44
  • Related: http://stackoverflow.com/questions/4012523/sorting-huge-number-of-integers-from-hard-disk – Jaa-c Mar 21 '16 at 13:45
  • OK, en.wikipedia.org/wiki/External_sorting was what I was looking for. Thank you! – Spide Mar 21 '16 at 13:51
  • Read Knuth: https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming – Slava Mar 21 '16 at 13:51
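
For reference, here is a minimal sketch of the external merge sort the comments point to, assuming the input is named `data.bin`, the records are packed 24-byte triples, and the sort key is the first field. The file name, chunk size, and run naming are placeholders, and all error handling is omitted:

```cpp
// external_sort.cpp — hypothetical sketch; names and sizes are placeholders.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <queue>
#include <string>
#include <vector>

struct Record { uint64_t a, b, c; };  // one packed 24-byte triple

// Phase 1: read RAM-sized chunks, sort each, write it out as a sorted "run".
std::vector<std::string> make_runs(const char* input, size_t chunk_records) {
    std::vector<std::string> runs;
    std::vector<Record> buf(chunk_records);       // one big in-RAM buffer
    FILE* in = std::fopen(input, "rb");
    size_t n;
    while ((n = std::fread(buf.data(), sizeof(Record), chunk_records, in)) > 0) {
        std::sort(buf.begin(), buf.begin() + n,
                  [](const Record& x, const Record& y) { return x.a < y.a; });
        std::string name = "run" + std::to_string(runs.size()) + ".bin";
        FILE* out = std::fopen(name.c_str(), "wb");
        std::fwrite(buf.data(), sizeof(Record), n, out);
        std::fclose(out);
        runs.push_back(name);
    }
    std::fclose(in);
    return runs;
}

// Phase 2: k-way merge of the sorted runs via a min-heap on the first field.
void merge_runs(const std::vector<std::string>& runs, const char* output) {
    struct Head { Record rec; size_t run; };
    auto gt = [](const Head& x, const Head& y) { return x.rec.a > y.rec.a; };
    std::priority_queue<Head, std::vector<Head>, decltype(gt)> heap(gt);

    std::vector<FILE*> files;
    for (size_t i = 0; i < runs.size(); ++i) {
        files.push_back(std::fopen(runs[i].c_str(), "rb"));
        Record r;
        if (std::fread(&r, sizeof r, 1, files[i]) == 1) heap.push({r, i});
    }
    FILE* out = std::fopen(output, "wb");
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        std::fwrite(&h.rec, sizeof h.rec, 1, out);
        Record r;  // refill the heap from the run this record came from
        if (std::fread(&r, sizeof r, 1, files[h.run]) == 1) heap.push({r, h.run});
    }
    std::fclose(out);
    for (FILE* f : files) std::fclose(f);
}

int main() {
    // ~8 GB chunks (8 GB / 24 B ≈ 330M records) leave headroom in 16 GB RAM.
    auto runs = make_runs("data.bin", 330'000'000);
    merge_runs(runs, "sorted_by_first.bin");
    // For the second output file, repeat with comparators on .b instead of .a.
}
```

With ~330M-record runs, a 50 GB file yields only six or seven runs, so a single merge pass suffices; real implementations would also buffer the merge-phase reads in larger blocks rather than one record at a time.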
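And a sketch of marom's memory-mapping suggestion, again with `data.bin` as an assumed file name (POSIX only). It needs no temporary files, but because std::sort accesses the mapping randomly, it is likely to be far slower than the external sort once the working set exceeds RAM:

```cpp
// mmap_sort.cpp — hypothetical sketch of the memory-mapping idea (POSIX only).
#include <algorithm>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Record { uint64_t a, b, c; };

int main() {
    // Open read-write: with MAP_SHARED the sort rewrites the file in place,
    // so copy data.bin first if the original order is still needed.
    int fd = open("data.bin", O_RDWR);
    struct stat st;
    fstat(fd, &st);
    size_t n = st.st_size / sizeof(Record);

    Record* recs = static_cast<Record*>(
        mmap(nullptr, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));

    // The kernel pages records in and out on demand; no extra disk space or
    // swap is needed, but random access across >50 GB will thrash badly.
    std::sort(recs, recs + n,
              [](const Record& x, const Record& y) { return x.a < y.a; });

    munmap(recs, st.st_size);
    close(fd);
}
```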

0 Answers