I have a huge CSV file with 720,000,000 (720 million) lines. I want to sort it, and my command is:

sort -T /tmp -S 50% --parallel=4 file.csv -o file_sorted.csv

Is there any other option that I can use to make it really fast?
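
For reference, here is the question's command with each option annotated, run on a small sample. This sketch assumes GNU coreutils sort, since the % suffix on -S and the --parallel option are GNU extensions. The file names sample.csv and sample_sorted.csv and the generated rows are hypothetical stand-ins for the real 720-million-line file.

```shell
#!/usr/bin/env bash
# The question's sort command with each option annotated, run on a small sample.
# Assumes GNU coreutils sort: the % suffix on -S and --parallel are GNU options.
set -euo pipefail

# Hypothetical stand-in for file.csv: 100,000 random rows, created only if missing.
if [ ! -f sample.csv ]; then
  awk 'BEGIN { srand(1); for (i = 0; i < 100000; i++) printf "%d,item%d\n", int(rand() * 1000000), i }' > sample.csv
fi

# -T /tmp        directory where sort writes its temporary sorted runs
# -S 50%         size of the in-memory buffer: half of physical memory
# --parallel=4   run up to 4 sorts concurrently
# -o FILE        write the result to FILE instead of standard output
sort -T /tmp -S 50% --parallel=4 sample.csv -o sample_sorted.csv

# sort -c exits non-zero if its input is not in sorted order.
sort -c sample_sorted.csv && echo "sample_sorted.csv is sorted"
```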

Thanks!

Komal Rathi
  • This site is for programming questions. We are not general software tech support. – Marc B Sep 07 '16 at 18:55
  • I don't know what you consider "really fast." Sorting 720 million items, you're looking at around 20 billion comparisons. You're also looking at reading and writing that file twice. If the lines are 50 characters long, you're talking 36 gigabytes, twice, or on the order of two hours of file I/O. – Jim Mischel Sep 08 '16 at 12:50
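
The figures in Jim Mischel's comment can be reproduced as follows. This uses the usual n log2 n estimate for the number of comparisons a comparison sort makes and assumes 1 byte per character. The 20 MB/s in the last line is not stated in the comment: it is back-solved from the two-hour figure, reading "reading and writing that file twice" as two full read-plus-write passes.

```latex
\begin{aligned}
n &= 7.2\times 10^{8}\ \text{lines}\\
n\log_2 n &\approx 7.2\times 10^{8} \times 29.4 \approx 2.1\times 10^{10}\ \text{comparisons}\\
\text{file size} &\approx 7.2\times 10^{8} \times 50\ \text{B} = 3.6\times 10^{10}\ \text{B} = 36\ \text{GB}\\
\text{I/O volume} &\approx 2 \times (36\ \text{GB read} + 36\ \text{GB written}) = 144\ \text{GB}\\
144\ \text{GB} / 7200\ \text{s} &= 20\ \text{MB/s}
\end{aligned}
```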

1 Answer

Use a parallel sorting algorithm for data this large.

Useful topic: Which parallel sorting algorithm has the best average case performance?
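
One way to see what a parallel external sort does is to run the steps by hand: split the file into chunks, sort the chunks concurrently, then merge the sorted runs with sort -m. This is an illustrative sketch under stated assumptions, not the answerer's method. The chunk size, the cap of 4 concurrent sorts (using xargs -P, which GNU and BSD xargs support but POSIX does not require), and the sample file names are all hypothetical. GNU sort with -S, -T and --parallel already does roughly this internally, so the sketch is mainly useful for seeing where the time goes.

```shell
#!/usr/bin/env bash
# Hand-rolled external merge sort: split -> sort chunks in parallel -> k-way merge.
# A sketch for illustration; file names and sizes are hypothetical defaults.
set -euo pipefail

input=${1:-sample.csv}          # file to sort
output=${2:-sample_sorted.csv}  # where the sorted result goes
chunk_lines=${3:-100000}        # lines per chunk; pick so one chunk fits in RAM
jobs=4                          # concurrent chunk sorts (mirrors --parallel=4)

# Scratch directory for chunks and sorted runs (the role -T plays for sort).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Generate a 300,000-line sample only if the input is missing, so the sketch runs standalone.
if [ ! -f "$input" ]; then
  awk 'BEGIN { srand(42); for (i = 0; i < 300000; i++) printf "%d,row%d\n", int(rand() * 1000000), i }' > "$input"
fi

# 1. Split the input into fixed-size chunks: chunk_aa, chunk_ab, ...
split -l "$chunk_lines" "$input" "$workdir/chunk_"

# 2. Sort each chunk in its own process, at most $jobs at a time.
#    xargs exits non-zero if any sort fails, which stops the script under set -e.
printf '%s\n' "$workdir"/chunk_* | xargs -P "$jobs" -I{} sort -o {}.sorted {}

# 3. k-way merge of the already-sorted runs into the final output.
sort -m -o "$output" "$workdir"/chunk_*.sorted
```

To point it at a real file, pass the input, output and chunk size as arguments (for example big.csv big_sorted.csv 5000000, all hypothetical values). With no arguments it sorts a generated sample. The merge step is sequential, which is one reason parallelism helps the chunk sorting more than the final pass.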

Vural