I have a 20 GB+ CSV file like this:

**CallId,MessageNo,Information,Number** 
1000,1,a,2
99,2,bs,3
1000,3,g,4
66,2,a,3
20,16,3,b
1000,7,c,4
99,1,lz,4 
...

I must sort this file by CallId and then MessageNo, both ascending. (One way is to load it into a database, sort, and export.)

How can I sort this file in C# without loading all the lines into memory (for example, line by line with a StreamReader)?

Do you know of a library that does this? I look forward to your advice, thanks.

oguzh4n

3 Answers

You should use OS sort commands. Typically it's just

sort myfile

followed by some mystical switches. These commands usually handle large files well, and they often have options for placing temporary storage on another physical hard drive. See this previous question and the Windows sort command "man" page. Windows sort can only key on a character position, not on a delimited field, so it is not enough for your particular sorting problem; you may want to use GNU coreutils, which brings the power of Linux sort to Windows.

Solution

Here's what you need to do.

  1. Download GNU Coreutils Binaries ZIP and extract sort.exe from the bin folder to some folder on your machine, for example the folder where your to-be-sorted file is.
  2. Download GNU Coreutils Dependencies ZIP and extract both .dll files to the same folder as sort.exe

Now assuming that your file looks like this:

1000,1,a,2
99,2,bs,3
1000,3,g,4
66,2,a,3
20,16,3,b
1000,7,c,4
99,1,lz,4 

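you can write in the command prompt:

sort.exe -t, -k1,1n -k2,2n yourfile.csv

Here -t, makes the comma the field separator, and -k1,1n -k2,2n sorts numerically on the first field (CallId) and then on the second (MessageNo). With -g alone, ties on CallId fall back to a byte-wise comparison of the whole line, so MessageNo 16 would sort before 2. The command outputs:

20,16,3,b
66,2,a,3
99,1,lz,4
99,2,bs,3
1000,1,a,2
1000,3,g,4
1000,7,c,4

See more command options here. If this is what you want, don't forget to provide an output file with the -o switch, like so:

sort.exe -t, -k1,1n -k2,2n yourfile.csv -o sorted.csv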
Gleno
  • In which OS can the sort command parse a CSV file and sort by a particular field? – sll Sep 09 '11 at 12:00
  • To my knowledge Linux; I think Windows too. – Gleno Sep 09 '11 at 12:02
  • I believe there is only a small chance that a developer asking a C# question is using Linux. Anyway, if you are 100% sure about Linux and not sure about Windows, you should say so in your answer. This is my point of view. – sll Sep 09 '11 at 12:05
  • Good point, but you can use Linux sort on Windows with a few pains. – Gleno Sep 09 '11 at 12:12
  • I know you can sort on a positional byte/column number, but what if you want to sort on the third field? Are there switches to do that? – NealWalters Mar 02 '12 at 13:39
  • This is not really a portable, pure C# way of solving the problem. I think the [external sorting](http://stackoverflow.com/a/7361224/429091) answer points to a better, more portable solution. – binki Dec 30 '15 at 16:19
This is a classic algorithmic problem called external sorting.

External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead must reside in slower external memory (usually a hard drive). External sorting typically uses a sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted subfiles are combined into a single larger file.

From the .NET Framework point of view, I would recommend using a .NET 4 feature, memory-mapped files, to project parts of the file into memory as separate views.
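In .NET this is done with MemoryMappedFile and its view accessors. As a rough illustration of the "view over part of a file" idea, here is a minimal sketch in Java (chosen only because the sample in this answer is Java). The class name MappedWindow and the method readWindow are hypothetical, and the sketch assumes ASCII content.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWindow {
    // Maps only the byte range [offset, offset + length) of the file and
    // decodes it as ASCII, so the rest of the file never has to be loaded.
    static String readWindow(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (offset < 0 || offset >= size) {
                return ""; // nothing to map outside the file
            }
            // Clamp the window so it never runs past the end of the file.
            int len = (int) Math.min(length, size - offset);
            MappedByteBuffer view = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            byte[] bytes = new byte[len];
            view.get(bytes);
            return new String(bytes, StandardCharsets.US_ASCII);
        }
    }
}
```

A view like this gives cheap random access to a slice of the file, but the sorting itself still needs the sort-merge strategy quoted above.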

Here is a Java example of External Merge Sort; you should be able to adapt it to C# easily:

EDIT: Added usage example of the mentioned Java sample to demonstrate its simplicity

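// Order CSV lines by CallId, then MessageNo, both compared as numbers.
// A plain r1.compareTo(r2) is lexicographic and would put "1000,..." before "20,...".
// Assumes every line is a data row (no header line).
Comparator<String> comparator = new Comparator<String>()
{
  public int compare(String r1, String r2)
  {
     String[] a = r1.split(",", 3);
     String[] b = r2.split(",", 3);
     int byCallId = Long.compare(Long.parseLong(a[0].trim()), Long.parseLong(b[0].trim()));
     if (byCallId != 0) return byCallId;
     return Long.compare(Long.parseLong(a[1].trim()), Long.parseLong(b[1].trim()));
  }
};

// sortInBatch and mergeSortedFiles come from the linked Java sample.
List<File> l = sortInBatch(new File(inputfile), comparator);
mergeSortedFiles(l, new File(outputfile), comparator);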
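To make the sort and merge phases quoted above concrete, here is a self-contained sketch in Java. It is not the linked sample's implementation: the class CsvExternalSort, the method sort, and the parameter maxLinesPerChunk are hypothetical names. It assumes the first line is a header, that CallId and MessageNo are integers in the first two fields, and that no field contains a quoted comma.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class CsvExternalSort {
    // Orders CSV lines by CallId, then MessageNo, both parsed as numbers.
    static final Comparator<String> BY_CALL_THEN_MESSAGE = (a, b) -> {
        String[] x = a.split(",", 3);
        String[] y = b.split(",", 3);
        int c = Long.compare(Long.parseLong(x[0].trim()), Long.parseLong(y[0].trim()));
        return c != 0 ? c : Long.compare(Long.parseLong(x[1].trim()), Long.parseLong(y[1].trim()));
    };

    // One sorted temporary file plus the line currently at its front.
    private static final class Run {
        final BufferedReader reader;
        String head;
        Run(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }
    }

    static void sort(Path in, Path out, int maxLinesPerChunk) throws IOException {
        List<Path> runFiles = new ArrayList<>();
        List<Run> open = new ArrayList<>();
        try {
            String header;
            // Sort phase: read a bounded chunk, sort it in memory, write it out.
            try (BufferedReader r = Files.newBufferedReader(in, StandardCharsets.UTF_8)) {
                header = r.readLine();
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = r.readLine()) != null) {
                    if (line.trim().isEmpty()) continue; // skip blank lines
                    chunk.add(line);
                    if (chunk.size() >= maxLinesPerChunk) {
                        runFiles.add(writeRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) runFiles.add(writeRun(chunk));
            }
            // Merge phase: a min-heap always yields the smallest front line.
            PriorityQueue<Run> heap =
                new PriorityQueue<>((p, q) -> BY_CALL_THEN_MESSAGE.compare(p.head, q.head));
            try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
                if (header != null) { w.write(header); w.newLine(); }
                for (Path p : runFiles) {
                    Run run = new Run(Files.newBufferedReader(p, StandardCharsets.UTF_8));
                    open.add(run);
                    if (run.head != null) heap.add(run);
                }
                while (!heap.isEmpty()) {
                    Run run = heap.poll();
                    w.write(run.head);
                    w.newLine();
                    run.head = run.reader.readLine();
                    if (run.head != null) heap.add(run); // run not exhausted yet
                }
            }
        } finally {
            for (Run run : open) run.reader.close();
            for (Path p : runFiles) Files.deleteIfExists(p);
        }
    }

    // Sorts one chunk in memory and writes it to a temporary file.
    private static Path writeRun(List<String> chunk) throws IOException {
        chunk.sort(BY_CALL_THEN_MESSAGE);
        Path tmp = Files.createTempFile("run-", ".csv");
        Files.write(tmp, chunk, StandardCharsets.UTF_8);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        sort(Paths.get(args[0]), Paths.get(args[1]), 1_000_000);
    }
}
```

Memory use is bounded by maxLinesPerChunk during the sort phase and by one buffered line per temporary file during the merge. This sketch opens every temporary file at once, so for a very large input the chunk size should be large enough to keep the number of runs below the operating system's open-file limit.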
sll
You should use Python for this kind of task :)

Have a look here for a similar, fully working example:

Python: How to read huge text file into memory

EDIT:

In that same answer there is a link that is useful when your file is much bigger than the available RAM: http://code.activestate.com/recipes/466302/

Davide Piras
  • I cannot load a 24 GB+ file into memory (it is a memory issue). – oguzh4n Sep 09 '11 at 11:46
  • Oguzh4n, read those answers; there is also a link to an article on how to do it when the file is much bigger than the available RAM: http://code.activestate.com/recipes/466302/ – Davide Piras Sep 09 '11 at 11:49