I need to read four very large (>2 GB) files line by line, simultaneously, in a C# application. I'm using four different StreamReader objects and their ReadLine() method. Performance suffers badly while reading lines from all four files at the same time, but improves as each of them reaches EOF (performance with 4 files < performance with 3 files < performance with 2 files...).
I have this (simplified, assuming only two files for a cleaner example) code:
StreamReader readerOne = new StreamReader(@"C:\temp\file1.txt");
StreamReader readerTwo = new StreamReader(@"C:\temp\file2.txt");

while (readerOne.Peek() >= 0 || readerTwo.Peek() >= 0)
{
    string[] readerOneFields = readerOne.Peek() >= 0 ?
                               readerOne.ReadLine().Split(',') : null;
    string[] readerTwoFields = readerTwo.Peek() >= 0 ?
                               readerTwo.ReadLine().Split(',') : null;

    if (readerOneFields != null && readerTwoFields != null)
    {
        if (readerOneFields[2] == readerTwoFields[2])
        {
            // Do some boring things...
        }
    }
    else if (readerOneFields != null)
    {
        // ...
    }
    else
    {
        // ...
    }
}

readerOne.Close();
readerTwo.Close();
I have to read these files at the same time because I need to compare their lines, and afterwards write the results to a new file.
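As a side note, since ReadLine() returns null at EOF, the Peek() calls could in principle be dropped. A minimal sketch of the same loop written that way (same paths as above, comparison logic elided):

using (StreamReader readerOne = new StreamReader(@"C:\temp\file1.txt"))
using (StreamReader readerTwo = new StreamReader(@"C:\temp\file2.txt"))
{
    string lineOne = readerOne.ReadLine();
    string lineTwo = readerTwo.ReadLine();

    // Keep looping while either file still has unread lines.
    while (lineOne != null || lineTwo != null)
    {
        string[] readerOneFields = lineOne != null ? lineOne.Split(',') : null;
        string[] readerTwoFields = lineTwo != null ? lineTwo.Split(',') : null;

        // ... same comparison logic as above ...

        lineOne = readerOne.ReadLine();   // stays null once file1 is exhausted
        lineTwo = readerTwo.ReadLine();   // stays null once file2 is exhausted
    }
}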
I've read a lot of questions about reading large files with StreamReader, but I couldn't find a scenario like mine. Is using the ReadLine() method the proper way to accomplish this? Is StreamReader even the proper class for it?
UPDATE: things are getting weirder now. Just for testing, I've reduced the file sizes to about 10 MB by deleting lines, leaving only 70K records. I have also tried with only two files (instead of four) at the same time, and I'm getting the same poor performance while reading from the two files simultaneously. When one of them reaches EOF, performance improves. I'm setting a StreamReader buffer size of 50 MB.
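For reference, the buffer size is passed through the StreamReader constructor overload; the UTF-8 encoding below is just an example, not necessarily what the real files use:

StreamReader readerOne = new StreamReader(@"C:\temp\file1.txt",
    Encoding.UTF8,       // example encoding; requires using System.Text;
    true,                // detect encoding from byte order marks
    50 * 1024 * 1024);   // 50 MB buffer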