
What is the most optimal way to read in a very large text file? Should it be read all at once (ReadToEnd) or line by line? This may be related to: What's the fastest way to read a text file line-by-line?

using (StreamReader sr = new StreamReader("TestFile.txt"))
{
    string line = sr.ReadToEnd();
}

OR

using (System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt"))
{
    string line;
    while ((line = file.ReadLine()) != null)
    {
        // append to StringBuilder
    }
}

Should one store the file's contents in a string or a StringBuilder? What's the best approach?

ShaneKm
  • `most optimal` <-- what are your optimization criteria? There is no best way to do anything; there is only a subjective opinion on how to do a certain task in a specific scenario. You need to describe your specific scenario. Start like this: I'm reading the file like this (code goes here), but I find it's too slow, or it takes too many lines of code, or it does not scale for large files, or some other developer said it looks bad, etc., and please include relevant links if available. – Victor Zakharov Nov 22 '15 at 03:07
  • If getting all the lines, I think File.ReadAllLines should be used; it reads all lines and closes the file automatically. Exception handling must be done by ourselves. – hazjack Nov 22 '15 at 03:11
  • please see edits (reading in large text file) – ShaneKm Nov 22 '15 at 03:15
  • 1
    The most optimal reading is not reading at all. Do not read anything. Zero CPU time, zero memory usage. If You need only part of the file, try to seek into that position. – Antonín Lejsek Nov 22 '15 at 03:23

2 Answers


It's not about what's faster, but about what suits your needs.

If you don't need the entire file to be in memory at any given point in time, reading it line by line will save you from unnecessary memory consumption. You can read a line, do something with it, and discard it before reading the next one. At any given point in time, only the last line is stored in memory.*
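The line-by-line approach can be written without a manual ReadLine loop by using File.ReadLines, which returns a lazy IEnumerable<string> so only one line (plus the reader's internal buffer) is materialized at a time. A minimal sketch; the "ERROR" filter is just a placeholder task, not something from the question:

```csharp
using System;
using System.IO;

class StreamingExample
{
    static void Main()
    {
        // File.ReadLines streams the file lazily: each iteration of the
        // loop reads the next line, and earlier lines can be collected.
        long count = 0;
        foreach (string line in File.ReadLines("TestFile.txt"))
        {
            if (line.Contains("ERROR"))
                count++;
        }
        Console.WriteLine(count);
    }
}
```

Unlike File.ReadAllLines, this never holds the whole file in memory, so it scales to files larger than available RAM.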

If you do need the entire file content stored and accessible in memory later in the application, reading the entire file might be better (unless you're going to split it by Environment.NewLine later on; in that case it might be better to read it line by line upfront).
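Both whole-file variants have one-line helpers in System.IO.File, so neither needs an explicit StreamReader. A short sketch of the two options:

```csharp
using System;
using System.IO;

class WholeFileExample
{
    static void Main()
    {
        // One string holding the entire file content:
        string all = File.ReadAllText("TestFile.txt");

        // One array with an entry per line -- no manual splitting
        // on Environment.NewLine needed afterwards:
        string[] lines = File.ReadAllLines("TestFile.txt");

        Console.WriteLine(all.Length);
        Console.WriteLine(lines.Length);
    }
}
```

If the later processing is line-oriented, ReadAllLines avoids the extra split-and-allocate pass that ReadToEnd plus String.Split would cost.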

* A slight simplification: StreamReader will keep some extra data in an internal buffer to minimize the number of times file content is actually read from disk.
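That buffer size is tunable through a StreamReader constructor overload. A sketch, assuming a larger buffer is wanted for a big sequential read; the 1 MB value is an arbitrary illustration, not a recommendation:

```csharp
using System;
using System.IO;
using System.Text;

class BufferedReadExample
{
    static void Main()
    {
        // StreamReader's default buffer is small; for very large files a
        // bigger buffer can reduce the number of underlying disk reads.
        using (var sr = new StreamReader("TestFile.txt", Encoding.UTF8,
                                         detectEncodingFromByteOrderMarks: true,
                                         bufferSize: 1 << 20))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                // process line here
            }
        }
    }
}
```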

MarcinJuraszek
  • 124,003
  • 15
  • 196
  • 263
0

I think there is no difference between the two cases. In terms of time complexity, both take O(n) for a file of n lines, since you basically store the file's content into an instance and use it either way. Just my opinion.

Update: It does matter if the file is very large, since reading it all at once could consume a lot of memory. In that case, the better way is to read each line from the file and free the memory for that line before moving on to the next one.

Tung Le
  • 109
  • 1
  • 3
  • 11
  • Your first paragraph is arguably incorrect. I/O is one thing, but there is a _big_ difference in memory consumption. Reading a very large file into memory could well impact the loading of the remainder of the file, not to mention the performance of other applications. –  Nov 22 '15 at 04:20
  • You are right, thanks Micky. What I meant was about the computational complexity, but in reality, of course there is a difference in the running system. – Tung Le Nov 22 '15 at 05:16