4

I came across an implementation of reading a file line by line that looks like this:

using (StreamReader reader = File.OpenText(path))
while (!reader.EndOfStream)
{
    string line = reader.ReadLine();
}

However, personally I would just do this:

foreach (string line in File.ReadLines(path))
{

}

Is there any reason to pick one over the other?

Ian
Tyress
  • slightly different but perhaps also check out: http://stackoverflow.com/questions/3545402/any-difference-between-file-readalltext-and-using-a-streamreader-to-read-file. – wazz Jan 21 '16 at 06:47

2 Answers

5

Objectively:

  • The first one picks up the lines one by one, and you process each line as you stream the data.

  • The second one returns an IEnumerable<string> right away, and you then process the lines as you enumerate it (source: MSDN; my apologies for mixing it up with ReadAllLines at first).

Usefulness?:

  • The first one is more "fluid" in that sense, because you can take/process and break/continue at will. Every time you need a line, you take one, process it, and then choose to break or continue. You can even stop taking further lines (suppose you have a yield in the while loop) until you want to come back.
  • The second one first obtains an IEnumerable<string> (that is, it describes the expected data before processing starts), which adds a slight overhead up front. Note, however, that this overhead is relatively small because IEnumerable defers the actual execution, unlike List. Some good info to read.
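As a minimal sketch of the take-a-line-and-decide pattern described above, both methods below stop at the first matching line. Because File.ReadLines reads lazily, leaving the foreach early also stops reading the rest of the file. The class name, method names, and sample file contents are made up for illustration.

```csharp
using System;
using System.IO;

public static class EarlyExitDemo
{
    // Explicit StreamReader: read until a line contains `marker`, then stop.
    public static string FindWithReader(string path, string marker)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(marker))
                    return line; // the using block disposes the reader on the way out
            }
        }
        return null;
    }

    // File.ReadLines: the same early exit, written as a foreach.
    public static string FindWithReadLines(string path, string marker)
    {
        foreach (string line in File.ReadLines(path))
        {
            if (line.Contains(marker))
                return line; // leaving the loop disposes the enumerator, which closes the file
        }
        return null;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });

        Console.WriteLine(FindWithReader(path, "et"));    // prints "beta"
        Console.WriteLine(FindWithReadLines(path, "et")); // prints "beta"

        File.Delete(path);
    }
}
```

In both versions "gamma" is never read, so for this kind of early exit the two approaches behave the same.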

And from MSDN, the following are the use cases for ReadLines:

  • Perform LINQ to Objects queries on a file to obtain a filtered set of its lines.

  • Write the returned collection of lines to a file with the File.WriteAllLines(String, IEnumerable<String>) method, or append them to an existing file with the File.AppendAllLines(String, IEnumerable<String>) method.

  • Create an immediately populated instance of a collection that takes an IEnumerable<T> collection of strings for its constructor, such as a IList<T> or a Queue<T>.
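The three use cases quoted above could look roughly like this. This is only a sketch: the class and method names, the temporary files, and the "ERROR"/"INFO" log lines are assumptions made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ReadLinesUseCases
{
    // Copies ERROR lines, then INFO lines, from `input` to `output`
    // and returns the resulting lines as a queue.
    public static Queue<string> SplitLog(string input, string output)
    {
        // 1. LINQ to Objects query over the lazily read lines.
        IEnumerable<string> errors = File.ReadLines(input)
            .Where(l => l.StartsWith("ERROR", StringComparison.Ordinal));

        // 2. Write the filtered lines to a file, then append another filtered set.
        File.WriteAllLines(output, errors);
        File.AppendAllLines(output, File.ReadLines(input)
            .Where(l => l.StartsWith("INFO", StringComparison.Ordinal)));

        // 3. Populate a collection directly from the enumerable.
        return new Queue<string>(File.ReadLines(output));
    }

    public static void Main()
    {
        string input = Path.GetTempFileName();
        string output = Path.GetTempFileName();
        File.WriteAllLines(input, new[]
        {
            "INFO start", "ERROR disk full", "INFO done", "ERROR timeout"
        });

        Queue<string> queue = SplitLog(input, output);
        Console.WriteLine(queue.Count);  // prints 4
        Console.WriteLine(queue.Peek()); // prints "ERROR disk full"

        File.Delete(input);
        File.Delete(output);
    }
}
```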

Community
Ian
  • The second one actually does not load all lines at once, it returns an `IEnumerable` - and the documentation explicitly says so (`File.ReadAllLines` would behave as you mentioned) - https://msdn.microsoft.com/en-us/library/dd383503(v=vs.110).aspx#Anchor_2 – Rob Jan 21 '16 at 05:51
  • I actually did not know that IEnumerable had overhead at all pre iteration (does it really?) – Tyress Jan 21 '16 at 06:16
  • @Tyress as far as I know, it is not as bad as `ToList()` – Ian Jan 21 '16 at 06:18
  • @Tyress I personally have never run into a noticeable performance problem. But some sources say that when the number of IEnumerable items is high, the impact can be seen. Perhaps this is one of them: http://www.tomfosdick.com/archives/704 – Ian Jan 21 '16 at 06:20
  • @Tyress you are welcome. :) While `List()` will create an instance for the item, `IEnumerable` only has overhead of giving the info about the item. Thus, it cannot be as bad as `List()`. However, as compared to the real `Stream` there would be slight overhead on it... That's what I understand basically. Other good posts to read: http://stackoverflow.com/questions/15516462/is-there-a-performance-impact-when-calling-tolist http://stackoverflow.com/questions/3628425/ienumerable-vs-list-what-to-use-how-do-they-work – Ian Jan 21 '16 at 06:29
  • Joining the above commenters. The methods in the OP are equivalent, hence the latter (`File.ReadLines`) is preferable. `File.ReadAllLines` is a totally different story. – Ivan Stoev Jan 21 '16 at 07:10
  • @IvanStoev would like to hear your input. :) Who knows it will be enriching – Ian Jan 21 '16 at 07:11
  • @IvanStoev as far as I understand, the `IEnumerable` has a little overhead because it needs to get the info about the item before doing anything, while Stream does not seem to (I could be wrong though). Is this not the case? – Ian Jan 21 '16 at 07:15
  • Of course there is, but it's insignificant compared to the actual read operation and string creation. Also if we count enumerable overhead, how about all LINQ chains of enumerables? The second method is just encapsulation of the first. Put the first code snippet in a method returning `IEnumerable`, change `line = ...` to `yield return ...`, and basically you get `File.ReadLines` method (simplified) – Ivan Stoev Jan 21 '16 at 07:20
  • @IvanStoev I see... so you are saying that far more time goes into the read operation and the string creation than into setting up the info. This is quite new to me... :/ However, when the number of items is larger, doesn't the `IEnumerable` then take longer to create the info? Or is it actually independent of the number of items? Because I myself, as I said, have never seen performance drop so badly with `IEnumerable` that I had to benchmark it... – Ian Jan 21 '16 at 07:26
  • The only overhead of the enumerable is a single additional heap allocation (one time), and then 2 virtual calls (`MoveNext` and `Current`) per iteration. Nowadays one must have very special requirements to count such things :) – Ivan Stoev Jan 21 '16 at 07:35
  • Anyway, it's sort of an academic discussion. All you wrote in the initial post fully applies to `File.ReadAllLines` which reads the whole file upfront. – Ivan Stoev Jan 21 '16 at 07:38
  • @IvanStoev I see... :) Thanks. It is indeed enriching. – Ian Jan 21 '16 at 08:15
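The transformation Ivan Stoev describes in the comments (wrap the first snippet in a method returning `IEnumerable` and turn the assignment into `yield return`) can be sketched as below. This is a simplified illustration and not the actual framework source; `LineReader` and `MyReadLines` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineReader
{
    // Hypothetical stand-in for File.ReadLines, built from the StreamReader loop.
    public static IEnumerable<string> MyReadLines(string path)
    {
        using (StreamReader reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Hand one line to the caller, then pause here until the next MoveNext.
                yield return line;
            }
        } // the reader is disposed when enumeration finishes or the enumerator is disposed
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "one", "two", "three" });

        foreach (string line in MyReadLines(path))
            Console.WriteLine(line);

        // Same sequence as the built-in method.
        Console.WriteLine(MyReadLines(path).SequenceEqual(File.ReadLines(path))); // prints True

        File.Delete(path);
    }
}
```

One thing to keep in mind with this sketch: a compiler-generated iterator does not run any of its body, including opening the file, until the first `MoveNext` call.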
1

Method 1

Is more verbose than method 2, so there is more chance of stuffing things up.

Method 2

Is more concise and expresses the intent more directly. It is much easier to read and follow, and there is less chance of stuffing things up (like forgetting to dispose of the reader).

Jens Meinecke