11

I am curious why the following throws an error message (text reader closed exception) on the "last" assignment:

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);
IEnumerator<string> textEnumerator = textRows.GetEnumerator();

string first = textRows.First();
string last = textRows.Last();

However the following executes fine:

IEnumerable<string> textRows = File.ReadLines(sourceTextFileName);

string first = textRows.First();
string last = textRows.Last();

IEnumerator<string> textEnumerator = textRows.GetEnumerator();

What is the reason for the different behavior?

asheeshr
  • 4,088
  • 6
  • 31
  • 50
Matt
  • 7,004
  • 11
  • 71
  • 117
  • Actually, both codes crash on my machine... – digEmAll Dec 26 '12 at 11:02
  • @digEmAll, the second works fine for me, the first code breaks when I try to determine the last line in the text file. – Matt Dec 26 '12 at 11:04
  • @digEmAll: That's strange - the second code works fine for me, and I understand *why* it works fine. What problem are you seeing, and where? – Jon Skeet Dec 26 '12 at 11:15
  • @JonSkeet, second code fails for me as well, with the same error and on the same line. – Andrei Dec 26 '12 at 11:19
  • @Andrei: Hmm. Which version of .NET are you using? Is this within a debugger? – Jon Skeet Dec 26 '12 at 11:20
  • @JonSkeet, framework version is 4. Client profile if it matters. Error appears both with and without debugging. – Andrei Dec 26 '12 at 11:24
  • @Andrei: Odd. I'm using .NET 4.5, but I'm surprised to see a difference. Hmm. – Jon Skeet Dec 26 '12 at 11:33
  • @JonSkeet: I'm using .net 4 as well (4.5 not installed on my PC), and looking at the decompiled code I'm not surprised that both do not work... maybe something has changed in 4.5 ... – digEmAll Dec 26 '12 at 12:05

1 Answers1

12

You've discovered a bug in the framework, as far as I can tell. It's reasonably subtle, because of the interaction of a few things:

  • When you call ReadLines(), the file is actually opened. Personally, I think of this as a bug in itself; I'd expect and hope that it would be lazy - only opening the file when you try to start iterating over it.
  • When you call GetEnumerator() the first time on the return value of ReadLines, it will actually return the same reference.
  • When First() calls GetEnumerator(), it will create a clone. This will share the same StreamReader as textEnumerator
  • When First() disposes its clone, it will dispose of the StreamReader, and set its variable to null. This doesn't affect the variable within the original, which now refers to a disposed StreamReader
  • When Last() calls GetEnumerator(), it will create a clone of the original object, complete with disposes StreamReader. It then tries to read from that reader, and throws an exception.

Now compare this with your second version:

  • When First() calls GetEnumerator(), the original reference is returned, complete with open reader.
  • When First() then calls Dispose(), the reader will be disposed and the variable set to null
  • When Last() calls GetEnumerator(), a clone will be created - but because the value it's cloning has a null reference, a new StreamReader is created, so it's able to read the file with no problems. It then disposes of the clone, which closes the reader
  • When GetEnumerator() is called, a second clone of the original object, opening yet another StreamReader - again, no problems there.

So basically, the problem in the first snippet is that you're calling GetEnumerator() a second time (in First()) without having disposed of the first object.

Here's another example of the same problem:

using System;
using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = File.ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }
}

You could fix this by calling File.ReadLines twice - or by using a genuinely lazy implementation of ReadLines, like this:

using System.IO;
using System.Linq;

class Test
{
    static void Main()
    {
        var lines = ReadLines("test.txt");
        var query = from x in lines
                    from y in lines
                    select x + "/" + y;
        foreach (var line in query)
        {
            Console.WriteLine(line);
        }
    }

    static IEnumerable<string> ReadLines(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

In the latter code, a new StreamReader is opened each time GetEnumerator() is called - so the result is each pair of lines in test.txt.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Thanks for the detailed explanation. Quite interesting, motivates me to dig a lot deeper under the hood as I did not know that First() or Last() disposes the internal iterator/enumerator. – Matt Dec 26 '12 at 12:20
  • 2
    @Freddy: *Everything* which uses an `IEnumerator` should dispose of it. – Jon Skeet Dec 26 '12 at 12:27
  • I see your point but in this case calling First() and Last() I wished it would not dispose of the underlying text reader. In that I am still a bit confused, on one side one has to manually dispose, on the other hand the object is disposed itself after calling First() or Last()... – Matt Dec 26 '12 at 12:38
  • @JonSkeet: I just had a look at .net 4.5 source code. They changed the yield-based code with an IEnumerator implementation trying to keep the same behaviour. Anyway, I've copied the code in a .net 4.0 project and the 2nd case works, while using .net 4 File.ReadLines it does not... – digEmAll Dec 26 '12 at 20:58
  • @Freddy: The problem isn't the disposal - it's that calling `GetEnumerator` should *always* create a new reader IMO. – Jon Skeet Dec 26 '12 at 22:33
  • @JonSkeet, yes in that case everything would be clear. A new enumerator would create a new reader, while calling First(), Last() would not create nor dispose. – Matt Dec 27 '12 at 03:32
  • 1
    @Freddy: No, calling `First()` or `Last()` would create (via `GetEnumerator()`), then read, then dispose. – Jon Skeet Dec 27 '12 at 08:32
  • @JonSkeet, ok got it, Last() or First() are methods after all, if they were properties or fields I would find it odd that a new reader was created. Would it not be much better to create an enumerator and hence a reader when an IEnumerable is instantiated, given the enumerator enumerates a stream? Then accessing the enumerator through Last() or First() or whatever would not interfere with the underlying stream. – Matt Dec 27 '12 at 11:00
  • @Freddy: Well it would *have* to interfere with the underlying stream - otherwise it couldn't get the data. But I would say that the enumerator enumerates the *file*, not a *stream*. This is the approach that LINQ takes all over the place - the query is just the representation of the query; no data is fetched (and GetEnumerator isn't called) until it's required. The whole *problem* here is that the reader is being created earlier than it should, IMO. If each call to GetEnumerator created a new reader, all would be well. – Jon Skeet Dec 27 '12 at 11:07
  • @JonSkeet, fair points, though would not concurrent readers pointing to the same open file cause problems? – Matt Dec 27 '12 at 11:34
  • @Freddy: Nope - see the code at the end of my answer for an example which works fine. – Jon Skeet Dec 27 '12 at 11:41
  • @JonSkeet, sorry about my wording, I was aware that it can work I guess I wanted to say that I would find a design where the reader sits with IEnumerable would make a lot more sense to me. The reader would only be created when GetEnumerator is called for the first time. Subsequent usage of IEnumerator would then not cause the creation of another reader. If I compared that with moving through a binaryStream just because I would set the stream position with Seek() does not mean I need to create a new binary reader or writer. Similarly with getting the first and last element of a IEnumerable... – Matt Dec 27 '12 at 14:10
  • ...it's hard to comprehend for me why we would need several readers. At least I don't see the benefit because GetEnumerator() is sharing a relationship with IEnumerable. I am obviously aware the current design is different. Apologies for messing up the lingo at times I am a self taught programmer and that just since 1.5 years now – Matt Dec 27 '12 at 14:13
  • @JonSkeet, thanks for the explanations, I can learn a lot from you. – Matt Dec 27 '12 at 14:27
  • @Freddy: No, each call to `GetEnumerator()` should create an *independent* iterator. So if you call `GetEnumerator()` multiple times, it should create multiple "cursors" through the data, whatever the data source is. That's why `IEnumerator` exists in the first place. – Jon Skeet Dec 27 '12 at 16:14
  • @JonSkeet, I see that but I have a hard time equating a text reader with a simple construct such as a "cursor"/iterator. To set up a comparison, a cursor/iterator for me is like a bookmark, a book is a data source. Why cloning/duplicating the books when a book can be read by one entity at a time anyway (due to I/O constraints here). Just because I tell the cursor/iterator to go somewhere why does that require setting up a new reader, makes very little to no sense to me. – Matt Dec 27 '12 at 16:39
  • @JonSkeet, yes I agree with your points but what I disagree with is why a new reader is created for each new enumerator. "Whatever the data source" -> Fully concur, then why having to create a full reading device (reader) when just trying to iterate over the same data source. But it gets into a discussion, it remains my opinion that one and exactly one reader should be created regardless of how many Iterators I "get" , how it could be implemented and whether it will eliminate all the pitfalls, one of which I fell into, I dont know. Thanks again for your deep analysis and code. Happy New Year. – Matt Dec 29 '12 at 06:35
  • @Freddy: To conform with normal expected behavior, something like `File.ReadLines()` should either return an object that creates a new reader with each call to `GetEnumerator()`, or else it should create an object whose enumerators have the logic to share a reader *in thread-safe fashion* and clean it up when everyone is done with it. The former approach is much simpler. – supercat Mar 12 '13 at 20:05