28

I have a function which Skips n lines of code and Takes y lines from a given file using File.ReadLines with Skip and Take combination. When I try to open the file given by filePath the next time:

string[] Lines = File.ReadLines(filePath).Skip(0).Take(0).ToArray();
using (StreamWriter streamWriter = new StreamWriter(filePath))
{
    // ...
}

I get a File in use by another process exception on the "using" line.

It looks like IEnumerable.Take(0) is the culprit, since it returns an empty IEnumerable without enumerating on the object returned by File.ReadLines(), which I believe is not disposing the file.

Am I right? Should they not enumerate to avoid this kind of errors? How to do this properly?

Titus
  • 907
  • 8
  • 18
  • 3
    You *definitely* have the `ToArray` call? I'd expect that to dispose of the iterator, which should work its way up appropriately. Could you provide a [mcve]? (I would expect you to be able to show this all in a `Main` method.) – Jon Skeet Aug 25 '16 at 10:22
  • Yes, right, it is called. Remember it is called NOT on the File.ReadLines IEnumerable, but on the Take(0) returned IEnumerable – Titus Aug 25 '16 at 10:24
  • Yes, but I'd expect that to end up disposing of the original iterator. (It would be fairly easy to prove that using your own iterator method.) – Jon Skeet Aug 25 '16 at 10:25
  • @CodeCaster: What, exactly? A problem with `File.ReadLines`, or a more general problem with `Take(0)`? – Jon Skeet Aug 25 '16 at 10:26
  • Its actually fairly easy to reproduce it. Just copy paste the the ReadLines and using (StreamWriter...) line in console application, with 0 as argument to Take, you can reproduce it – Titus Aug 25 '16 at 10:27
  • `IEnumerable` doesn't inherit from IDisposable because typically, the class that implements it only gives you the *promise* of being enumerable, it hasn't actually done anything yet that warrants disposal. However, when you enumerate over it, you first retrieve an `IEnumerator` by calling the `IEnumerable.GetEnumerator` method, and typically, the underlying object you get back *does* implement `IDisposable`. For `File.ReadLines`, the file isn't opened until you start enumerating over it, so the object you get from `File.ReadLines` doesn't need disposing, but the enumerator you get, does. – Sebastian Siemens Aug 25 '16 at 10:28
  • Looking at the source, it appears Take(0) does not dispose of its source iterator: http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,ceb7d3d4f0a39720,references – Dennis_E Aug 25 '16 at 10:28
  • @Dennis_E: Actually, that looks okay - because the `foreach` loop does the right thing. I think it's actually `File.ReadLines` which is broken here. – Jon Skeet Aug 25 '16 at 10:29
  • @JonSkeet I'm sorry, but it says `if (count > 0)`. If count equals 0, the foreach loop is never run. Or am I missing something? – Dennis_E Aug 25 '16 at 10:32
  • @Dennis_E: Yes - you're missing the fact that there's nothing to dispose of until `foreach` calls `GetEnumerator`. An `IEnumerable` shouldn't need to be disposed of, whereas an `IEnumerator` *does* need to be disposed. If you don't call `GetEnumerator()`, it should be fine. – Jon Skeet Aug 25 '16 at 10:35
  • @JonSkeet Of course, that makes sense. dur – Dennis_E Aug 25 '16 at 10:37

3 Answers3

40

This is basically a bug in File.ReadLines, not Take. ReadLines returns an IEnumerable<T>, which should logically be lazy, but it eagerly opens the file. Unless you actually iterate over the return value, you have nothing to dispose.

It's also broken in terms of only iterating once. For example, you should be able to write:

var lines = File.ReadLines("text.txt");
var query = from line1 in lines
            from line2 in lines
            select line1 + line2;

... that should give a cross-product of lines in the file. It doesn't, due to the brokenness.

File.ReadLines should be implemented something like this:

public static IEnumerable<string> ReadLines(string filename)
{
    return ReadLines(() => File.OpenText(filename));
}

private static IEnumerable<string> ReadLines(Func<TextReader> readerProvider)
{
    using (var reader = readerProvider())
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

Unfortunately it's not :(

Options:

  • Use the above instead of File.ReadLines
  • Write your own implementation of Take which always starts iterating, e.g.

    public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int count)
    {
        // TODO: Argument validation
        using (var iterator = source.GetEnumerator())
        {
            while (count > 0 && iterator.MoveNext())
            {
                count--;
                yield return iterator.Current;
            }
        }
    }
    
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
19

From the comment above File.ReadLines() in the Reference Source, it becomes obvious that the team responsible knew about this "bug":

Known issues which cannot be changed to remain compatible with 4.0:

  • The underlying StreamReader is allocated upfront for the IEnumerable<T> before GetEnumerator has even been called. While this is good in that exceptions such as DirectoryNotFoundException and FileNotFoundException are thrown directly by File.ReadLines (which the user probably expects), it also means that the reader will be leaked if the user never actually foreach's over the enumerable (and hence calls Dispose on at least one IEnumerator<T> instance).

So they wanted File.ReadLines() to throw immediately when passed an invalid or unreadable path, as opposed to throwing when enumerating.

The alternative is simple: not calling Take(0), or instead not reading the file altogether if you aren't actually interested in its contents.

CodeCaster
  • 147,647
  • 23
  • 218
  • 272
-1

In my opinion, the root cause is Enumerable.Take iterator doesn't dispose an underlying iterator if the count is zero, since the code doesn't enter the foreach loop - see referencesource. If one modifies the code in following way the issue gets resolved:

static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
    foreach (TSource element in source)
    {
        if (--count < 0) break;
        yield return element;
    }
}
stop-cran
  • 4,229
  • 2
  • 30
  • 47
  • 2
    Claiming `Take` is at fault for not disposing an object it avoided creating in the first place doesn't make a whole lot of sense. –  Aug 25 '16 at 21:47
  • 1
    Indeed. Nothing should *have* to call GetEnumerator and dispose the result. – Jon Skeet Aug 28 '16 at 10:37