1

Is there a memory efficient way to use 'using' within a recursive function when e.g. writing lines to a file?

I read C# 'using' block inside a loop and it mentioned that you don't want to put a using statement inside a for loop unless you have to. (makes sense, one doesn't want multiple instances of 'using' if one doesn't need them). So in the case of a for loop if you can put it outside, you do.

But here, I have a recursive function. So the 'using' statement is going to run multiple times even if I put it outside of a for.

So what is a good or proper way of placing the 'using' statement?

I don't know if I should avoid 'using', and declare the StreamWriter object, StreamWriter writetext before the method call and dispose of it after with writetext.Dispose(). Or maybe there is a more conventional way with 'using'. Maybe wrapping the 'main' call DirSearch_basic_writetofile("c:\\aaa"); with a 'try' and putting the Dispose line in a finally. And avoiding 'using' then. That's just a thought.

// requires directory c:\texts
File.Delete(@"c:\texts\filelist.txt");
// list files and subdirectories of c:\aaa and write them to file "c:\texts\filelist.txt"
DirSearch_basic_writetofile("c:\\aaa");

// recursive function that lists files  and directories and subdirectories,in given directory  

static void DirSearch_basic_writetofile(string sDir)
{
    Console.WriteLine("DirSearch(" + sDir + ")");
    Console.WriteLine(sDir+@"\");
    try
    {
        using (StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt",true))
        {
            writetext.WriteLine("DirSearch(" + sDir + ")");
            writetext.WriteLine(sDir);

            foreach (string f in Directory.GetFiles(sDir))
            {
                Console.WriteLine(f);
                writetext.WriteLine(f);
            }
        }

        foreach (string d in Directory.GetDirectories(sDir))
        {
            DirSearch_basic_writetofile(d);
        }

    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }

}
barlop
  • 12,887
  • 8
  • 80
  • 109
  • Pass the streamwriter down the recursive calls. – mjwills May 15 '20 at 02:15
  • Also, consider a different approach. Have one thread populating a `BlockingCollection` with entries of file / directory names. Then have a second thread reading from the `BlockingCollection` (basically a Producer Consumer model). That way you don't even need to pass the `StreamWriter` around (since it is used in only one place). – mjwills May 15 '20 at 02:17
  • @mjwills re "Pass the streamwriter down the recursive calls. " <-- I suppose that'd be one way of a general solution of declaring the Stream outside of the recursive function. e.g. `using(StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt", true)) DirSearch_basic_writetofile("c:\\aaa");` (not passing the stream down the recursive calls), but still , declaring it outside of it. – barlop May 15 '20 at 02:17
  • The benefit of passing it (rather than say making it a static variable) is that it may make it easier to, for example, support multiple invocations at once. But yes, if you squint hard enough they are a similar approach. – mjwills May 15 '20 at 02:19
  • 1
    Note the bottleneck in your project is likely to be the `Console.WriteLine` calls. If you wanted to improve performance you should reduce the number of `Console.WriteLine` calls (e.g. call it every 10 or 100 or 1000 entries (e.g. concatenating 100 entries and then writing them all to the console in one hit) rather than every entry). – mjwills May 15 '20 at 02:21
  • @mjwills I guess even then it'd hold up the code 'cos writing 1000 lines would take longer than writing one line. Perhaps the writing to console would have to be in its own thread and run concurrently? – barlop May 15 '20 at 02:29
  • It would hold it up - yes. But massively less so. I would expect a single `Console.WriteLine` call of 1000 lines of text to be 50-500 times faster than 1000 individual calls. – mjwills May 15 '20 at 02:32
  • `Perhaps the writing to console would have to be in its own thread and run concurrently?` Indeed that is an option. Honestly though I'd just batch and call every 1000 or so. – mjwills May 15 '20 at 02:33
  • @mjwills why is console-writing eg 1000 lines once, more efficient than writing 1 line a thousand times? – barlop May 15 '20 at 02:36
  • 1
    "Because it is" (mainly due to locking). Try it and see. https://stackoverflow.com/a/50979627/34092 – mjwills May 15 '20 at 02:37

3 Answers3

1

The linked thing is a case where you were using the same resource for all itterations of the loop. In that case, opening and closing it every itteration serves no purpose. As long as it is closed by the end of all loops, it is save enough.

The opposite case is if you use a different resource every itteration. Say, when going over a list of filenames or full paths, to open each in turn. In that case you got no chocie but to have a new File related instance each itteration.

A recursion is not really different from a loop. You can always replace a loop with a recursion, but the opposite is not always the case. Same rules apply:

  • If it is the same resource, you just have to move the creation of the resource outside the recursive function. Rather then taking a path (or using a hardcoded one), let it take a Stream. That keeps the function nicely generic
  • If you got a different resource, you have no choice but create a new Instance with a new using every recursion. However I can not think of any "recursive using" case.

If you got to itterate over all files in a directory inlcuding all subdirectories, you would have the recursive function recurse over the directories (no unmanaged resource needed). And then a loop inside the recursive function to itterate over the files inside the current directory (wich requires unmanaged resources).

Edit:

static void DirSearch_basic_writetofile(string currentDirectory, StreamWriter Output){
    //do your thing, using output as the stream you write to

    //Do recusirve calls as normal
    DirSearch_basic_writetofile(subDir, Output);
}

calling it:

using (StreamWriter OutputWriter = new StreamWriter("c:\\texts\\filelist.txt",true){
    DirSearch_basic_writetofile(startDirectory, OutputWriter);
}
Christopher
  • 9,634
  • 2
  • 17
  • 31
  • 2
    Actually any recursive code can be rewritten with loops and a Stack. – juharr May 15 '20 at 02:02
  • @juharr Recursion is just calling the same code over and over, using a call stack. So it sounds like you reinvented the call stack. – Christopher May 15 '20 at 02:04
  • 1
    More or less, but that's still a way to translate recursive code into looping code. – juharr May 15 '20 at 02:08
  • Converting a recursive function to a state machine can expose more opportunities for compiler optimisations. When you get down to the bare metal of assembly, all control flow is just GOTO's with extra steps. – Jeremy Lakeman May 15 '20 at 02:12
  • I suppose I could do `using(StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt", true)) DirSearch_basic_writetofile("c:\\aaa");` putting the resource outside as you suggest. – barlop May 15 '20 at 02:16
  • @barlop Ah, now I understand: You crawl a directory strucutre recursively, writing the contents to a file. Yes, opening what is effectively a "log file" or "output file" outside the recursion/loop is indeed the way to go. You always write to the same file, so might as well keep it open for the full recusion/loop – Christopher May 15 '20 at 02:25
  • @Christopher Looks like the declaration would have to be outside the 'using' rather than within the 'header' of the using, in order for the method's body(body of DirSearch_basic_writetofile) to see it, but then would the using still work? `StreamWriter writetext; using (writetext = new StreamWriter("c:\\texts\\filelist.txt", true)) DirSearch_basic_writetofile("c:\\aaa");` – barlop May 15 '20 at 02:55
  • @Christopher rather having `static StreamWriter writetext;` in the class and then in the main having `using (writetext = new StreamWriter("c:\\texts\\filelist.txt", true)) DirSearch_basic_writetofile("c:\\aaa"); ` I suppose the 'using' will still work even without the `StreamWriter writetext` declaration in it? – barlop May 15 '20 at 03:12
  • @barlop Why do you want to make it static??? using will reliably dispose of stuff, when you leave the block. Calling a function is **not** leaving the block. You have to get all the way out of the recursive callstack to leave it. Otherwise *every single DbCommand* funciton would Dispose of the connection it needs to work. – Christopher May 15 '20 at 03:20
  • @barlop using is syntax sugar for a try/finally/dipose with null check. Nothing more, nothing less. **Calling a function does not leave a try block.** If it did, it oculd never be catching any exception and would have no literally no sense. Leaving the try with a goto, calls finally. If the try is inside a function, leaving the function via return calls the finally. leaving the try with a Excpetion, calls the finally. If the using is inside a loop and you break out the loop with a continue or the like, finally will be called. – Christopher May 15 '20 at 03:27
  • @Christopher you write "using is syntax sugar for a try/finally/dipose with null check" <-- you can create and dispose of an object without a try/finally but let's say that, as you say, it's essentially a try/finally, then, I am asking if it still works to declare the object before the try? i.e. static StreamWriter sw; declared at class level. I suppose it's fine. 'cos dispose isn't about undeclaring a variable. (Note- I didn't suggest that calling a function would leave a try block!) – barlop May 15 '20 at 03:56
  • @Christopher You write "Why do you want to make it static???" <-- C# forced me to make it static. because if I declare a variable outside of `static void DirSearch_basic_writetofile` then the only way that static method can see the variable is if the variable is static. And even though `static void DirSearch_basic_writetofile` is an 'inner method', it can't see a variable declared in main(..). So the variable to be seen by both methods, has to be class level. And then it has to be static not instance as the method is static – barlop May 15 '20 at 04:00
  • @barlop "you can create and dispose of an object without a try/finally" Sure you can. If you want to make it *unessesarily uncertain if it is actually being disposed off*. But who would ever want to make Disposing unreliable? | You can declare the variable before the try/using. But should only create and assign the instance inside the try/the using. Creation can fail in odd ways, causing a half-open thing that is unuseable but still requires disposing. – Christopher May 15 '20 at 16:18
  • @barlop regarding static: Only partially correct. A static function can use all static variables **and** everything it is given as argument. Wich it can then just pass thru, to it's own recursive calls. It is like sDir, minus the changing value for recursive calls. Making a static variable to share data is a anti-pattern. The only thing that should be static are readonly and const type constants. – Christopher May 15 '20 at 16:21
  • 1
    @barlop It can be better to you only hand a String Builder through (instead of teh string), defering the writing of the file until you finished all loops. But taht would make this "all or nothing". – Christopher May 15 '20 at 16:23
  • @Christopher If you pass an argument to a function(regardless of whether the function is static or instance), then the function is not seeing a static or instance variable. It's seeing a local variable declared in the function header. That local variable is not static or instance. Static or instance is a keyword for things at class level. Also, I want to keep my recursive function to just having the one parameter/argument. – barlop May 15 '20 at 16:28
  • @barlop I really can not figure out the issues you are thinking yourself into. I wrote a simple example on how to do this. Plain, old, boring recursion. Plain, old, boring using. And the only thing static is the function. – Christopher May 15 '20 at 16:28
  • @Christopher " It can be better to you only hand a String Builder through (instead of teh string), defering the writing of the file until you finished all loops" <-- I agree that'd be much quicker. (deferring writing of the file , as well as the console, the console being probably a bigger bottleneck). I guess that could be an alternative parameter to file. (or, dare I say, additional static variable ;-) ) – barlop May 15 '20 at 16:34
  • @Christopher You write "Making a static variable to share data is a anti-pattern. The only thing that should be static are readonly and const type constants" <-- sounds dogmatic. Says Who? I've sen static variables used for shared data across objects for example, and found this link mentioning that https://www2.seas.gwu.edu/~rhyspj/spring07cs143/studio6/studio651.html "We can use a static variable that will be shared across all Dog instances." If a class is very big and has many different things in there then a person might not know that certain variables are used... – barlop May 15 '20 at 17:12
  • @Christopher but in certain cases it can make code look cleaner.. rather than adding parameters to the method or every method. I'd be surprised if everybody agreed with whoever it is that you think says it's an anti-pattern. Often the answer to that dogma is 'not necessarily'. Like "goto considered harmful" was replied to with "goto considered harmful, considered harmful". It is possible to have non-harmful uses. – barlop May 15 '20 at 17:12
  • @barlop "sounds dogmatic. Says Who?" Everyone that says "no global variables". As global and static a almost synonyms: https://www.learncpp.com/cpp-tutorial/why-global-variables-are-evil/ | https://dev.to/mervinsv/why-global-variables-are-bad-4pj | https://en.wikipedia.org/wiki/Global_variable | https://www.tutorialspoint.com/Why-should-we-avoid-using-global-variables-in-C-Cplusplus | https://wiki.c2.com/?GlobalVariablesAreBad – Christopher May 17 '20 at 12:33
  • @barlop "but in certain cases it can make code look cleaner.." There is **no case** where adding a variable that is out of scope and editable by literally every other piece of code, would make the code "cleaner". | There could never be two instances of a static variable. No running two of those searches in paralell using async. | All those are no issues for a function argument. You know exactly where the StreamWriter comes from and what happens to it, unless the caler does heroically bad code (wich I avoided). – Christopher May 17 '20 at 12:39
1

If we want to solve using yield return

You might want to restructure the code such that you separate out the recursive part; for example with a yield return.

Something like this below ( sorry, no IDE at hand, let's see if this works) is a simplistic approach.

If you need to write out the new header ( DirSearch(" + sDir + ") ) every time you switch directory, that's doable by not returning String only from producer an object containing String directoryName, List fileNames, and return only once for each directory.

static void DirSearch_basic_writetofile(string sDir)
{
    Console.WriteLine("DirSearch(" + sDir + ")");
    Console.WriteLine(sDir+@"\");
    IEnumerable<String> producer = DirSearch_Producer(string sDir);
    try
    {
        using (StreamWriter writetext = new StreamWriter("c:\\texts\\filelist.txt",true))
        {
            writetext.WriteLine("DirSearch(" + sDir + ")");
            writetext.WriteLine(sDir);

            foreach (string f in DirSearch_Producer(sDir))
            {
                Console.WriteLine(f);
            }
        }
    }
    catch (System.Exception excpt)
    {
        Console.WriteLine(excpt.Message);
    }
}

public static IEnumerable<String> DirSearch_Producer(string sDir){
  foreach (string f in Directory.GetFiles(sDir))
  {
    yield return f;
  }
  foreach (string d in Directory.GetDirectories(sDir))
  {
    foreach (String f in DirSearch_Producer(d)){
        yield return f;
    }
  }
}

Alternative, without using yield return we can use the Directory.GetFiles with EnumerationOptions to go through subdirectories as well. It makes things much simpler. See: RecurseSubdirectories

Tom S.
  • 56
  • 4
  • Is there perhaps an alternative to `GetFiles` and `GetDirectories` that _already_ does this (i.e. avoids the need for `yield return`)? – mjwills May 15 '20 at 02:34
  • 1
    Right, actually there is :) We can use the Directory.GetFiles with EnumerationOptions to go through subdirectories as well. It makes things much simpler. See: [RecurseSubdirectories](https://learn.microsoft.com/en-us/dotnet/api/system.io.enumerationoptions.recursesubdirectories?view=netcore-3.1#System_IO_EnumerationOptions_RecurseSubdirectories) – Tom S. May 15 '20 at 04:16
  • @TomS. I think you should add this to your answer as an alternative. – ckuri May 15 '20 at 04:31
  • 1
    Also check `EnumerateFiles` rather than `GetFiles`. https://learn.microsoft.com/en-us/dotnet/api/system.io.directory.enumeratefiles?view=netcore-3.1#System_IO_Directory_EnumerateFiles_System_String_System_String_System_IO_EnumerationOptions_ – mjwills May 15 '20 at 05:31
0

I am going to leave Christopher's answer as accepted..

I will just note here the solution I used..

using (writetext = new StreamWriter("c:\\texts\\filelist.txt", true))
   DirSearch_basic_writetofile_quicker("c:\\aaa");

I also used a StringBuilder and wrote to console and file after

        // https://stackoverflow.com/questions/15443739/what-is-the-simplest-way-to-write-the-contents-of-a-stringbuilder-to-a-text-file
        System.IO.File.WriteAllText(@"c:\texts\filelist.txt", sb.ToString());

        Console.Write(sb.ToString());

mjwillis had suggested to me that writing to the console one line at a time was a bottleneck, and christopher mentioned about writing to the file at the end. So I just write to both at the end / after the call.

I used a static variable for StreamWriter and a static variable for StringBuilder, so that main() and the recursive function could see it. I didn't want to create a new parameter for the recursive function, 'cos I wanted to keep the recursive call looking 'simple' (one parameter).

barlop
  • 12,887
  • 8
  • 80
  • 109