I have twenty 10MB txt files and would like to read these files.

I have read this, which says File.ReadLines() doesn't load the entire file into memory, so it shouldn't use much memory.

var directory = "path/to/files";

foreach (var filepath in Directory.EnumerateFiles(directory, "*.txt"))
{
    foreach (var line in File.ReadLines(filepath))
    {
        // I will do something here, but not yet.
    }
}

But when I run this code, JetBrains Rider's DPA (Dynamic Program Analysis) reports what is shown in the images below.

[Two screenshots of the Rider DPA report; images not available]

How can I fix this problem?

Environment: .NET Core Console Application (6.0)

Edit

Same problem with StreamReader.

[Screenshot of the Rider DPA report for the StreamReader version; image not available]
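
The StreamReader version is not shown in the post. A minimal sketch of what such a loop might look like, assuming the same directory layout as above (`StreamReaderLoop` and `CountLines` are hypothetical names used only for illustration):

```csharp
using System;
using System.IO;

public static class StreamReaderLoop
{
    // Counts the lines in every *.txt file under `directory`,
    // reading one line at a time through a StreamReader.
    public static long CountLines(string directory)
    {
        long count = 0;

        foreach (var filepath in Directory.EnumerateFiles(directory, "*.txt"))
        {
            // The reader is disposed at the end of each loop iteration.
            using var reader = new StreamReader(filepath);

            string? line;
            while ((line = reader.ReadLine()) != null)
            {
                // Each ReadLine call still allocates a new string,
                // just as each element yielded by File.ReadLines does.
                count++;
            }
        }

        return count;
    }
}
```

It could be called as `Console.WriteLine(StreamReaderLoop.CountLines("path/to/files"));`. Because every line still becomes a separate `string`, the total bytes allocated would be expected to be about the same as with `File.ReadLines()`.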

Ellisein
  • How much RAM does your machine have? .NET will gladly consume gigabytes of RAM and never release it back to the OS unless the OS asks it nicely via a memory-pressure signal. Also, you're running a `DEBUG` build, right? So what happens if you profile a Release build? – Dai Jun 08 '22 at 01:29
  • Do you get the same allocation if you use a StreamReader? – gunr2171 Jun 08 '22 at 01:32
  • What encoding are those 10MB `.txt` files using? If it's ASCII or UTF-8, then be aware that (absent any memory-pressure or GC events) your program may consume up to `10 * 20 * 2 == 400 MiB` of memory for all of the enumerated `string` values because internally .NET uses UTF-16, so (for example) loading a 128KB UTF-8 file into memory will use 256KB of actual memory. – Dai Jun 08 '22 at 01:33
  • @Dai I ran my code with a Debug build. Rider didn't warn me when I used a Release build. Is there any difference between Debug and Release when `File.ReadLines()` is executed? – Ellisein Jun 08 '22 at 01:38
  • @Ellisein "Is there any difference between Debug and Release when File.ReadLines() is executed?" - **yes** (well, not `ReadLines` specifically, but the GC itself behaves differently): https://stackoverflow.com/questions/7165353/does-garbage-collection-run-during-debug and https://stackoverflow.com/questions/37462378/why-c-sharp-garbage-collection-behavior-differs-for-release-and-debug-executable – Dai Jun 08 '22 at 01:40
  • @gunr2171 It seems the same problem doesn't occur with `StreamReader`, but I wonder why `File.ReadLines()` differs. – Ellisein Jun 08 '22 at 01:44
  • Because `ReadLines` doesn't buffer like a Stream can. Think about it: it's loading strings into the managed heap and they are still kept in memory (for debugging especially). – Jeremy Thompson Jun 08 '22 at 01:47
  • It's interesting, since `ReadLines` essentially uses `StreamReader` (see [here](https://github.com/microsoft/referencesource/blob/5697c29004a34d80acdaf5742d7e699022c64ecd/mscorlib/system/io/ReadLinesIterator.cs)). Is it possible that in your test of `StreamReader`, you're not running it in a way that Rider understands it to be the memory usage of one method? – ProgrammingLlama Jun 08 '22 at 01:47
  • OMG, that was my mistake. The same problem occurred with `StreamReader`. – Ellisein Jun 08 '22 at 01:51
  • In my tests with .NET 6 and BenchmarkDotNet with a MemoryDiagnoser, running a 10MB file (each line is 500 characters) through both methods only allocated 34MB in both cases. – ProgrammingLlama Jun 08 '22 at 02:03
  • @DiplomacyNotWar I used 20 files for my test. Can you please do the same with more files? – Ellisein Jun 08 '22 at 02:14
  • I'm currently running code to generate a load of files with unique strings. I'll get back to you with the benchmark result :) – ProgrammingLlama Jun 08 '22 at 02:21
  • Current result for 20 10MB files is [this](https://i.stack.imgur.com/6xw4m.png), based on [these tests](https://pastebin.com/cSuzEG4i). I'm currently running a bigger set of 10 GB's worth of data. – ProgrammingLlama Jun 08 '22 at 03:13
  • @DiplomacyNotWar It seems that the allocated memory increases with the number of files. Thanks for the test :) – Ellisein Jun 08 '22 at 03:22
  • Testing with 10 GB worth of files, it seems that the total allocation hit 33 GB, but in the 3 GC generations, we only had about 2.8 GB, so stuff is getting garbage collected correctly. [Results](https://i.stack.imgur.com/T42ms.png) – ProgrammingLlama Jun 08 '22 at 05:18
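
The distinction drawn in the comments between total bytes allocated and memory that is still live can be checked without a profiler. The following is a minimal sketch, assuming .NET Core 3.0 or later. `AllocationProbe` and `Measure` are hypothetical names, and this is not the benchmark code linked in the thread:

```csharp
using System;
using System.IO;

public static class AllocationProbe
{
    // Reads every line of every *.txt file under `directory` and returns:
    //   Lines     - number of lines read
    //   Allocated - bytes allocated on the managed heap during the loop
    //   Live      - managed heap size after forcing a full collection
    public static (long Lines, long Allocated, long Live) Measure(string directory)
    {
        long allocatedBefore = GC.GetTotalAllocatedBytes(precise: true);
        long lines = 0;

        foreach (var filepath in Directory.EnumerateFiles(directory, "*.txt"))
        {
            foreach (var line in File.ReadLines(filepath))
            {
                // Each line is a short-lived string that becomes
                // garbage as soon as the loop moves on.
                lines++;
            }
        }

        long allocated = GC.GetTotalAllocatedBytes(precise: true) - allocatedBefore;

        // Forcing a collection shows how much memory is actually retained.
        long live = GC.GetTotalMemory(forceFullCollection: true);

        return (lines, allocated, live);
    }
}
```

Calling `AllocationProbe.Measure("path/to/files")` and printing the tuple should show `Allocated` growing with the amount of text read (roughly two bytes per character for ASCII input, since .NET strings are UTF-16, plus per-string overhead), while `Live` would be expected to stay small if nothing holds on to the lines.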

0 Answers