6

I recently posted an article about a log file reader failing due to out-of-memory errors: Out of memory error archiving a log file

Before I had a chance to try the simpler approaches (such as naming the log file with a date so it never needs archiving, which would obviously have meant rewriting the method), I first tried the garbage collection option, as I had never used it before, i.e. GC.Collect().

This call was placed in the first try/catch, to run if a memory error was thrown whilst trying to read the log file's contents, and it seems to free up half the memory. From a debug file used during this process (the log file is obviously out of action, so this is to help me debug after the fact) I get this output from last night's archive process.

Attempt Archive
Take contents of current file with ReadFileString  (this is my custom stream reader method I wrote which you can see in the original article)
Taken contents of current file!
In TRY/CATCH - Out of Memory Exception - Try GC.Collect()
Memory used before collection: **498671500**
Memory used after collection: **250841460**
Try again with ReadFileString
How much content have we got? Content Length is **123595955**
We have content from the old log file
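
Roughly, the code in question is structured like this (a simplified sketch rather than the exact code; ReadFileString is the custom reader from the original article, DebugLog stands in for whatever writes the debug file, and GC.GetTotalMemory is only my shorthand for however the byte counts above were produced):

    string content;
    try
    {
        // First attempt: custom StreamReader-based method that calls ReadToEnd()
        content = ReadFileString(logFilePath);
    }
    catch (OutOfMemoryException)
    {
        DebugLog("In TRY/CATCH - Out of Memory Exception - Try GC.Collect()");
        DebugLog("Memory used before collection: " + GC.GetTotalMemory(false));
        GC.Collect();
        DebugLog("Memory used after collection: " + GC.GetTotalMemory(false));

        // Second attempt after the collection - this now succeeds
        content = ReadFileString(logFilePath);
    }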

So GC.Collect seems to fix the read file issue.

However, I am wondering WHAT kind of memory it is freeing up, because this debug output shows it releasing 247.83 MB when I call GC.Collect().

Therefore I would like to know what sort of objects or memory it is freeing up. I thought .NET was supposed to have good built-in garbage collection, so if it is leaving this much "freeable" memory around over time, should I be calling GC.Collect() regularly to free it up, or is this amount of memory only there because the first attempt to read the log file into memory failed?

Obviously it hadn't been working with large files for some time until I tried GC.Collect(), which I had never used before, so I was just wondering where the memory was coming from, when it would normally get collected, and whether GC.Collect() should be called elsewhere.

This is a Windows service app which hooks into a DLL that makes many HTTP calls to a 3rd-party API using JSON throughout the day, with multiple timers controlling each job it needs to run. Therefore it runs constantly unless I manually stop the service.

So should I call GC.Collect() once a night anyway as good practice? Other articles say to leave garbage collection to the system, but in this instance it quickly solved a problem where out-of-memory errors were being thrown (the machine this runs on is 64-bit with 14 GB of RAM).

MonkeyMagix
  • Related: [Best Practice for Forcing Garbage Collection](http://stackoverflow.com/questions/233596/best-practice-for-forcing-garbage-collection-in-c-sharp) – Tim Schmelter Apr 11 '16 at 09:38
  • It didn't solve your problem, it just postponed it. Just like you could try to port your project to x64, that would also delay the problem. You should avoid loading large files at once into memory, that is the real issue here I suppose. The system couldn't allocate enough *contiguous* memory. – Lucas Trzesniewski Apr 11 '16 at 09:40
  • In most scenarios calling GC.Collect will make you suffer later, because the GC moves all surviving objects to the next generation and thus they will live longer! – Valery Petrov Apr 11 '16 at 10:03
  • You probably have a bunch of small objects that are not enough to ever force a gen#2 collection. Which have a reference to a huge chunk in the Large Object Heap. The kind of scenario where GC.Collect() can release a quarter of a jiggabyte. If the GC does not behave "naturally" then having to help is not wrong. Do keep in mind that using the LOH for archiving a log file is never necessary, it works just as well when you copy one line at a time instead of using something like File.ReadAllText(). Minus this failure mode. – Hans Passant Apr 11 '16 at 10:26
  • Your memory problem is just a symptom of your actual problem: reading an entire file into memory. For the specific issue that you're trying to solve, consider just _renaming_ (using File.Move) the file (or _copying_ the file using File.Copy). With this you'll completely avoid the memory issue. – iggymoran Apr 11 '16 at 10:50
  • I expect you are not closing (disposing) a file object and the GC is doing so. You need to use a memory profiler if you wish to understand what is going on, rather than just guess. – Ian Ringrose Apr 11 '16 at 12:42
  • Is File.Move not just a File.Copy and File.Delete combined into one, like in other languages? Also, if you look at the original article, both of the 2 methods were failing, from my custom using (StreamReader reader = new StreamReader(path)) { return reader.ReadToEnd(); } to File.ReadAllText(), which was why I reverted to the GetLastThousandLines method which works. I will look at the other solutions. Remember: 1st - custom using method, then ReadAllText(), then GetLastThousandLines with a queue as the article shows, which works with or without the memory collection, but I need the whole file. – MonkeyMagix Apr 12 '16 at 10:00
  • => "Is File.Move not just a File.Copy and File.Delete combined", NO. Firstly the OS does the work, so it can do a better job than any .NET code can. Then, if the source and target folders are on the same file system, no data gets copied, only a pointer gets updated. – Ian Ringrose Apr 12 '16 at 10:05
  • So you suggest a FileInfo reference and a move to solve the issue then. – MonkeyMagix Apr 12 '16 at 10:10
  • The **only time** I have ever found it useful to call `GC.Collect()` in the last 14 years was when I was trying to profile the performance of code and wanted to create a histogram of GCs that occur during my code run against the average elapsed time. – Enigmativity Apr 12 '16 at 10:12
  • I have rewritten the method to use FileInfo.MoveTo and then FileInfo.CopyTo if that fails, with the working 100k fallback if either of those two fail for any reason (bad code in this 1st rewrite, for example). Once I see those two methods working I will remove the fallback to getting 100k lines, which DOESN'T throw the out-of-memory error, as a few people have suggested by the way. It's the using(streamreader) and File.ReadAllText methods throwing it. – MonkeyMagix Apr 12 '16 at 12:15
  • Just so you know the rewritten version with CopyTo seems to be working fine so I have removed the GC.Collect from the method – MonkeyMagix Apr 15 '16 at 09:47

5 Answers

6

Firstly, check very carefully that you are closing (disposing) all file objects, as otherwise internal file buffers will not be released until the GC discovers you have forgotten to close the file.

If you don’t need to copy the file, just rename it (File.Move or FileInfo.MoveTo). This is the normal way of dealing with log files.

If you don’t need to process the data, use the FileInfo.CopyTo or Stream.CopyTo method; that way the text will be copied using a sensible small buffer and memory will never need to be allocated to hold all of the text at the same time.

If you do need to process the text, read one line at a time. This will result in lots of small strings being created rather than one very large string, and the .NET GC is very good at reclaiming small, short-lived objects. At no point should you have the entire log file in memory at the same time. Creating a custom iterator that returns the lines in the file would be one way of doing it; a rough sketch of all three options follows.
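
For illustration, assuming hypothetical sourcePath and archivePath variables (and with File.ReadLines standing in for the custom line iterator):

    // Option 1: rename/move - if source and target are on the same volume, no data is copied.
    File.Move(sourcePath, archivePath);

    // Option 2: copy with the framework's small internal buffer - the file is never fully in memory.
    new FileInfo(sourcePath).CopyTo(archivePath);

    // Option 3: process one line at a time - only one small, short-lived string is alive at once.
    using (var writer = new StreamWriter(archivePath))
    {
        foreach (string line in File.ReadLines(sourcePath))
        {
            // ... inspect or transform the line here if needed ...
            writer.WriteLine(line);
        }
    }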

Ian Ringrose
  • Can I ask why the File.ReadAllText method works after the garbage collector? How does that method work then? Before I put in GC.Collect() both my using statement and the File.ReadAllText methods were BOTH failing; now when the first one throws an out-of-memory error I call GC.Collect(), then ReadAllText(), and it works fine. Take the GC.Collect() out and ReadAllText fails with out of memory. So how does ReadAllText work? Also, is a move not just a copy and delete combined behind the scenes, as it is in other languages? Someone said to rename the file and that does a Move. So CopyTo is best, yes? – MonkeyMagix Apr 12 '16 at 09:51
  • @MonkeyMagix When a file is stored on disk, it is not stored "in" a folder; the folder just contains a list of file names and the address each file is stored at. Therefore a file can be renamed and/or moved to a different folder *without* touching any of the data in the file. – Ian Ringrose Apr 12 '16 at 09:55
  • So most moves don't need to copy any data; hence do a rename or move (use the FileInfo class) if you don't have a reason to "process" the data in your log file. – Ian Ringrose Apr 12 '16 at 09:59
  • @MonkeyMagix, as to why the GC makes your software "work", see Hans's comment on your question; alternatively you may not be disposing an object you should be. – Ian Ringrose Apr 12 '16 at 10:02
  • OK, will look at the move. I was wondering why my using statement with the stream reader fails with out of memory, but now when I call GC.Collect() the File.ReadAllText isn't failing. I would have thought File.ReadAllText would have failed as it's reading in all the text of the large file. That is why I put them in the order I did - using (stream reader), File.ReadAllText(), then backup 100,000 lines with the queue statement. – MonkeyMagix Apr 12 '16 at 10:09
  • @IanRingrose - It's probably worth noting that the GC never disposes objects directly. The GC **may** call a finalizer that does, but knowing which objects have finalizers that call `.Dispose()` is hard to know. We should never rely on the GC effecting a `.Dispose()` call. – Enigmativity Apr 12 '16 at 10:10
  • @MonkeyMagix please insert your code. – mehrdad safa Apr 12 '16 at 11:16
  • The code is in the link I put in the article > http://stackoverflow.com/questions/36505210/out-of-memory-error-archiving-a-log-file >> and the queue is the WORKING part - it is a fallback: when the StreamReader and File.ReadAllText() fail, I fall back to getting the last 100k lines, so don't think that it's the queue that is failing, as it's NOT. It's the ReadAllText and my using (streamreader) methods throwing the memory errors, NOT the example with the queue. – MonkeyMagix Apr 12 '16 at 12:13
2

We can really only guess. Since you have enough free memory (and non-contiguous virtual address space), the problem is most likely related to not being able to allocate enough contiguous memory. The things that need the most contiguous memory are almost exclusively arrays, like the backing array for your queue. When everything works correctly, the address space is compacted regularly (as part of the GC) and you maximize the available contiguous memory. If this doesn't work, something is preventing compaction from working correctly - for example, pinned handles, like the ones used for I/O.

Why does an explicit GC.Collect() kind of help? It might very well be that you're at a point where all those pinned handles are released, and the compaction actually works. Try using something like VMMap or CLRProfiler to see how the objects are laid out in the address space - the typical case of compaction issues is when you have something like 99% free space in your memory, but nowhere near enough to allocate a new object (strings and arrays don't work well with memory fragmentation). Another typical case is when you neglect to use GC.AddMemoryPressure when allocating unmanaged memory (e.g. for the buffers), so the GC has no idea that it should really start collecting already. Again, CLRProfiler is very helpful for watching when GC happens, and how it maps to memory usage.
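
For illustration, the GC.AddMemoryPressure pattern looks roughly like this (a minimal sketch with an arbitrary 64 MB native allocation):

    // requires: using System.Runtime.InteropServices;
    long unmanagedSize = 64 * 1024 * 1024;
    IntPtr nativeBuffer = Marshal.AllocHGlobal(new IntPtr(unmanagedSize));
    GC.AddMemoryPressure(unmanagedSize);          // tell the GC about memory it cannot see
    try
    {
        // ... use the native buffer ...
    }
    finally
    {
        Marshal.FreeHGlobal(nativeBuffer);
        GC.RemoveMemoryPressure(unmanagedSize);   // must balance the AddMemoryPressure call
    }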

If memory fragmentation is indeed the problem, you need to figure out why. This is actually somewhat complex, and may require some work with something like WinDbg, which is rather hard to use, to say the least. I/O always means some pinned buffers, so if you're doing a lot of I/O in parallel, you're interfering with the proper functioning of the GC. The GC tries to deal with this by creating multiple heaps (depending on the exact configuration of GC you're running, but looking at your case, server GC should really be what you're using - you are running this on Windows Server, right?), and I've seen hundreds of heaps being created to "fix" the fragmentation issue - but ultimately, this is destined to fail.

If you have to use pinned handles, you really want to allocate them once, and reuse them if possible. Pinning is preventing the GC from doing its job, so you should only pin stuff that doesn't need to be moved in memory (large object heap objects, pre-allocated buffers on the bottom of the heap...), or at least pin for as short a time as possible.
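
A sketch of that "pin once, reuse" idea (the buffer size and usage are placeholders):

    // requires: using System.Runtime.InteropServices;
    // Pin one long-lived buffer once and reuse it, instead of pinning
    // short-lived buffers over and over (which blocks heap compaction).
    byte[] ioBuffer = new byte[81920];                   // allocate once, reuse for every read
    GCHandle pin = GCHandle.Alloc(ioBuffer, GCHandleType.Pinned);
    try
    {
        IntPtr address = pin.AddrOfPinnedObject();       // stable address for native I/O
        // ... perform the I/O using ioBuffer / address ...
    }
    finally
    {
        pin.Free();                                      // unpin as soon as possible
    }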

In general, reusing buffers is a good idea. Sadly, that means that you want to avoid strings and similar constructs in code like this - strings being immutable means that every single line you read needs to be a separately allocated object. Fortunately, you don't necessarily need to deal with strings in your case - a simple byte[] buffer will work just as well - just look for the byte values 13 and 10 (0x0D, 0x0A) instead of "\r\n". The main problem you have is that you need to hold a lot of data in memory at once - you either need to minimize that, or make sure the buffers are allocated where they're used the best; for file data, a LOH buffer will help quite a bit.

One way to avoid so many allocations would be to parse the file looking for end-of-lines and remembering just the offset of the line where you want to start copying. As you go, line-by-line (using the reusable byte[] buffer), you'll just update the offset of the "at most 100 000th line from the end" rather than allocating and freeing strings. Of course, this does mean you have to read some of the data twice - that's just the price of dealing with data that isn't fixed-length and/or indexed :)
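
A minimal sketch of that offset-tracking approach, assuming hypothetical path and archivePath variables (only line-start offsets are kept in a small queue, never the line text itself):

    const int MaxLines = 100000;
    var lineStartOffsets = new Queue<long>();    // offsets where candidate lines start
    var buffer = new byte[64 * 1024];            // one reusable read buffer, no strings

    using (var stream = File.OpenRead(path))
    {
        long position = 0;
        lineStartOffsets.Enqueue(0);             // the first line starts at offset 0
        int bytesRead;
        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                if (buffer[i] == (byte)'\n')     // 0x0A marks the end of a line
                {
                    lineStartOffsets.Enqueue(position + i + 1);
                    if (lineStartOffsets.Count > MaxLines)
                        lineStartOffsets.Dequeue();   // keep only the last 100 000 line starts
                }
            }
            position += bytesRead;
        }

        // Second pass: copy everything from the remembered offset into the archive file.
        stream.Seek(lineStartOffsets.Peek(), SeekOrigin.Begin);
        using (var target = File.Create(archivePath))
            stream.CopyTo(target);
    }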

Another approach is to read the file from the end. How well this works is hard to predict, since it depends a lot on how well the OS and filesystem handle backwards reading. In some cases, it's just as good as forward reading - both are sequential reads, it's just a question of whether the OS/FS is smart enough to figure that out or not. In some cases, it's going to be very expensive - if that's the case, use large file buffers (e.g. 16 MiB instead of the more customary 4 kiB) to get reads that are as sequential as possible. Counting from the back still doesn't quite allow you to stream the data directly to another file (you'll either need to combine this with the first approach or keep the whole 100 000 lines in memory at once yet again), but it means you only ever read the data you're going to use (the most you over-read is the size of your buffer).

Finally, if all else fails, you could use unmanaged memory for some of the work you're doing. I hope I don't have to say this is much trickier than using managed memory - you'll have to be very careful about proper addressing and bounds-checking, among other things. For a task like yours, it's still quite manageable - ultimately, you're just moving lots of bytes with very little "work". You better understand the unmanaged world well, though - otherwise it's just going to lead to bugs that are very hard to track and fix.

EDIT:

Since you made it clear that the "last 100k items" is a workaround and not the desired solution, the easiest thing is to simply stream the data rather than reading everything to RAM and writing everything in one go. If File.Copy/File.Move aren't good enough for you, you can use something like this:

    var buffer = new byte[4096];
    using (var sourceFile = File.OpenRead(...))
    using (var targetFile = File.Create(...))
    {
        while (true)
        {
            var bytesRead = sourceFile.Read(buffer, 0, buffer.Length);
            if (bytesRead == 0) break;   // reached the end of the source file

            targetFile.Write(buffer, 0, bytesRead);
        }
    }

The only memory you need is for the (relatively small) buffers.

Luaan
  • As I said in an earlier comment, I have rewritten it to use the FileInfo CopyTo method with a fallback to the 100k method if, for example, I have written it badly tonight. If it works I will remove that fallback. Also, some people keep saying the queue is causing the memory error: it is NOT, that works. It is the using(streamreader) OR File.ReadAllText methods that throw it. At the moment the streamreader throws it, I call GC.Collect, then File.ReadAllText works. Before, both failed with out of memory errors. – MonkeyMagix Apr 12 '16 at 12:18
  • Never thought of the timing of when IO objects are pinned making an explicit call to GC.Collect worthwhile. However I am 99% convinced that the OP can solve his/her problem without storing much data in RAM, so avoiding the issue of GC completely. – Ian Ringrose Apr 12 '16 at 12:33
  • @MonkeyMagix Of course it's the `File.ReadAllText` - that's what requests so much contiguous memory (see my answer). But that doesn't mean that that's the real cause - the real cause is (most likely) fragmented address space. Otherwise the GC would simply make sure there *is* enough memory by compacting the heaps and doing a collection. In fact, you just created the worst possible scenario - you allocated pinned buffers *right on top of the heap* just before asking for a huge string in memory. The heap can *never* be compacted to make enough space, even partially. – Luaan Apr 12 '16 at 12:58
  • the File.ReadAllText is the 2nd attempt in the try/catch. The 1st attempt if you read the article is my using(streamreader) function that throws the out of memory error, I guess that does the same thing then as it consumes the whole file in one go (even though someone above used ReadAllText to read in 12GB with no error?) >>> using (StreamReader reader = new StreamReader(path)) { return reader.ReadToEnd(); } – MonkeyMagix Apr 12 '16 at 15:02
  • Also as I say in another comment when I call GC.Collect() After the 1st call to that using(Streamreader) method - the File.ReadAllText reads in the file perfectly after clearing 20GB of memory. I am running this on a 16GB 64bit PC with 8GB free. The file size is less than 1GB. I am trying the copy/move methods tonight to see but I don't see why they wouldn't do the same as the OS does when you copy a large file e.g the window pops up as the transfer of the data is carried out for X seconds/mins. – MonkeyMagix Apr 12 '16 at 15:05
  • @MonkeyMagix Yes, exactly - the two are equivalent. In fact, if you look at the code for `File.ReadAllText`, it uses the exact same approach (see http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,c193e57831aa94a9). And as I already mentioned, the problem is (most likely) contiguous address space - are you sure the .NET app is running as 64-bit as well? 1 GiB of ASCII text means 2 GiB of `string` in memory, and that's half the whole address space of a 32-bit process. The copy methods use the OS, not the shell - I think you're mixing the two up. The copy dialog is in the shell. – Luaan Apr 12 '16 at 18:59
1

A reasonable alternative to calling GC.Collect() would be to create a new MemoryFailPoint before the critical code section.

This of course does not solve the real problem of why the GC in your case did not collect the memory by itself.

In your case you know how much memory you will need (the file size), so by creating a new MemoryFailPoint with that size, you can be reasonably certain the memory will be available. MemoryFailPoint actually calls GC.Collect itself if it decides it is necessary, but it also has some additional logic to deal with other issues such as page file size or address space fragmentation.

And if the memory is not enough, you avoid an OutOfMemoryException with its potential corrupting side effects, and instead get an InsufficientMemoryException, which can be caught without worries.
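
A minimal sketch of that pattern, assuming a hypothetical logPath variable (MemoryFailPoint takes its size in megabytes):

    // requires: using System.IO; using System.Runtime;
    // A string holds UTF-16, so budget roughly twice the file size, plus a little headroom.
    int sizeInMegabytes = (int)(new FileInfo(logPath).Length * 2 / (1024 * 1024)) + 1;
    try
    {
        // Reserves a memory "gate" up front; MemoryFailPoint may trigger a collection itself.
        using (new MemoryFailPoint(sizeInMegabytes))
        {
            string contents = File.ReadAllText(logPath);
            // ... archive the contents ...
        }
    }
    catch (InsufficientMemoryException)
    {
        // Not enough memory is available - fall back to a streaming copy instead.
    }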

HugoRune
-1

From the MSDN Page for OutOfMemoryException here is the major cause of an OutOfMemoryException:

The common language runtime cannot allocate enough contiguous memory to successfully perform an operation. This exception can be thrown by any property assignment or method call that requires a memory allocation. For more information on the cause of the OutOfMemoryException exception, see "Out of Memory" Does Not Refer to Physical Memory.

This type of OutOfMemoryException exception represents a catastrophic failure. If you choose to handle the exception, you should include a catch block that calls the Environment.FailFast method to terminate your app and add an entry to the system event log...

The key point is that it represents a catastrophic failure and you should exit your app.

Calling GC.Collect() is no longer an option once this type of exception occurs.
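
In other words, following that guidance the catch block would look something like this (a sketch, not the original code; logPath is a placeholder):

    try
    {
        string contents = File.ReadAllText(logPath);
        // ... archive the contents ...
    }
    catch (OutOfMemoryException ex)
    {
        // Per the MSDN guidance quoted above: record what happened and terminate,
        // rather than trying to carry on in a possibly corrupted state.
        Environment.FailFast("Out of memory while archiving the log file.", ex);
    }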

Enigmativity
  • I know it says that, but it seems to clear 20+ MB of memory and then allow the ReadAllText() method to read in the long log file when it didn't previously. So I trapped the System.OutOfMemoryException and ran it there. But anyway, I've changed the process now. – MonkeyMagix Apr 12 '16 at 13:27
-7

The garbage collector automatically collects managed objects which are no longer used or referenced. Usually there is no need to (and you should not) call the GC.Collect() method manually. But for example (in this case) when you are calling:

 queue.Dequeue(item)... 

In a long loop, there is no pointer or variable pointing to the removed object, but because it is still within method scope the garbage collector does not collect it until memory becomes very low. You can call GC.Collect() manually if you are in a critical situation like this.

mehrdad safa
  • Scoping has no relevance. A local is eligible for collection as soon as it's not read from in the future, regardless of scope. You have to write really convoluted code to prevent this from working well. Scope only matters to the compiler and the debugger - when running in a debugger, .NET will respect local scoping to aid in debugging. After all, if `GC.Collect` can collect those objects, they must already be unreachable. – Luaan Apr 11 '16 at 11:23
  • Total rubbish! Luaan is correct. – Ian Ringrose Apr 11 '16 at 12:37
  • By the way the queue.Dequeue method is not the problem. That is my current backup solution that WORKS to get me the last 100,000 lines when the first two methods File.ReadAllText() and the custom ReadFileString (that uses the using() statement and a stream reader) both fail with out of memory errors. So those two methods are failing and then I use the queue method to get me SOME of the log file. Now when I call GC.Collect() after the stream reader method fails the File.ReadAllText() works. So why isn't the File.ReadAllText() falling over compared to the stream reader in the using? – MonkeyMagix Apr 12 '16 at 10:06
  • @IanRingrose what if op was running in Debug? – Mafii Apr 12 '16 at 10:06
  • @Mafii, I can see how it will make a difference with his code. Also when running in debug even a force GC will not reclaim an object until it is not visible to the debugger. – Ian Ringrose Apr 12 '16 at 10:11
  • True @IanRingrose, ty – Mafii Apr 12 '16 at 10:12
  • @Luaan why? What happens when a local variable that references allocated heap memory goes out of scope? The reference variable is popped from the stack, so one reference goes away, and that helps some objects become unreachable, doesn't it? – mehrdad safa Apr 12 '16 at 10:47
  • Did you see his code? Does just copying to a new file or renaming resolve the question? Maybe, but what about the memory overflow? You are sidestepping the question! – mehrdad safa Apr 12 '16 at 10:51
  • @Luaan and other experts: please see his code in the main post and resolve the problem. Please don't just say "copy to a new file" - please answer my question: why is out of memory thrown? I tried his code with a 12 GB log file: no exception, and I got 500000 records instead of 100000? – mehrdad safa Apr 12 '16 at 10:57
  • http://stackoverflow.com/questions/36505210/out-of-memory-error-archiving-a-log-file/36506437?noredirect=1#comment60695453_36506437 – mehrdad safa Apr 12 '16 at 10:58
  • IL doesn't care about the scopes you have in your C# code - the highest granularity you have is method scope. Assembly doesn't care about the scopes you have in IL. If scoping was the issue, your `GC.Collect()` would *not be able to collect the memory*. Since it is, it's obvious that the local is already unreachable, regardless of scoping. – Luaan Apr 12 '16 at 11:33
  • @Luaan you are right, but when a block ends, local reference variables are popped from the stack. This helps objects become unreachable and eligible for collection by the GC. Is that not true? So there is an indirect relationship between scoping and garbage collection. – mehrdad safa Apr 12 '16 at 11:56
  • No, there really isn't any remainder of the block scoping in the generated IL code, it only exists in C#. In IL, all locals are method-scoped (though possibly reusable). "Pop out from stack" only really happens when leaving method scope, but even that is a gross oversimplification - your locals are often going to be unreachable long before that. Again, unless you're in a debugger where the rules are changed to aid in debugging - if you want to see the actual code, run the application first and *then* attach the debugger; then you can inspect the assembly and see what actually happens. – Luaan Apr 12 '16 at 12:03
  • @Luaan what about out of memory exception? any idea? – mehrdad safa Apr 12 '16 at 12:05
  • Please see the presentation I link to from [this answer](http://stackoverflow.com/a/21571499/15498) to understand why what you're saying about scopes is incorrect. – Damien_The_Unbeliever Apr 12 '16 at 12:08