I've run into an issue that is pretty narrow in scope and would likely be difficult to reproduce in a bare-minimum example. I have a workaround in place, but it involves GC.Collect().
I am using TPL Dataflow and have a processing workflow that runs in parallel, with the parallel tasks sharing a data cache in the form of a ConcurrentDictionary. I have a max size for this cache, and when it gets too large I remove items from it. The items are very large data-wise (basically pixel data from some large images used in the processing), so having them sit around too long can be detrimental. Since the processing is parallel, items can be added to and removed from the cache pretty aggressively as it works through the data, and what I've found is that there are scenarios where items are removed from the cache but aren't garbage collected until, in my view, it's too late.

At least once I've taken a memory snapshot where the heap showed 2 GB of data, but checking the "show dead objects" box showed the heap using 10 GB, so if I'm interpreting things correctly there was about 8 GB of memory that was "dead" and just waiting for garbage collection. Before moving on, I'm sure some people will suggest that if the memory is available, why not wait and collect the dead objects more efficiently, but that is exactly why this is a problem in the first place: all available memory gets used, which nearly locks up the computer; the application switches over to caching items on the C drive, which slows things down considerably; and sometimes it's unstable enough that things crash. So I'm curious why the garbage collector waits so long to collect in a scenario like this.
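In case it's relevant, this is how I've been sanity-checking the heap numbers from inside the process (GC.GetGCMemoryInfo needs .NET Core 3.0 or later; the label and logging are just illustrative, not part of my real code):

```csharp
using System;

static class HeapLogger
{
    // Log a quick heap snapshot for comparison against profiler numbers.
    public static void LogHeap(string label)
    {
        var info = GC.GetGCMemoryInfo();
        Console.WriteLine(
            $"{label}: heap={info.HeapSizeBytes / (1024 * 1024)} MB, " +
            $"fragmented={info.FragmentedBytes / (1024 * 1024)} MB, " +
            $"load={info.MemoryLoadBytes / (1024 * 1024)} MB");

        // GetTotalMemory(true) forces a full collection first, so the gap
        // between this reading and the heap size above approximates how
        // much memory was "dead" and awaiting collection.
        long live = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"{label}: live after full collect={live / (1024 * 1024)} MB");
    }
}
```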
I tried changing the removal logic to reduce thrashing of the cache by removing chunks at a time instead of always adding one and then removing one. That may have helped, but there were still cases where the memory appeared to run away. What seems to fix it better is calling GC.Collect() at the end of the removal from the cache, but is that a reasonable fix? Is there a more proper way of handling this? Below is a stripped-down, genericized version of my removal logic.
// Enter the lock so that only one thread removes old entries and calls GC.Collect().
if (_cache.Count > 30 && Monitor.TryEnter(_deleteLock))
{
    try
    {
        // Evict the five oldest entries, ordered by when they were added.
        var oldestEntries = _cache.OrderBy(item => item.Value.order).Select(item => item.Key).Take(5).ToList();
        foreach (var entry in oldestEntries)
        {
            _cache.TryRemove(entry, out _);
        }
        GC.Collect();
    }
    finally
    {
        // Release the lock even if eviction throws.
        Monitor.Exit(_deleteLock);
    }
}
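For completeness, here's a variant I've been considering but haven't fully validated: my understanding is that the pixel buffers are large enough to land on the large object heap, which is swept but not compacted by default, so I could request a one-time LOH compaction along with the forced collection. The CacheEntry/Order types below are simplified stand-ins for my real code:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Runtime;
using System.Threading;

// Simplified stand-in for the real cached item (Order is insertion order).
record CacheEntry(long Order, byte[] Pixels);

class PixelCache
{
    private readonly ConcurrentDictionary<string, CacheEntry> _cache = new();
    private readonly object _deleteLock = new();

    public void Add(string key, CacheEntry entry) => _cache[key] = entry;
    public int Count => _cache.Count;

    public void TrimIfNeeded()
    {
        // Only one thread evicts and collects at a time.
        if (_cache.Count > 30 && Monitor.TryEnter(_deleteLock))
        {
            try
            {
                var oldestKeys = _cache
                    .OrderBy(kv => kv.Value.Order)
                    .Select(kv => kv.Key)
                    .Take(5)
                    .ToList();

                foreach (var key in oldestKeys)
                    _cache.TryRemove(key, out _);

                // Large buffers live on the LOH, which is not compacted by
                // default; ask for a one-time compaction with this collection.
                GCSettings.LargeObjectHeapCompactionMode =
                    GCLargeObjectHeapCompactionMode.CompactOnce;
                GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);
            }
            finally
            {
                Monitor.Exit(_deleteLock);
            }
        }
    }
}
```

I'm not sure whether the compacting overload actually changes the behavior I'm seeing, or whether it just trades the memory growth for longer pauses, which is part of why I'm asking.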