I'm working on a streaming Twitter client. After 1-2 days of constant running, memory usage climbs past 1.4 GB (it's a 32-bit process), and soon after it hits that amount I get an out-of-memory exception on code that's essentially this (this code will error in under 30 seconds on my machine):
while (true)
{
    Task.Factory.StartNew(() =>
    {
        dynamic dyn2 = new ExpandoObject();
        //get a ton of text, make the string random
        //enough to not be interned, for the most part
        dyn2.text = Get500kOfText() + Get500kOfText() + DateTime.Now.ToString() +
                    DateTime.Now.Millisecond.ToString();
    });
}
I've profiled it and it's definitely due to classes way down in the DLR (from memory- I don't have my detailed info here): xxRuntimeBinderxx and xxAggregatexx.
This answer from Eric Lippert (Microsoft) seems to indicate that I'm creating expression-parsing objects behind the scenes that never get GC'd, even though no reference is kept to anything in my code.
If that's the case, is there some way in the code above to either prevent it or lessen it?
My fallback is to eliminate the dynamic usage, but I'd prefer not to.
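For what it's worth, the fallback would look something like the sketch below- a plain class instead of the ExpandoObject, so no DLR call sites get created. The TweetData name is just illustrative, not from my actual code:

    // Hypothetical replacement type- the name and property are
    // illustrative stand-ins for the dynamic members I use today.
    public class TweetData
    {
        public string Text { get; set; }
    }

    while (true)
    {
        Task.Factory.StartNew(() =>
        {
            // Same work as the dynamic version, but statically typed,
            // so no RuntimeBinder machinery is involved.
            var item = new TweetData
            {
                Text = Get500kOfText() + Get500kOfText() + DateTime.Now.ToString() +
                       DateTime.Now.Millisecond.ToString()
            };
        });
    }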
Thanks
Update:
12/14/12:
THE ANSWER:
The way to get this particular example to free up its tasks was to yield (Thread.Sleep(0)), which then allows the GC to clean up the completed tasks. I'm guessing a message/event loop wasn't being allowed to process in this particular case.
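Applied to the repro above, that's just a yield inside the loop (a minimal sketch of the same example; Get500kOfText() is the same placeholder as before):

    while (true)
    {
        Task.Factory.StartNew(() =>
        {
            dynamic dyn2 = new ExpandoObject();
            dyn2.text = Get500kOfText() + Get500kOfText() + DateTime.Now.ToString() +
                        DateTime.Now.Millisecond.ToString();
        });

        // Yield this thread so completed tasks get a chance to be
        // scheduled and their garbage collected.
        Thread.Sleep(0);
    }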
In the actual code I was using (TPL Dataflow), I was not calling Complete() on the blocks because they were meant to be a never-ending dataflow- the task would take Twitter messages as long as Twitter would send them. In this model, there was never any reason to tell any of the blocks that they were done, because they'd never BE done as long as the app was running.
Unfortunately, it doesn't look like Dataflow blocks were designed to be very long running or to handle untold numbers of items, because they actually keep a reference to everything that's sent into them. If I'm wrong, please let me know.
So the workaround is to periodically (based on your memory usage- mine was every 100k Twitter messages) release my references to the blocks and set them up again.
Under this scheme, my memory consumption never goes over 80megs and after recycling the blocks and forcing GC for good measure, the gen2 heap goes back down to 6megs and everything's fine again.
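Roughly, the recycling looks like this- a minimal sketch where a single ActionBlock stands in for the real pipeline, and ProcessTweet plus the threshold constant are illustrative placeholders, not my actual code:

    using System;
    using System.Threading.Tasks.Dataflow;

    class RecyclingPipeline
    {
        private ActionBlock<string> _block;
        private int _count;
        private const int RecycleThreshold = 100000; // tune to your memory usage

        public RecyclingPipeline()
        {
            _block = CreateBlock();
        }

        public void OnTweetReceived(string json)
        {
            _block.Post(json);

            if (++_count >= RecycleThreshold)
            {
                _count = 0;

                // Let the old block drain, drop all references to it,
                // and start over with a fresh one so whatever it was
                // holding onto can be collected.
                var old = _block;
                _block = CreateBlock();
                old.Complete();
                old.Completion.Wait();

                GC.Collect(); // for good measure, as noted above
            }
        }

        private static ActionBlock<string> CreateBlock()
        {
            return new ActionBlock<string>(json => ProcessTweet(json));
        }

        private static void ProcessTweet(string json)
        {
            // placeholder for the real per-message work
        }
    }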
10/17/12:
- "This isn't doing anything useful": This example is merely to allow you to generate the problem quickly. It's boiled down from a few hundred lines of code that have nothing to do with the issue.
- "An infinite loop creating a task and in turn creates objects": Remember- this merely demonstrates the issue quickly- the actual code is sitting there waiting for more streaming data. Also- looking at the code- all of the objects are created inside the Action<> lambda in the task. Why isn't this being cleaned up (eventually) after it goes out of scope? The issue also isn't due to doing it too quickly- the actual code requires more than a day to arrive at the out of memory exception- this just makes it quick enough to try things out.
- "Are tasks guaranteed to be freed?" An object's an object, isn't it? My understanding is that the scheduler's just using threads in a pool and the lambda that it's executing is going to be thrown away after it's done running regardless.