I'm trying to find out why my application gets slower over time. The application bundles the work to do into batches which can be executed concurrently. Each batch has to wait for the previous one to finish. The batches are executed every ~30ms inside the update function. In code, it would look something like this:
List<List<Action>> batches = new List<List<Action>>(); // class member
foreach (var batch in batches) // inside of the update function
{
Parallel.ForEach(batch, action => action());
}
After a few days, I found that executing all batches takes longer and longer. At first, the execution takes about 15ms, and then after some days it takes more than 100ms. My goal is to find out why execution time keeps rising.
While debugging, I found that the amount of batches and the amount of action in each batch stays the same.
Profiling with dotTrace shows that sometimes one action in a batch takes longer and Parallel.ForEach
will (and should) wait until all previous actions finish. When profiling right after starting the application, a delay is only visible every few hundred updates. After some days, a delay happens for almost every update.
This is how it looks in the profiler; I've marked the individual update cycles in different shades of green and numbered them. As you can see, there is an obvious delay (yellow) in the second update.
During the Yellow period, no other thread is doing work (at least according to the profiler). When selecting the threads one by one, all threads except one are not working on an action. As you can see in the next picture, the one thread that is still working on an action doesn't show any CPU load and all time is spent in ntdll.dll
.
In this case, the call to GetEnumerator
takes about 24ms, but I have seen freezes in functions like ToString()
or ToArray()
too.
Related stuff I have observed:
- The freeze is not tied to a specific function and can happen in any part of the application
- The time spent is always displayed in
ntdll.dll
- Usually happens right after a GC
- All GCs running in the profiled time are gen0
- The called function allocates memory