I'm working with a memory-hungry application that makes use of Task to do processing in parallel. The problem is that it allocates a lot of memory and then hangs onto it, overloading my 16 GB system until the GC runs. At that point its performance is awful, and a run that would normally take 30 minutes can take days to finish. Here's a stripped-down version of it:
class Program
{
    static void Main(string[] args)
    {
        var tasks = new List<Task<string[]>>();
        var report = new List<string>();

        for (int i = 0; i < 2000; i++)
        {
            tasks.Add(Task<string[]>.Factory.StartNew(DummyProcess.Process));
        }

        foreach (var task in tasks)
        {
            report.AddRange(task.Result);
        }

        Console.WriteLine("Press RETURN...");
        Console.ReadLine();
    }
}
Here's the 'processor':
public static class DummyProcess
{
    public static string[] Process()
    {
        var result = new List<string>();
        for (int i = 1; i < 10000000; i++)
        {
            result.Add($"This is a dummy string of some length [{i}]");
        }

        var random = new Random();
        var delay = random.Next(100, 300);
        Thread.Sleep(delay);

        return result.ToArray();
    }
}
The problem, I believe, is here:
foreach (var task in tasks)
{
    report.AddRange(task.Result);
}
The tasks don't get disposed when they're done - what's the best way to get the result (string[]) out of the task and then dispose of the task?
I did try this:
foreach (var task in tasks)
{
    report.AddRange(task.Result);
    task.Dispose();
}
However, it made little difference. What I might try instead is simply not returning the results at all, so that the huge 10-50 MB of strings isn't retained (in the original application).
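In the stripped-down version, that change would look roughly like this. This is only a sketch: `ProcessCount` is a name I made up for the variant that does the work but hands back just a count, so the caller never holds the ten-million-element array alive:

```csharp
using System;
using System.Threading;

public static class DummyProcess
{
    // Hypothetical variant: build the strings as before, but return only
    // a count, so the huge string[] is never handed back to the caller.
    public static int ProcessCount()
    {
        var count = 0;
        for (int i = 1; i < 10000000; i++)
        {
            var s = $"This is a dummy string of some length [{i}]";
            count += s.Length > 0 ? 1 : 0; // touch the string so the work isn't skipped
        }

        var random = new Random();
        Thread.Sleep(random.Next(100, 300));

        return count;
    }
}
```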
EDIT: I tried replacing the code that reads the results with the following:
while (tasks.Any())
{
    var listCopy = tasks.ToList();
    foreach (var task in listCopy)
    {
        if (task.Wait(0))
        {
            report.AddRange(task.Result);
            tasks.Remove(task);
            task.Dispose();
        }
    }

    Thread.Sleep(300);
}
I had to abort after two hours - I'll let it run overnight tonight and see if it finishes. Memory usage seemed better as it ran, but it was still slow.
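One more thing on my list to try: capping how many tasks run at once, so only a handful of the big result arrays are alive at any moment instead of up to 2000. A rough sketch using Parallel.For - the degree of parallelism is a guess, and I've scaled the workload way down here so it runs quickly:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ThrottledDemo
{
    // Scaled-down stand-in for DummyProcess.Process (999 strings instead of ~10M).
    static string[] SmallProcess()
    {
        var result = new List<string>();
        for (int i = 1; i < 1000; i++)
        {
            result.Add($"This is a dummy string of some length [{i}]");
        }
        return result.ToArray();
    }

    static void Main()
    {
        var report = new List<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 }; // guess at a sensible cap

        Parallel.For(0, 20, options, _ =>
        {
            var result = SmallProcess();
            lock (report) // List<T> isn't thread-safe, so serialize the appends
            {
                report.AddRange(result);
            }
            // 'result' goes out of scope here, so nothing keeps the array alive
        });

        Console.WriteLine(report.Count);
    }
}
```

The idea is that each result array is consumed and dropped as soon as its iteration finishes, rather than all 2000 arrays piling up behind a single foreach over task.Result.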