I have an issue with concurrent data processing. My PC is running out of RAM quickly. Any advice on how to fix my concurrent implementation?
Common class:
public class CalculationResult
{
    public int Count { get; set; }
    public decimal[] RunningTotals { get; set; }

    public CalculationResult(decimal[] profits)
    {
        this.Count = 1;
        this.RunningTotals = new decimal[12];
        profits.CopyTo(this.RunningTotals, 0);
    }

    public void Update(decimal[] newData)
    {
        this.Count++;
        // sum arrays
        for (int i = 0; i < 12; i++)
            this.RunningTotals[i] = this.RunningTotals[i] + newData[i];
    }

    public void Update(CalculationResult otherResult)
    {
        this.Count += otherResult.Count;
        // sum arrays
        for (int i = 0; i < 12; i++)
            this.RunningTotals[i] = this.RunningTotals[i] + otherResult.RunningTotals[i];
    }
}
The single-core implementation is as follows:
Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
foreach (var i in itterations)
{
    // do the processing
    // ..
    string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
    if (combinations.ContainsKey(combination))
        combinations[combination].Update(newData);
    else
        combinations.Add(combination, new CalculationResult(newData));
}
Multi-core implementation:
ConcurrentBag<Dictionary<string, CalculationResult>> results = new ConcurrentBag<Dictionary<string, CalculationResult>>();
Parallel.ForEach(itterations, (i, state) =>
{
    Dictionary<string, CalculationResult> combinations = new Dictionary<string, CalculationResult>();
    // do the processing
    // ..
    // add combination to combinations -> same logic as in single core implementation
    results.Add(combinations);
});
Dictionary<string, CalculationResult> combinationsReal = new Dictionary<string, CalculationResult>();
foreach (var item in results)
{
    foreach (var pair in item)
    {
        if (combinationsReal.ContainsKey(pair.Key))
            combinationsReal[pair.Key].Update(pair.Value);
        else
            combinationsReal.Add(pair.Key, pair.Value);
    }
}
The issue I am having is that almost every combinations dictionary ends up with about 930k records in it, which on average consumes 400 MB of RAM.
Now, in the single-core implementation there is only one such dictionary, and all checks are performed against it. But this approach is slow, and I want to use multi-core optimizations.
In the multi-core implementation a ConcurrentBag instance is created which holds all the combinations dictionaries. As soon as the multi-threaded job is finished, all dictionaries are aggregated into one. This approach works well for a small number of concurrent iterations; for example, for 4 iterations my RAM usage was ~1.5 GB. The issue arises when I set the full number of parallel iterations, which is 200! No amount of PC RAM is enough to hold 200 dictionaries with nearly a million records each!
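For reference, here is a sketch (an illustration only, not the code I am benchmarking) of one way to keep memory bounded: the Parallel.ForEach overload with localInit/localFinally gives each worker thread a single private dictionary and merges it into one shared result when that worker finishes its partition, so memory scales with the degree of parallelism rather than with the 200 iterations. The names merged and mergeLock exist only for this sketch; newData and the combination string come from the processing, as above.
// Sketch: thread-local dictionaries merged per worker, instead of 200 dictionaries in a ConcurrentBag.
Dictionary<string, CalculationResult> merged = new Dictionary<string, CalculationResult>();
object mergeLock = new object();

Parallel.ForEach(
    itterations,
    // localInit: one private dictionary per worker thread
    () => new Dictionary<string, CalculationResult>(),
    // body: same logic as the single-core version, but against the thread-local dictionary
    (i, state, local) =>
    {
        // do the processing
        // ..
        string combination = "1,2,3,4,42345,52,523"; // determined during the processing
        if (local.ContainsKey(combination))
            local[combination].Update(newData);
        else
            local.Add(combination, new CalculationResult(newData));
        return local;
    },
    // localFinally: merge the worker's dictionary into the shared one exactly once, under a lock
    local =>
    {
        lock (mergeLock)
        {
            foreach (var pair in local)
            {
                if (merged.ContainsKey(pair.Key))
                    merged[pair.Key].Update(pair.Value);
                else
                    merged.Add(pair.Key, pair.Value);
            }
        }
    });
With this shape, only about (worker count + 1) dictionaries are alive at any moment, i.e. roughly 400 MB per worker instead of 400 MB per iteration.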
I was thinking about using a ConcurrentDictionary, until I found out that TryAdd does not guarantee the integrity of the added data in my situation, as I also need to run updates on the running totals.
The only real multi-threaded option left seems to be, instead of adding all combinations to a dictionary, to save them to some DB. Data aggregation would then be a matter of a single SQL select statement with a group by clause... but I don't like the idea of creating a temporary table and running a DB instance just for that.
Is there a workaround for how to process the data concurrently and not run out of RAM?
EDIT:
Maybe the real question should have been: how do I make updating of RunningTotals thread-safe when using a ConcurrentDictionary? I have just run across this thread, with a similar issue with ConcurrentDictionary, but my situation seems to be more complicated, as I have an array that needs to be updated. I am still investigating this matter.
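A sketch of one possible lock-free pattern (illustration only): treat the stored value as immutable and let AddOrUpdate retry the update delegate until it wins. The Merge method shown here is hypothetical, it is not part of the CalculationResult class above, and it would have to return a new CalculationResult instead of mutating the existing one; combinations is assumed to be a ConcurrentDictionary<string, CalculationResult>.
// Hypothetical: Merge returns a NEW CalculationResult and leaves "existing" untouched,
// so AddOrUpdate can safely re-run the delegate if another thread won the race.
combinations.AddOrUpdate(
    combination,
    key => new CalculationResult(newData),        // add path, same as the single-core code
    (key, existing) => existing.Merge(newData));  // update path, a pure function
AddOrUpdate keeps retrying against the then-current value, so no update is lost as long as the delegate has no side effects; the cost is a fresh 12-element array allocation per update.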
EDIT2: Here is a working solution with ConcurrentDictionary. All I needed to do was add a lock on the value stored under the dictionary key.
ConcurrentDictionary<string, CalculationResult> combinations = new ConcurrentDictionary<string, CalculationResult>();
Parallel.ForEach(itterations, (i, state) =>
{
    // do the processing
    // ..
    string combination = "1,2,3,4,42345,52,523"; // this is determined during the processing
    if (combinations.ContainsKey(combination))
    {
        lock (combinations[combination])
            combinations[combination].Update(newData);
    }
    else
        combinations.TryAdd(combination, new CalculationResult(newData));
});
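One small caveat with the snippet above: if two threads reach the else branch with the same brand-new combination at the same moment, one TryAdd loses and that thread's newData is silently dropped. A sketch that closes this window (again, only an illustration) uses GetOrAdd and then always updates under the per-entry lock; it assumes a parameterless CalculationResult constructor, which the class above does not have, initializing Count to 0 and RunningTotals to a zeroed decimal[12].
// Sketch only: CalculationResult() here is an assumed parameterless constructor
// that starts Count = 0 and RunningTotals = new decimal[12].
CalculationResult entry = combinations.GetOrAdd(combination, _ => new CalculationResult());
lock (entry)
    entry.Update(newData); // Update bumps Count and folds newData into RunningTotals
GetOrAdd may discard a racing factory result, but the factory only produces an empty accumulator, so nothing is lost; every thread then folds its own newData in under the lock.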
Single-threaded code execution time is 1m 48s, whereas this solution's execution time is 1m 7s for 4 iterations (a 37% performance increase). I am still wondering whether the SQL approach will be any faster with millions of records. I will possibly test it out tomorrow and update.
Edit 3: For those of you wondering what's wrong with ConcurrentDictionary updates on a value, run this code with and without the lock.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class Result
{
    public int Count { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Start");

        List<int> keys = new List<int>();
        for (int i = 0; i < 100; i++)
            keys.Add(i);

        ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
        Parallel.For(0, 8, i =>
        {
            foreach (var key in keys)
            {
                if (dict.ContainsKey(key))
                {
                    //lock (dict[key]) // uncomment this
                    dict[key].Count++;
                }
                else
                    dict.TryAdd(key, new Result());
            }
        });

        // any output here is incorrect behavior; best result = no lines
        foreach (var item in dict)
            if (item.Value.Count != 7) { Console.WriteLine($"{item.Key}; {item.Value.Count}"); }

        Console.WriteLine($"Finish");
        Console.ReadKey();
    }
}
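The root cause is that Count++ on a shared Result object is a non-atomic read-modify-write. For a plain int counter, a hedged alternative to the lock is Interlocked.Increment; the sketch below is a standalone variant of the test above and assumes Count is changed from an auto-property to a public field, because Interlocked needs a ref to a field. This does not carry over to the decimal[] RunningTotals in my real class, since Interlocked has no decimal overloads, so those updates still need the lock.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class Result
{
    public int Count; // a field (not an auto-property) so it can be passed by ref to Interlocked
}

class InterlockedDemo
{
    static void Main()
    {
        List<int> keys = new List<int>();
        for (int i = 0; i < 100; i++)
            keys.Add(i);

        ConcurrentDictionary<int, Result> dict = new ConcurrentDictionary<int, Result>();
        Parallel.For(0, 8, i =>
        {
            foreach (var key in keys)
            {
                if (dict.TryGetValue(key, out Result r))
                    Interlocked.Increment(ref r.Count); // atomic increment, no lost updates
                else
                    dict.TryAdd(key, new Result());
            }
        });

        // Lost increments are gone; a count of 6 can still appear if two threads race
        // on the initial TryAdd, since the losing thread neither adds nor increments.
        foreach (var item in dict)
            if (item.Value.Count != 7) Console.WriteLine($"{item.Key}; {item.Value.Count}");
        Console.WriteLine("Finish");
    }
}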
Edit 4: After some trial and error I couldn't optimize the SQL approach; it turned out to be the worst idea :) I used an SQLite database, both in-memory and in-file, with transactions and reusable SQL command parameters. Due to the huge number of records that needed to be inserted, the performance is lacking. Data aggregation is the easiest part, but just inserting 4 million rows takes a huge amount of time; I can't even begin to imagine how 240 million rows could be processed efficiently. So far (and also strangely), the ConcurrentBag approach seems to be the fastest on my PC, followed by the ConcurrentDictionary approach. ConcurrentBag is a bit heavier on memory, though. Thanks to the work of @Alisson, it is now perfectly fine to use it for a larger set of iterations!