1

I am fairly new to C#. I would like to "add" two Dictionaries together using parallelism.

Example:

Dictionary<string, int> dict1 = new()
{
    { "a",  1 },
    { "b",  2 },
    { "c",  3 },
    //...
};

Dictionary<string, int> dict2 = new()
{
    { "a",  1 },
    { "b",  2 },
    { "c",  3 },
    //...
};

Result of dict1 + dict2:

Dictionary<string, int> dict3 = new()
{
    { "a",  2 },
    { "b",  4 },
    { "c",  6 },
    //...
};

I currently have a solution where I wrap a Dictionary<string, int> in my Modifier class:

public class Modifier
{
    public Dictionary<string, int> modifier = new Dictionary<string, int>();

    public void Add(string key, int value){
        modifier.Add(key, value);
    }

    public bool ContainsKey(string key){
        return modifier.ContainsKey(key);
    }

    public int this[string key]{
        get => modifier[key];
        set => modifier[key] = value;
    }

    public static Modifier operator +(Modifier a, Modifier b){
        foreach (string key in b.modifier.Keys){
            if (a.ContainsKey(key)){
                a[key] += b[key];
            } else {
                a.Add(key, b[key]);
            }
        }
        return a;
    }
}

To add, I do Modifier result = a + b (where a and b are Modifier objects). However, for my application, I will be doing many of these "additions" on large dictionaries and am afraid it will not scale well.

I am not very familiar with C# parallel programming. I thought about splitting the keys into equal sub-groups and performing the addition one each group, but I am not quite sure how to implement it, or even if it is optimal.

What is the most efficient way to do this addition?

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
branpham
  • 13
  • 3
  • 2
    why would you want to do it async? – pm100 Nov 09 '22 at 23:03
  • @pm100 The application will have very large dictionaries and will perform many "addition" operations, I would like to make it as efficient as possible. – branpham Nov 09 '22 at 23:09
  • 2
    By *"asynchronously"* you probably mean *"in parallel"*. You can check out the definitions here: [What is the difference between concurrency, parallelism and asynchronous methods?](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods) – Theodor Zoulias Nov 09 '22 at 23:10
  • 2
    Async programming isn't about efficient. If anything Async programming is less efficient because it requires the use of state machines every time you invoke an async method. https://devblogs.microsoft.com/premier-developer/the-performance-characteristics-of-async-methods/ If you're ALSO wanting to ensure that your user interface doesn't hang while you're adding these together... then maybe Async is what you want. – Kevon Nov 09 '22 at 23:15
  • @TheodorZoulias Thank you for the clarification. I have edited my post for clarity. – branpham Nov 09 '22 at 23:21
  • @Kevon I am creating a video game using Unity, so there is definitely user input. I do not want the FPS suffer too much doing these additions. – branpham Nov 09 '22 at 23:24
  • `am afraid it will not scale well` How much time have you spent trying to solve a problem you're not sure exists? Don't prematurely optimize - just implement it the best you can and optimize later when you can prove it's an issue. – Chuck Nov 09 '22 at 23:26
  • Also it sounds like you're assuming a solution here. You're asking how to optimize this solution you thought of without telling us the problem you're trying to solve. – Chuck Nov 09 '22 at 23:27
  • @branpham Since UI is an element, then yes... Async might be for you. https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/... essentially, write a method on your class that returns a Task instead of void, and then in the UI thread you can use await it so that it is not blocking. You MAY still want to run that in parallel to get better performance though. If you think this is the right path, I can try to put up some sample code as an answer – Kevon Nov 09 '22 at 23:29
  • As a side-note, the `operator +(Modifier a, Modifier b)` is mutating the `a` modifier instead of returning a new modifier. This is counterintuitive. In general when you add two things together you expect to get the result of the addition as a third thing, not to mutate the first of the two things. I can't think of any built-in .NET type that has a `+` operator that behaves like this. – Theodor Zoulias Nov 10 '22 at 11:13
  • 1
    @TheodorZoulias You are absolutely correct. I have just tested it. I will update it. – branpham Nov 11 '22 at 05:20

1 Answers1

3

Parallelism is unlikely to yield impressive performance improvements in this case. Actually it's more likely to make your operators slower than faster, on top of being tricky to implement and easy to get it wrong. My suggestion is to focus at optimizing the single-thread performance of your operators. Here is a suggestion:

public static Modifier operator +(Modifier a, Modifier b)
{
    foreach ((string key, int value) in b.modifier)
    {
        ref int valueRef = ref CollectionsMarshal
            .GetValueRefOrAddDefault(a.modifier, key, out bool exists);
        valueRef = exists ? valueRef + value : value;
    }
    return a;
}

Only one string hashing is performed in each iteration of the foreach loop. This alone will probably give you a performance boost of 100% or more. You can read about the CollectionsMarshal.GetValueRefOrAddDefault method in the docs (it is available from .NET 6 and later). Be careful, it's a low-level method.

You might be able to improve the performance even further by initializing the dictionaries with a specialized IEqualityComparer<string>, provided that you have specific knowledge about the inner patterns of the actual strings that will be used as keys. You might find interesting a related API proposal on GitHub: Provide optimized read-only collections.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104