
Firstly, I'd just like to mention that I started learning C# only a few days ago, so my knowledge of it is limited.

I am merging multiple dictionaries that have the same key and value types into a single one.

The following is my approach, which works and also handles duplicates:

var result = dict1.Concat(dict2).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict3).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict4).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

result = result.Concat(dict5).GroupBy(d => d.Key)
    .ToDictionary(d => d.Key, d => d.First().Value);

I would like to know whether there is a more efficient way to merge multiple dictionaries whose key/value pairs have the same types.

Tango
  • Do you control initialization? – Karolis Kajenas May 23 '17 at 16:19
  • This doesn't look efficient. Also there doesn't seem to be a clear implementation of how you want to handle overriding of the **values**. Are we to assume that `dict5` has precedence over `dict4`, which has precedence over `dict3` (and so on)... ? If you can help clarify, we could suggest some implementations. – Svek May 23 '17 at 16:20
  • @Karolis yes, I have like 3 different dictionaries which I merge in one of my methods to perform a task – Tango May 23 '17 at 16:21
  • @Tango Is it possible for you to simply add all entries to the same dictionary? Is there some particular reason why you chose to have 5 different dictionaries? – Karolis Kajenas May 23 '17 at 16:31
  • @Karolis I have 5 different methods. Each method has its own dictionary. I have a main method which works on the data from each method. I am merging all the dictionaries so that I can use the result to manipulate the collective data. – Tango May 23 '17 at 16:35
  • Why don't you pass the dictionary to each of these methods? – Karolis Kajenas May 23 '17 at 16:36
  • What does "efficient" mean to you? – Kenneth K. May 23 '17 at 16:37
  • @KennethK. A better way to implement the same thing (for learning purposes). – Tango May 23 '17 at 16:40
  • What does "better" mean to you? – Kenneth K. May 23 '17 at 16:41

2 Answers


Since dictionaries implement IEnumerable<KeyValuePair<TKey, TValue>>, you can simply write:

var result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .ToDictionary(e => e.Key, e => e.Value);

This assumes that there are no duplicate keys; if there are, ToDictionary throws an ArgumentException.
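To make the failure mode concrete, here is a small sketch with made-up sample data in which two dictionaries share the key "b". It assumes C# 9 or later (top-level statements); the exception message text varies between .NET versions, so only the fact that it is thrown is checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: both dictionaries contain the key "b".
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 2, ["c"] = 3 };

bool threw = false;
try
{
    // ToDictionary adds the pairs one by one and rejects the second "b",
    // even though its value is identical to the first one.
    _ = dict1.Concat(dict2).ToDictionary(e => e.Key, e => e.Value);
}
catch (ArgumentException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}
Console.WriteLine(threw);
```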

If there are duplicate keys, you could keep the first value for each key:

result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .GroupBy(e => e.Key)
    .ToDictionary(g => g.Key, g => g.First().Value);

Other variants are conceivable, like keeping the maximum/minimum value etc.
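For example, keeping the maximum value per key only changes the value selector passed to ToDictionary. A minimal sketch, assuming integer values and hypothetical sample data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data with overlapping keys "a" and "b".
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 5 };
var dict2 = new Dictionary<string, int> { ["b"] = 7, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["a"] = 4 };

// Group all pairs by key and keep the largest value of each group.
var result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .GroupBy(e => e.Key)
    .ToDictionary(g => g.Key, g => g.Max(e => e.Value));

// Prints "a = 4", "b = 7", "c = 3" on separate lines.
foreach (var pair in result.OrderBy(p => p.Key))
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```

Replacing Max with Min, or with any other aggregate over the group, gives the other variants.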

If there are duplicate keys with different values, you could also create a dictionary of value lists:

Dictionary<TKey, List<TValue>> result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .Concat(dict4)
    .Concat(dict5)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());

Instead of creating a List<T> of values, you could insert them into a HashSet<T> to keep only unique values.
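A sketch of the HashSet<T> variant, assuming int values and hypothetical sample data; since each group is itself an IEnumerable of values, it can be passed straight to the HashSet<T> constructor:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: "b" occurs three times, twice with the value 2.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 2, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["b"] = 4 };

// Group the values by key and collect each group into a set,
// which silently drops repeated values.
Dictionary<string, HashSet<int>> result = dict1
    .Concat(dict2)
    .Concat(dict3)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => new HashSet<int>(g));

Console.WriteLine(string.Join(", ", result["b"].OrderBy(v => v))); // 2, 4
```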

If the values are always the same for duplicate keys then simply use Union instead of Concat:

var result = dict1
    .Union(dict2)
    .Union(dict3)
    .Union(dict4)
    .Union(dict5)
    .ToDictionary(e => e.Key, e => e.Value);

Union produces the set union of two sequences, so a key/value pair that appears in several dictionaries is kept only once. Concat simply concatenates two sequences.
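The following sketch (hypothetical sample data) shows the difference. Union compares whole KeyValuePair values, so it removes a pair only when both key and value are equal; pairs that share a key but differ in value both survive, and ToDictionary then still throws:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: ("b", 2) is repeated exactly,
// while "c" appears with two different values.
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 2, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["c"] = 30 };

// Concat keeps all 5 pairs.
int concatCount = dict1.Concat(dict2).Concat(dict3).Count();

// Union drops the repeated ("b", 2) pair, leaving 4.
var unioned = dict1.Union(dict2).Union(dict3).ToList();
Console.WriteLine($"{concatCount} {unioned.Count}"); // 5 4

// ("c", 3) and ("c", 30) are different pairs, so the key "c" is still duplicated.
bool threw = false;
try { _ = unioned.ToDictionary(e => e.Key, e => e.Value); }
catch (ArgumentException) { threw = true; }
Console.WriteLine(threw); // True
```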

Finally, you can combine the two preceding approaches and discard equal key/value pairs, but keep a list of different values per key:

Dictionary<TKey, List<TValue>> result = dict1
    .Union(dict2)
    .Union(dict3)
    .Union(dict4)
    .Union(dict5)
    .GroupBy(e => e.Key, e => e.Value)
    .ToDictionary(g => g.Key, g => g.ToList());

These examples show that it is important to know exactly how the input data is shaped (unique/non-unique keys and key-value-pairs) and precisely what kind of result you expect.

A different approach would be to let your different methods return lists or enumerations instead of dictionaries and merge these collections into a dictionary at the end. This would perform better, since no intermediate dictionaries would have to be built.
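A minimal sketch of that idea. LoadFirst and LoadSecond are hypothetical stand-ins for the asker's methods; they yield key/value pairs instead of filling their own dictionaries, and only one dictionary is built at the end. As in the question, the first value seen for a key is kept. Assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical producer methods: each yields pairs instead of building a Dictionary.
IEnumerable<KeyValuePair<string, int>> LoadFirst()
{
    yield return new KeyValuePair<string, int>("a", 1);
    yield return new KeyValuePair<string, int>("b", 2);
}

IEnumerable<KeyValuePair<string, int>> LoadSecond()
{
    yield return new KeyValuePair<string, int>("b", 2);
    yield return new KeyValuePair<string, int>("c", 3);
}

// Build the single result dictionary; later duplicates of a key are skipped.
var result = new Dictionary<string, int>();
foreach (var pair in LoadFirst().Concat(LoadSecond()))
{
    if (!result.ContainsKey(pair.Key))
        result.Add(pair.Key, pair.Value);
}

Console.WriteLine(result.Count); // 3
```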

Olivier Jacot-Descombes
  • There may be duplicate keys. – Tango May 23 '17 at 16:36
  • In this case, what do you want to do with the values? (The values might be different even if the keys are the same, right?) – Olivier Jacot-Descombes May 23 '17 at 16:38
  • True; for a given key, its value is always going to be the same throughout. – Tango May 23 '17 at 16:41
  • Am I doing something wrong if I get an exception stating `An item with the same key has already been added` when calling `.Union()` as described above? I figured the whole point of a union was to handle that for me. – Jonathan B. Mar 05 '18 at 20:17
  • Based on a comment from @jon-skeet, the `.Union()` method is also [dumbly] unioning the contents (whole key/value pairs), which, if they contain a collection, could result in duplicate keys: https://stackoverflow.com/questions/4791931/merge-dictionarytkey-tvalue-with-enumerable-union-method – Jonathan B. Mar 05 '18 at 20:25
  • Yes, therefore I say "If the values are always the same for duplicate keys ..." in the first example. The second example groups by key. – Olivier Jacot-Descombes Mar 05 '18 at 20:56

Although it doesn't use any pretty LINQ, I think the following will be more efficient. It creates only one additional dictionary, the result, and sizes it up front so that it never has to grow. In addition, the number of inserts is exactly the number of elements in the resulting dictionary.

I think this will be more efficient than creating several intermediate dictionaries or other collections, or than approaches that make the new or intermediate dictionaries go through multiple growth resizes. In the second foreach, I don't know whether it's more efficient to check dict1 or result with ContainsKey. I checked dict1 because there is no need to check result, which accumulates more and more values from dict2, and we know that no key appears in dict2 more than once.

var result = new Dictionary<MyKeyType, MyValueType>(dict1.Count + dict2.Count + dict3.Count
    + dict4.Count + dict5.Count);
foreach(var pair in dict1) {
    result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict2) {
    if (!dict1.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict3) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict4) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}
foreach(var pair in dict5) {
    if (!result.ContainsKey(pair.Key)) result.Add(pair.Key, pair.Value);
}

In a timing test with 5 large dictionaries whose keys were mostly unique across them, the results were as follows (times in milliseconds):

  • 1037 ms for your code
  • 357 ms for the middle block of code in the other answer using Linq
  • 784 ms for the third block of code in the other answer using Linq
  • 43 ms for the code above using foreach
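The foreach approach generalizes to any number of dictionaries. The sketch below uses a hypothetical helper, MergeFirstWins, and swaps the ContainsKey/Add pair for Dictionary<TKey, TValue>.TryAdd (available since .NET Core 2.0), which does a single lookup per entry and leaves an existing key untouched. It was not part of the timing test above; it also assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: merges any number of dictionaries into one.
// The result is presized with the total entry count so it never grows,
// and TryAdd ignores keys that are already present, so the first value wins.
Dictionary<TKey, TValue> MergeFirstWins<TKey, TValue>(
    params Dictionary<TKey, TValue>[] sources) where TKey : notnull
{
    var result = new Dictionary<TKey, TValue>(sources.Sum(d => d.Count));
    foreach (var source in sources)
    {
        foreach (var pair in source)
        {
            result.TryAdd(pair.Key, pair.Value);
        }
    }
    return result;
}

// Hypothetical sample data with overlapping keys "b" and "c".
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 20, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["c"] = 30, ["d"] = 4 };

var merged = MergeFirstWins(dict1, dict2, dict3);
foreach (var pair in merged)
    Console.WriteLine($"{pair.Key} = {pair.Value}");
```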

If a key is in multiple dictionaries, the first value is the one that's used, because you haven't specified how you want to handle that situation.
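If later dictionaries should take precedence instead (the "dict5 over dict4" reading raised in the comments on the question), one option is to write through the indexer, which inserts new keys and overwrites existing ones. A sketch under that assumption, with hypothetical sample data:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data with overlapping keys "b" and "c".
var dict1 = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
var dict2 = new Dictionary<string, int> { ["b"] = 20, ["c"] = 3 };
var dict3 = new Dictionary<string, int> { ["c"] = 30, ["d"] = 4 };

// Presize as in the answer, then copy every pair in order.
var result = new Dictionary<string, int>(dict1.Count + dict2.Count + dict3.Count);
foreach (var source in new[] { dict1, dict2, dict3 })
{
    foreach (var pair in source)
    {
        // The indexer setter overwrites an existing key,
        // so the value from the last dictionary containing the key wins.
        result[pair.Key] = pair.Value;
    }
}

Console.WriteLine($"{result["b"]} {result["c"]}"); // 20 30
```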