-1

ConcurrentDictionary works well for concurrent situations when mapping keys to a single value each. When mapping to multiple values, it is easy to create a ConcurrentDictionary<K, List<V>> and guard its addition/removal functions.

ConcurrentDictionary <string, List<string>> d;

// Add
var list = d.GetOrAdd ("key", x => new List<string> ());
lock (list) {
    list.Add ("value to add");
}

// Remove
if (d.TryGetValue ("key", out var list)) {
    lock (list) {
        list.Remove ("value to remove");
    }
}

However, the above assumed that empty lists are allowed to stay. I don't want that. But removing empty pairs does not seem to be possible in an atomic fashion. One might try:

if (d.TryGetValue ("key", out var list)) {
    lock (list) {
        if (list.Remove ("value to remove") && list.Count == 0) {
            d.TryRemove ("key", out _);
        }
    }
}

But this has a race condition when another thread grabs the list before but adds to it after it was emptied and removed elsewhere:

  • A: get list
  • B: get list
  • B: lock, remove from list
  • B: list is empty, delete key, unlock
  • A: lock, add to list, unlock

Locking on the dictionary is not possible (it's a different use case).

As far as I can tell, a solution would usually be found using compare-and-swap operations and replacing the list with e.g. an immutable array that is then replaced in its entirety. However, given that ConcurrentDictionary does not offer a TryRemove with an expected value to compare against, I don't quite see how. Possibly there is a two-stage solution?

Using the out parameter of TryRemove to add values again after removing them (to fix race cases) is not possible - the dictionary would briefly be in an inconsistent state.

There are many questions on this site asking about similar scenarios, but most of them suffer from trivial mistakes or do not remove empty entries. There is this highly related question which asks if it is possible to do this. Sadly, it is five years old, received very little attention and has no solution apart from resorting to locks (which defeats the purpose). Possibly there opened up a better way since that time.


(Edited example for clarity)

mafu
  • 31,798
  • 42
  • 154
  • 247
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/209729/discussion-on-question-by-mafu-concurrentdictionary-with-multiple-values-per-key). – Samuel Liew Mar 16 '20 at 22:52
  • Related: [How to achieve remove_if functionality in .NET ConcurrentDictionary](https://stackoverflow.com/questions/39679779/how-to-achieve-remove-if-functionality-in-net-concurrentdictionary). – Theodor Zoulias Oct 16 '22 at 09:16

3 Answers3

2

I managed to implement a ConcurrentMultiDictionary class that stores multiple values per key, and with empty entries removed. The values of each key are stored in a HashSet, so each key has unique values. This increases the performance of deleting a value when the number of values is large. If the uniqueness is a problem then the HashSet should be replaced with a List, and the Add method should be modified to return void instead of bool.

The atomicity of the adding and removing operations is achieved by spinning. When a bag of values becomes empty, it is flagged as "discarded". Adding values into a discarded bag is not allowed, so the Add operation spins until it grabs a non discarded bag. The Remove operation spins too. So the only thread that is allowed to remove a discarded bag is the same thread that marked the bag as discarded. All other threads will be spinning until that happens. SpinWait structs are used for the spinning, to ensure efficiency even in single processor machines.

An unsolvable problem of this implementation is how to implement a ToArray method that takes a snapshot of all keys and values stored in the dictionary. The ConcurrentDictionary.ToArray method returns a snapshot of the keys, but the bags can be constantly changing, and this is why I believe it is unsolvable.

Even implementing the IEnumerable interface is a bit tricky, because if we just enumerate the KeyValuePairs of the underlying dictionary, most of the bags could be discarded at the time their values are acquired. So during the enumeration the bag of each key is retrieved individually, to be as current as possible.

public class ConcurrentMultiDictionary<TKey, TValue>
    : IEnumerable<KeyValuePair<TKey, TValue[]>>
{
    private class Bag : HashSet<TValue>
    {
        public bool IsDiscarded { get; set; }
    }

    private readonly ConcurrentDictionary<TKey, Bag> _dictionary;

    public ConcurrentMultiDictionary()
    {
        _dictionary = new ConcurrentDictionary<TKey, Bag>();
    }

    public int Count => _dictionary.Count;

    public bool Add(TKey key, TValue value)
    {
        var spinWait = new SpinWait();
        while (true)
        {
            var bag = _dictionary.GetOrAdd(key, _ => new Bag());
            lock (bag)
            {
                if (!bag.IsDiscarded) return bag.Add(value);
            }
            spinWait.SpinOnce();
        }
    }

    public bool Remove(TKey key, TValue value)
    {
        var spinWait = new SpinWait();
        while (true)
        {
            if (!_dictionary.TryGetValue(key, out var bag)) return false;
            bool spinAndRetry = false;
            lock (bag)
            {
                if (bag.IsDiscarded)
                {
                    spinAndRetry = true;
                }
                else
                {
                    bool valueRemoved = bag.Remove(value);
                    if (!valueRemoved) return false;
                    if (bag.Count != 0) return true;
                    bag.IsDiscarded = true;
                }
            }
            if (spinAndRetry) { spinWait.SpinOnce(); continue; }
            bool keyRemoved = _dictionary.TryRemove(key, out var currentBag);
            Debug.Assert(keyRemoved, $"Key {key} was not removed");
            Debug.Assert(bag == currentBag, $"Removed wrong bag");
            return true;
        }
    }

    public bool TryGetValues(TKey key, out TValue[] values)
    {
        if (!_dictionary.TryGetValue(key, out var bag)) { values = null; return false; }
        bool isDiscarded;
        lock (bag) { isDiscarded = bag.IsDiscarded; values = bag.ToArray(); }
        if (isDiscarded) { values = null; return false; }
        return true;
    }

    public bool Contains(TKey key, TValue value)
    {
        if (!_dictionary.TryGetValue(key, out var bag)) return false;
        lock (bag) return !bag.IsDiscarded && bag.Contains(value);
    }

    public bool ContainsKey(TKey key) => _dictionary.ContainsKey(key);

    public ICollection<TKey> Keys => _dictionary.Keys;

    public IEnumerator<KeyValuePair<TKey, TValue[]>> GetEnumerator()
    {
        foreach (var key in _dictionary.Keys)
        {
            if (this.TryGetValues(key, out var values))
            {
                yield return new KeyValuePair<TKey, TValue[]>(key, values);
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

This implementation was tested with 8 concurrent workers mutating a dictionary a million times per worker, and no inconsistency regarding the reported number of additions and removals was observed.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Could you upload your test code? (if large, possibly elsewhere e.g. Github gist or Pastebin etc) – mafu Mar 17 '20 at 12:56
  • Clever approach! Seems to be correct. And it is sufficient with the standard ConcurrentDictionary. – mafu Mar 17 '20 at 14:09
  • 1
    @mafu sure, here is a [fiddle](https://dotnetfiddle.net/xqUvYQ) of the test. The test involves a single key, so it is not completely thorough. The number of values, workers and loops is configurable. There are some slight changes to the API of the tested class, so that statistics about the spinning can be collected. – Theodor Zoulias Mar 17 '20 at 19:00
1

There seems to be no practical way of removing an empty collection (even if it is synchronized) from a concurrent dictionary without having race condition issues. There are certain facts preventing this from being possible, as discussed in the comments under both the question and the OP's self answer.

What I wrote in my comment, however, seemed feasible and I wanted to give it a try.

I want to discuss the drawbacks of this implementation right after, and I should also say that your comments (if received any) are what is most valuable to me.

First, the usage:

    static void Main(string[] args)
    {
        var myDictionary = new ConcurrentDictionary<string, IList<int>>();
        IList<int> myList = myDictionary.AddSelfRemovingList<string, int>("myList");

        myList.Add(5);
        myList.Add(6);

        myList.Remove(6);
        myList.Remove(5);

        IList<int> existingInstance;

        // returns false:
        bool exists = myDictionary.TryGetValue("myList", out existingInstance);

        // throws HasAlreadyRemovedSelfException:
        myList.Add(3);
    }

AddSelfRemovingList is an extension method to make things easier.

For the discussion part:

  1. It is not acceptable for the removal of an item from a collection to have a side effect of removing the collection reference from the owning dictionary.
  2. It is also not good practice to make the collection obsolete (unusable) when all its items are removed. There is a strong possibility that the consumer of the collection wants to clear and re-fill the collection and this implementation does not allow that.
  3. It forces the use of IList<T> abstraction and a custom implementation over List<T>

Although this provides a real thread-safe way of removing a just emptied collection from the dictionary, there seem to be more cons than pros to it. This should only be used in a closed context where the collections inside the concurrent dictionary are exposed to the outside, and where the immediate removal of a collection when emptied, even if some other thread is accessing it at the moment, is essential.

Here is the extension method to create and add the self removing list to the dictionary:

public static class ConcurrentDictionaryExtensions
{
    public static IList<TValue> AddSelfRemovingList<TKey, TValue>(this ConcurrentDictionary<TKey, IList<TValue>> dictionaryInstance, TKey key)
    {
        var newInstance = new SelfRemovingConcurrentList<TKey, TValue>(dictionaryInstance, key);
        if (!dictionaryInstance.TryAdd(key, newInstance))
        {
            throw new ArgumentException("ownerAccessKey", "The passed ownerAccessKey has already exist in the parent dictionary");
        }
        return newInstance;
    }
}

And finally; here is the synchronized, self-removing implementation of IList<T>:

public class SelfRemovingConcurrentList<TKey, TValue> : IList<TValue>
{
    private ConcurrentDictionary<TKey, IList<TValue>> owner;
    private TKey ownerAccessKey;
    List<TValue> underlyingList = new List<TValue>();
    private bool hasRemovedSelf;

    public class HasAlreadyRemovedSelfException : Exception
    {

    }

    internal SelfRemovingConcurrentList(ConcurrentDictionary<TKey, IList<TValue>> owner, TKey ownerAccessKey)
    {
        this.owner = owner;
        this.ownerAccessKey = ownerAccessKey;
    }

    private void ThrowIfHasAlreadyRemovedSelf()
    {
        if (hasRemovedSelf)
        {
            throw new HasAlreadyRemovedSelfException();
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    int IList<TValue>.IndexOf(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.IndexOf(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void IList<TValue>.Insert(int index, TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Insert(index, item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void IList<TValue>.RemoveAt(int index)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.RemoveAt(index);
        if (underlyingList.Count == 0)
        {
            hasRemovedSelf = true;
            IList<TValue> removedInstance;
            if (!owner.TryRemove(ownerAccessKey, out removedInstance))
            {
                // Just ignore.
                // What we want to do is to remove ourself from the owner (concurrent dictionary)
                // and it seems like we have already been removed!
            }
        }
    }

    TValue IList<TValue>.this[int index]
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return underlyingList[index];
        }
        [MethodImpl(MethodImplOptions.Synchronized)]
        set
        {
            ThrowIfHasAlreadyRemovedSelf();
            underlyingList[index] = value;
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.Add(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Add(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.Clear()
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Clear();
        hasRemovedSelf = true;
        IList<TValue> removedInstance;
        if (!owner.TryRemove(ownerAccessKey, out removedInstance))
        {
            // Just ignore.
            // What we want to do is to remove ourself from the owner (concurrent dictionary)
            // and it seems like we have already been removed!
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    bool ICollection<TValue>.Contains(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.Contains(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.CopyTo(TValue[] array, int arrayIndex)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.CopyTo(array, arrayIndex);
    }

    int ICollection<TValue>.Count
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return underlyingList.Count;
        }
    }

    bool ICollection<TValue>.IsReadOnly
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return false;
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    bool ICollection<TValue>.Remove(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        bool removalResult = underlyingList.Remove(item);
        if (underlyingList.Count == 0)
        {
            hasRemovedSelf = true;
            IList<TValue> removedInstance;
            if (!owner.TryRemove(ownerAccessKey, out removedInstance))
            {
                // Just ignore.
                // What we want to do is to remove ourself from the owner (concurrent dictionary)
                // and it seems like we have already been removed!
            }
        }
        return removalResult;
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    IEnumerator<TValue> IEnumerable<TValue>.GetEnumerator()
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.GetEnumerator();
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    IEnumerator IEnumerable.GetEnumerator()
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.GetEnumerator();
    }
}
Oguz Ozgul
  • 6,809
  • 1
  • 14
  • 26
  • In the extensions, if the key already exists, should the method return the newly created instance of the list, or the one already present? – mafu Mar 16 '20 at 13:11
  • Good question @mafu. By renaming the method to AddOrGetSelfRemovingList(), that can be communicated I think. But what if the instance is not a self removing list, but a regular List? It should still throw in that case I think. – Oguz Ozgul Mar 16 '20 at 13:17
  • I did not test it, but from the code, the idea of this answer seems to work well. The basic trick is to move the remove-from-list list inside the lock on the list, which is not possible when the dictionary has to be queried before the lock. The locking scope is also small (no locks on the dictionary). – mafu Mar 16 '20 at 13:18
  • Re `AddOrGetSelfRemovingList`, I agree it should be renamed and then exhibit this behavior. That would follow the usual pattern. The code can also easily be adjusted to ensure only the expected types are present (e.g. via container class). – mafu Mar 16 '20 at 13:22
  • But I must make the note again; the removal from the dictionary is the side effect of list.Remove() or list.RemoveAt() etc. If the API consumer does not know anything about the dictionary, this can be acceptable. Also, this design does not prevent the owner of the dictionary to remove a list manually, and if another object is set to the dictionary with the same key, emptying the list will remove the new item from the dictionary. Maybe we need to check the removed instance (to test if it equals `this`) and put it back if not. – Oguz Ozgul Mar 16 '20 at 13:23
-2

The question can be solved by using a dictionary that offers a variant of TryRemove that first checks that the current value is equal to an expected value. Only if the values compare equal, the value is replaced (atomically). Otherwise, the operation returns failure.

It turns out ConcurrentDictionary already implements exactly this functionality:

/// <summary>
/// Removes the specified key from the dictionary if it exists and returns its associated value.
/// If matchValue flag is set, the key will be removed only if is associated with a particular
/// value.
/// </summary>
/// <param name="key">The key to search for and remove if it exists.</param>
/// <param name="value">The variable into which the removed value, if found, is stored.</param>
/// <param name="matchValue">Whether removal of the key is conditional on its value.</param>
/// <param name="oldValue">The conditional value to compare against if <paramref name="matchValue"/> is true</param>
/// <returns></returns>
private bool TryRemoveInternal(TKey key, out TValue value, bool matchValue, TValue oldValue)

TryRemove calls this (with matchValue set to false). The method is sadly not exposed (it is private). A simple solution would thus be to copy the existing class and change this method to be public. I'm not sure why it was not exposed. If the specific functionality were not working well, matchValue would most likely have been removed earlier.

As @Theodor Zoulias notes, it is also possible to invoke the private TryRemoveInternal method by using reflection. As far as I know, this can be done without major impact on performence.

There are also third party implementations with (claimed) high performance and concurrency that exhibit a TryRemove (..., expectedValue).

Once an implementation is chosen, the following code implements the asked for functionality. It uses the atomic compare-and-swap operations provided by the dictionary in a loop until it succeeds (similar to what many concurrent dictionaries do internally, too). As far as I'm aware, this is a typical approach in lock-free algorithms.

// Use any third-party dictionary that offers TryRemove() with
// a value to compare against (two are mentioned above)
ConcurrentDictionary<TKey, List<TValue>> d;
...

// To remove a value from key:
// Loop until the compare-and-swap of either update or removal succeeded
while (true)
{
    // If the key does not exist, exit
    if (!d.TryGetValue (key, out var list)) {
        break;
    }

    // Remove the value from this key's entry:
    // Consider the old value immutable, copy-and-modify it instead
    List<TValue> newlist;
    lock (list) {
        newlist = list.Where (it => it != valueToRemove).ToList ();
    }

    // If the value list is not empty, compare-and-update it
    if (newlist.Count > 0) {
        if (d.TryUpdate (key: key, newValue: newlist, expectedValue: list)) {
            return;
        }
    }
    else // The key's value list is empty - compare-and-remove the entire key
    {
        // Remove the key iff the associated value is still the same
        if (d.TryRemove (key: key, expectedValue: list)) { // Note that list is an in-, not an out-parameter
            return;
        }
    }

    // If we reach this point, the operation failed - try again
}
mafu
  • 31,798
  • 42
  • 154
  • 247
  • 2
    This operation is not atomic. Because other threads can be adding or removing items from the list while this is all going on, meaning items added or removed from the collection can end up being lost when you assign a new list to the dictionary after another thread added or removed an item, and your list will not reflect that change. You'll need some sort of synchronization mechanism to prevent another thread from adding or removing items associated with that key while you're computing the new collection for it. – Servy Mar 15 '20 at 20:47
  • 2
    even the Where is not thread safe: An enumerator remains valid as long as the collection remains unchanged. If changes are made to the collection, such as adding, modifying, or deleting elements, the enumerator is irrecoverably invalidated and its behavior is undefined. – Carlos Garcia Mar 15 '20 at 20:55
  • 1
    If another thread, holding a reference to the same list (which it had retrieved earlier) adds/removes during this operation the linq Where() statement would fail with the well-known "Collection was modifed" error. I think there is no other way than to have a fully synchronized List here. – Oguz Ozgul Mar 15 '20 at 20:56
  • 2
    @CarlosGarcia The comment mentions turning the list into a ConcurrentBag, which if they actually did that, would fix that one problem. But that just means the code won't throw an exception, it still doesn't actually produce the correct results, as per my earlier comment. – Servy Mar 15 '20 at 20:59
  • @TheodorZoulias I added the type. However, note that this is not the type from the standard libraries, but a replacement (the post mentions two). – mafu Mar 16 '20 at 03:06
  • @Servy Thanks! My mistake – mafu Mar 16 '20 at 03:07
  • @Servy Re *it still doesn't actually produce the correct results*, if the list is (as required) fully synchronized, it should work, no? – mafu Mar 16 '20 at 03:09
  • I think that you should mention clearly inside your answer that the `ConcurrentDictionary` type in the code example is not the [`System.Collections.Concurrent.ConcurrentDictionary`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2) but some third-party class with the same name, providing the link to the corresponding nuget package if possible. As it stands now it's very confusing. – Theodor Zoulias Mar 16 '20 at 03:20
  • @TheodorZoulias Good point, I added an explicit note and also restructured the answer a bit. – mafu Mar 16 '20 at 03:28
  • Now it's better, but still not totally clear. What is missing is the signature of the methods `TryGetValue`, `TryUpdate` and `TryRemove`. We are talking about a third party class without a nuget package and without documentation. We shouldn't have to imagine the method signatures, we should be able to see them clearly. TBH I think that your answer was more useful before adding this code example. It is not clear what it's doing (what the hell is the purpose of this `while (true)`?), and most probably has thread safety issues. – Theodor Zoulias Mar 16 '20 at 03:51
  • @TheodorZoulias Thanks for all the feedback. It is very helpful. I've further edited the answer. I also agree the while loop in particular looks strange at first, but it seems to be quite common to achieve good concurrency. – mafu Mar 16 '20 at 04:30
  • Aha, now it makes sense. So your code example is the body of a method with a signature like this: `void RemoveMyself(MBur.Collections.ConcurrentDictionary> d, TKey key)`. I guess that the identifier `Associated` is a typo and should be replaced with `key`. Your idea seems to be to keep the lists practically immutable. Instead of mutating a list, you replace it with a new list and discard the old one. I don't like this idea because it should be terribly inefficient. Also will certainly add huge pressure to the garbage collector. – Theodor Zoulias Mar 16 '20 at 05:07
  • @TheodorZoulias Yes typo, a lot went wrong when I simplified my code for this answer. About replacing the list as a whole, as far as I know there simply is no better way without recreating the whole lock-free data structure. There may be a better structure than a `List` (I've seen `ImmutableArray` commonly). If there are *many* values per key, this kind of data structure may be a wrong choice in the first place. So I was assuming the value count to be low-ish (< 50 or so are very fast, especially with `ImmutableArray` to reduce GC pressure). – mafu Mar 16 '20 at 13:01
  • 1
    These points are more about (advanced) lock free data structures than extending or picking a replacement for `ConcurrentDictionary` tho, and I'm not very knowledgeable about them. I added that code because without it, my post seemed like it made an unsubstantiated claim about how this small change in the dictionary would magically allow another different feature to work. – mafu Mar 16 '20 at 13:03
  • You could also invoke the private `TryRemoveInternal` method by using reflection. See [this fiddle](https://dotnetfiddle.net/bbGm3Z) for an example. – Theodor Zoulias Mar 17 '20 at 09:32