1

I have a ConcurrentDictionary that I want to empty, copying its content into a dictionary (or any other container) while emptying it at the same time.

A key invariant is that no element from the source is lost during the operation.

My current code looks like this:

    private Dictionary<TKey, TValue> Foo<TKey, TValue>(ConcurrentDictionary<TKey, TValue> source)
    {
        var result = new Dictionary<TKey, TValue>();
        foreach (var item in source)
        {
            result[item.Key] = item.Value;
        }
        source.Clear();
        return result;
    }

From my understanding, this code is thread-safe but any element added concurrently after the foreach loop and before the Clear() will be cleared.

edit: some more precisions. In my use case, that code is the only one that does remove keys some the dictionary; other threads only ever TryAdd or AddOrUpdate.

mookid
  • 1,132
  • 11
  • 19
  • 2
    Why do you want this? – mjwills Feb 09 '21 at 11:14
  • you could use .ToDictionary() method to copy the items but you will still need to prevent your source dictionary from writing i.e. add some flag in your "writer" method that would prevent it from entering while you are copying. you can i.e. declare a "volatile" bool field to be used as flag – vhr Feb 09 '21 at 11:15
  • @mjwills I want to collect some values and log them in batch. The invariant corresponds to the requirement that all collected values are logged at some point. – mookid Feb 09 '21 at 11:29
  • Is this the _only_ thing removing entries from the dictionary? – mjwills Feb 09 '21 at 12:03
  • I don't think that concurrentQueue would work, as I need to change the value associated to a given key during the lifetime of the container. Same for ImmutableDictionary. But it is indeed the only thing removing entries from the dictionary. – mookid Feb 09 '21 at 12:04
  • If you want all the contents from a `ConcurrentDictionary`, `ToArray` is the simplest solution (and then `ToDictionary` on that). That doesn't solve the clearing though. So you will need to `ToArray` then `TryRemove` - https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.tryremove?view=net-5.0#System_Collections_Concurrent_ConcurrentDictionary_2_TryRemove_System_Collections_Generic_KeyValuePair__0__1__ . – mjwills Feb 09 '21 at 12:15
  • 1
    @mjwills an `ImmutableDictionary` will probably be ~10 times slower than a `ConcurrentDictionary`, and also significantly more allocatey. It is also not concurrent. It needs synchronization every time it is updated. It could be an interesting solution in case the OP wants to take its values and clear it very often, because taking snapshots of immutable collections is essentially free. – Theodor Zoulias Feb 09 '21 at 12:18
  • @mjwills my understanding is that with ImmutableDictionary you lose the benefits of good concurrent read provided by ConcurrentDictionary. in my use case, the snapshot operation is relatively rare. – mookid Feb 09 '21 at 12:22
  • 1
    Then `ToArray` and https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.tryremove?view=net-5.0#System_Collections_Concurrent_ConcurrentDictionary_2_TryRemove_System_Collections_Generic_KeyValuePair__0__1__ will do you nicely. This allows you to clone the dictionary, then remove only entries from the original that match your clone. – mjwills Feb 09 '21 at 12:23
  • Basically you want a copy of a snapshot and then empty the original. Have you considered setting it up in a way where you can just swap the current filled `ConcurrentDictionary` with a new one? (so basically the original is your copy and the empty collection is the new one) –  Feb 09 '21 at 12:24
  • @mjwills instead of a standard `lock`, the [`ImmutableInterlocked.AddOrUpdate`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.immutable.immutableinterlocked.addorupdate) method could be used. But yeah, the situations where an `ImmutableDictionary` is the optimal solution does not come up very often. – Theodor Zoulias Feb 09 '21 at 12:24
  • @Knoop If you do this, you will need to have some kind of lag built in - since some callers will be _inside_ the old one for a little while. – mjwills Feb 09 '21 at 12:24
  • Thanks @TheodorZoulias, learned a bunch of useful stuff! – mookid Feb 09 '21 at 12:28
  • Thanks @mjwills learned a bunch of useful stuff! – mookid Feb 09 '21 at 12:28
  • Let me know if you'd like me to write up the `ToArray` and https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.tryremove?view=net-5.0#System_Collections_Concurrent_ConcurrentDictionary_2_TryRemove_System_Collections_Generic_KeyValuePair__0__1__ approach - but honestly it is only slightly different to the below answer. – mjwills Feb 09 '21 at 12:29
  • 1
    `The invariant corresponds to the requirement that all collected values are logged at some point.` Note that you aren't _strictly_ meeting this criteria - since if two writes to the same key occur in close succession you aren't necessarily logging the first one. Perhaps consider _two_ data structures - concurrent dictionary to store the data, concurrent queue to log it? – mjwills Feb 09 '21 at 12:30
  • @mjwills Yes, definately not trying to say this is an out of the box perfect solution for all situations. You need to set it up in a way that supports it. It all depends on how exactly everything is used. Just throwing it in there as a possible consideration. –  Feb 09 '21 at 12:31
  • @Knoop For sure - I've used that approach before. It can be quite useful. – mjwills Feb 09 '21 at 12:32

1 Answers1

2

The only way I can think of is to TryRemove all existing keys, one by one:

public static Dictionary<TKey, TValue> RemoveAll<TKey, TValue>(
    this ConcurrentDictionary<TKey, TValue> source)
{
    var result = new Dictionary<TKey, TValue>();
    foreach (var key in source.Keys)
    {
        if (source.TryRemove(key, out var value)) result.Add(key, value);
    }
    return result;
}

The Keys property returns a snapshot of the keys in the ConcurrentDictionary, so it is impossible to contain twice the same key. But it generates contention, because it acquires all internal locks. Below is a version that uses the lazy enumerator of the ConcurrentDictionary, and so it shouldn't generate much contention:

public static Dictionary<TKey, TValue> RemoveAll<TKey, TValue>(
    this ConcurrentDictionary<TKey, TValue> source)
{
    var result = new Dictionary<TKey, TValue>();
    foreach (var entry in source)
    {
        var (key, value) = entry;
        if (!result.ContainsKey(key) && source.TryRemove(key, out value))
            result.Add(key, value);
    }
    return result;
}

The reason for the !result.ContainsKey(key) is the (theoretical) possibility of the enumerator yielding twice the same key. Based on the current implementation of the ConcurrentDictionary, this cannot happen.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Clarification: the above solution is not a snapshot operation. It will remove and collect any key-value pairs it finds "on the go". – Theodor Zoulias Feb 09 '21 at 12:10
  • `ToArray` would allow you to log the snapshotted values if you wanted to do that. You could even use https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.concurrentdictionary-2.tryremove?view=net-5.0#System_Collections_Concurrent_ConcurrentDictionary_2_TryRemove_System_Collections_Generic_KeyValuePair__0__1__ to ensure the entry removed from the `ConcurrentDictionary` had the same value (hadn't been mutated since). – mjwills Feb 09 '21 at 12:14
  • @mjwills the `ToArray` acquires all the internal locks, so it will cause all threads that interact with the `ConcurrentDictionary` to be blocked while the snapshot is generated. This is also a problem for the first version of the `RemoveAll` in my answer. – Theodor Zoulias Feb 09 '21 at 12:30
  • True. The benefit of `ToArray` is that your logging can reflect the keys **and values** at a specific point in time (snapshotted). – mjwills Feb 09 '21 at 12:36
  • @mjwills what are you going to do with the key-value pairs in the snapshot that no longer exist when you try to remove them? If you don't include them in the output, then we cannot talk about a snapshot any more. A subset of a snapshot is not a snapshot. – Theodor Zoulias Feb 09 '21 at 12:41
  • Include them in the output. But don't remove them from the source (unlike your first `RemoveAll`). – mjwills Feb 09 '21 at 12:42
  • @mjwills yeap, this can be a valid solution, depending on the desirable semantics. It still leaves open the possibility of removing a key-value pair that was removed and added again after the snapshot was taken. So it will not behave the same with an atomic `ToArray`+`Clear` API, if this API existed. – Theodor Zoulias Feb 09 '21 at 12:53
  • That is true, yes. – mjwills Feb 09 '21 at 12:56