4

I have a static ConcurrentDictionary in a static class. In the static constructor of the class, I call a private method via Task.Run to indefinitely loop through the dictionary and remove items that have expired, i.e. a significant amount of time has passed since they were added and as such need to be removed (this is determined by by inspecting a property on each CustomItem value of each item in the dictionary):

public static class ItemsHolder
{
    public static ConcurrentDictionary<Guid, CustomItem> Items = new ConcurrentDictionary<Guid, CustomItem>();

   // ...

    static ItemsHolder
    {
        Task.Run(() => DeleteExpired());
    }

    private static async Task DeleteExpired()
    {
        while (true)
        {
            // check if dictionary is empty and if yes, 
            // await Task.Delay a bit, etc - omitted for brevity

            foreach (var item in Items)
            {
                var guid = item.Key;
                var customItem = item.Value;

                if (customItem.ConditionToRemoveItemFromDictionaryHere)
                {

                    var removeItem = Items.TryRemove(guid, out _);

                    // ...
                }
            }
        }
    }
}

In other parts of the code (outside of this class), items are being added to this dictionary and removed based on other conditions. Effectively DeleteExpired() is there to clean out items that could not be removed from the dictionary (in other parts of the code, for XYZ reasons)

I see that DeleteExpired() keeps popping up in my metrics in tools such as dotTrace in R# as occupying CPU time. I know that iterating through the concurrent dictionary while other threads might be adding/removing is lock-free and should be safe to do, but I'm not sure whether this is the most effective way of doing so. Is there a more effective/efficient way of doing so?

globetrotter
  • 997
  • 1
  • 16
  • 42
  • You can use a SortedDictionary to order items by expiry date and pull from the head of that list. It's not concurrent but you can use N such lists and use lock sharding which should help a lot. – usr Jun 21 '18 at 09:51
  • hi @usr, can you explain why use `N` such lists and what you mean by lock sharding? Is that manually locking the collection while performing operations on it? – globetrotter Jun 21 '18 at 10:02
  • @globetrotter - You can't remove items from a collection that you're iterating over. It doesn't matter if they are a `Concurrent*` or standard collection. This is true with a single-thread. It is worse with multiple threads. – Enigmativity Jun 21 '18 at 10:29
  • If there is just one list all threads contend on that lock. Maybe you are using a ConcurrentDictionary because there is a lot of action from many threads. But actually you could maybe make this far more efficient by using a simple Dictionary and lock it. If the write frequency is not too high this would make iteration a lot faster. Depends on your write frequency. But lock sharding is a solution to write contention. Btw, you could use lock sharding and N Dictionary instances as well. That way delete iteration would be fast as well as writes. – usr Jun 21 '18 at 10:38
  • `// check if dictionary is empty and if yes, await Task.Delay a bit, etc - omitted for brevity`. Are you implying you wait *only* if the dictionary is empty? If so, then you have your performance hog. – Alberto Chiesa Jun 21 '18 at 11:55
  • @Enigmativity pretty sure removal while enumerating on the concurrent collections [is possible](https://stackoverflow.com/a/2729316/1199044). – globetrotter Jun 21 '18 at 12:25
  • @A.Chiesa yes, I only asynchronously wait if the dictionary is empty so i don't attempt to enumerate with `foreach` if not. How would you suggest doing it? Sleeping after each removal iteration? What benefit would that give? – globetrotter Jun 21 '18 at 12:28
  • @globetrotter - Sorry, my mistake. I just built a test and you're right. Well that's going to save me some tears in the future. Cheers. – Enigmativity Jun 21 '18 at 14:20

2 Answers2

2

You're not waiting between each iteration, if the dictionary is not empty.

The while (true) makes a thread spin furiously and without rest (quite literally). You must absolutely wait between each check.

Without delay or sleep, one thread is going to continuously consume a whole core of your CPU (or at least it's going to try to do it), and this is the reason for the anomaly in performance.

I would write it as:

while(true) {
  Items.Where(pair => pair.Value.ConditionToRemoveItemFromDictionaryHere)
    .Select(pair => pair.Key)
    .ToList()
    .ForEach(guid => Items.TryRemove(guid, out _));

  // thread.sleep, task delay or whatever suits your needs
}
Alberto Chiesa
  • 7,022
  • 2
  • 26
  • 53
  • There's a code comment there where he said he left out waiting. And this is material to the question, because he's having a perf issue despite properly waiting. – usr Jun 21 '18 at 15:26
  • @usr there is a comment from the OP where i states "yes, I only asynchronously wait if the dictionary is empty so i don't attempt to enumerate with foreach if not". My native language is not english, but I think this means that he doesn't wait if the collection is not empty. – Alberto Chiesa Jun 21 '18 at 16:07
  • Oh, I see. Too bad that the true code was omitted. It would have been clearer that this is, likely, the issue. – usr Jun 22 '18 at 10:16
-1

I would have suggested to use a background task scheduler such as Quartz, as schedulers are especially made for that use case.

Create the Task for cleaning the dictionary:

public class CleanExpiredJob : IJob
{
    // Your ConcurrentDictionary value will be injected by the scheduler
    public ConcurrentDictionary<TKey, TValue> Items {private get; set;}
    public Task Execute(IJobExecutionContext context)
    {
        return Task.Run(() => {
            foreach (var item in Items)
            {
                var guid = item.Key;
                var customItem = item.Value;

                if (customItem.ConditionToRemoveItemFromDictionaryHere)
                {

                    var removeItem = Items.TryRemove(guid, out _);

                    // ...
                }
            }
        });
    }
}

Then, add the scheduler to your Application start:

    // Grab the Scheduler instance from the Factory
    NameValueCollection props = new NameValueCollection
    {
        { "quartz.serializer.type", "binary" }
    };
    var factory = new StdSchedulerFactory(props);
    var scheduler = await factory.GetScheduler();

    await scheduler.Start();
    var job = JobBuilder.Create<CleanExpiredJob>().Build();

    // Add your dictionary here
    // Notice: Keep the same name as the property in the job for injection
    job.JobDataMap["Items"] = Items;

    // Set the job to run every 10 seconds
    var trigger = TriggerBuilder.Create()
                    .StartNow()
                    .WithSimpleSchedule(x => x
                        .WithIntervalInSeconds(1)
                        .RepeatForever())
                    .Build();

    scheduler.ScheduleJob(job, trigger);
Omri Levi
  • 177
  • 5