
I use a ConcurrentDictionary<string, string> to store a large amount of data (4,500,000 entries), and I don't want to use extra memory, so I set the capacity at the beginning. But the dictionary grows automatically before reaching the specified capacity.

I wrote a small piece of code that reproduces the issue with only 500 items. I use reflection on the private buckets array because I couldn't find a public property exposing the actual capacity:

using System;
using System.Collections.Concurrent;
using System.Reflection;

namespace MemoryUsage
{
    class Program
    {
        static void Main(string[] args)
        {
            CapacityTest();
        }
        private static void CapacityTest()
        {
            int capacity = 500;
            ConcurrentDictionary<String, String> dict = new ConcurrentDictionary<string, string>(Environment.ProcessorCount, capacity);
            Console.WriteLine("{0} buckets at the beginning", GetBucketCount(dict));
            for (int index = 0; index < capacity; index++)
                dict.AddOrUpdate(Guid.NewGuid().ToString(), Guid.NewGuid().ToString(), (key, value) => value);
            Console.WriteLine("{0} buckets at the end", GetBucketCount(dict));
            Console.ReadLine();
        }
        private static int GetBucketCount(ConcurrentDictionary<string, string> dict)
        {
            object tables = dict.GetType().GetField("m_tables", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(dict); // "_tables" with .NET Core, "m_tables" with .NET Framework
            object buckets = tables.GetType().GetField("m_buckets", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(tables); // "_buckets" with .NET Core, "m_buckets" with .NET Framework
            return ((Array)buckets).Length;
        }
    }
}

Displays:

500 buckets at the beginning
1003 buckets at the end

I expected 500 buckets at the end. Do you know a way to avoid allocating extra memory, given that I know the number of items up front?

cocosnake
  • Possible duplicate? https://stackoverflow.com/questions/49303884/concurrentdictionary-with-limit – Robert Perry Jun 03 '19 at 13:34
  • I am not aware of this. I think you can use caching classes. https://learn.microsoft.com/en-us/dotnet/framework/performance/caching-in-net-framework-applications – SUNIL DHAPPADHULE Jun 03 '19 at 13:41
  • This seems to have to do with the distribution of the values. I tried your code with a set of integers (0-7) with processor count = 8. Then there are just 8 buckets. – Patrick Hofman Jun 03 '19 at 13:41
  • 2
    `ConcurrentDictionary` has to allow for some slack to allow lock-free operation with multiple threads. If you use `1` or `2` instead of `Environment.ProcessorCount`, you'll get the minimum buckets (on my system, at least). Obviously, this is not exactly something you can count on, as it also depends on your value distribution. – Jeroen Mostert Jun 03 '19 at 13:43
  • You're right @JeroenMostert, with 1 instead of `Environment.ProcessorCount`, the capacity is respected. But I guess it will degrade performance in a multithreaded environment; I have to check. – cocosnake Jun 03 '19 at 13:59
  • If you know the initial capacity because you're going to initialize it from one single source, you can also try using the constructor overload that takes an `IEnumerable<KeyValuePair<TKey, TValue>>` (you may need to write an iterator method for this). In my tests, this produces a dictionary with an initial bucket size larger than the actual capacity, but not twice as large. – Jeroen Mostert Jun 03 '19 at 14:03
  • Finally, as a quick fix I simply lowered the concurrencyLevel to 1 in production => no more OutOfMemoryException for a while. I didn't measure it, but it's probably slower. Now I have to choose between a Dictionary wrapper with a read/write lock mechanism or migrating the heavy operations to the database (probably the best choice). – cocosnake Jun 05 '19 at 11:57
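
The two workarounds discussed in these comments can be sketched as follows (an untested sketch; whether the bucket count stays at the requested capacity depends on the runtime version and the key distribution):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class Workarounds
{
    // Workaround 2 helper: feed the known items through the
    // IEnumerable<KeyValuePair<TKey, TValue>> constructor overload.
    static IEnumerable<KeyValuePair<string, string>> Items(int count)
    {
        for (int i = 0; i < count; i++)
            yield return new KeyValuePair<string, string>(
                Guid.NewGuid().ToString(), Guid.NewGuid().ToString());
    }

    static void Main()
    {
        int capacity = 500;

        // Workaround 1: concurrencyLevel = 1 keeps the bucket array at the
        // requested capacity, at the cost of write contention.
        var byCapacity = new ConcurrentDictionary<string, string>(1, capacity);

        // Workaround 2: build from a single known source; in the tests above
        // this gave a bucket array larger than the item count, but not
        // twice as large.
        var bySource = new ConcurrentDictionary<string, string>(Items(capacity));

        Console.WriteLine("{0} / {1} items", byCapacity.Count, bySource.Count);
    }
}
```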

2 Answers


It's an *initial* capacity, not a hard limit, so you can't cap it.

Don't reinvent the wheel, just use MemoryCache. It automatically evicts items when memory runs low. If you really want to control memory, look at MemoryCache.CacheMemoryLimit, for instance.
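
A minimal sketch of that approach with `System.Runtime.Caching.MemoryCache` (note that `CacheMemoryLimit` is read-only at runtime; the limit is passed in via configuration at construction, and the 200 MB value here is purely illustrative):

```csharp
using System;
using System.Collections.Specialized;
using System.Runtime.Caching; // reference System.Runtime.Caching.dll

class CacheSketch
{
    static void Main()
    {
        // CacheMemoryLimit itself cannot be assigned; the limit is set
        // through configuration values at construction time.
        var config = new NameValueCollection
        {
            { "cacheMemoryLimitMegabytes", "200" }
        };
        var cache = new MemoryCache("data", config);

        cache.Set("some-key", "some-value", new CacheItemPolicy());
        Console.WriteLine(cache.Get("some-key")); // "some-value"
        Console.WriteLine(cache.CacheMemoryLimit); // limit in bytes
    }
}
```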

mtkachenko
  • 1
    There may be some use cases where replacing a `ConcurrentDictionary` with a `MemoryCache` is appropriate, but certainly not all. In particular, we may actually need all of those keys all of the time! Also, `MemoryCache` is strictly worse in terms of memory use and concurrent performance (assuming the same capacity), as internally it wraps around a good old fashioned non-generic `Hashtable` with a good old fashioned lock. – Jeroen Mostert Jun 03 '19 at 14:14
  • @JeroenMostert I added wrong links to my answer. I mean this one https://www.nuget.org/packages/Microsoft.Extensions.Caching.Memory MemoryCache from this package uses `ConcurrentDictionary` internally - https://github.com/aspnet/caching/blob/master/src/Microsoft.Extensions.Caching.Memory/MemoryCache.cs – mtkachenko Jun 03 '19 at 14:27
  • Sure, but then that doesn't solve the original problem at all, as while you can specify a size limit for that, it runs into the same capacity problem (obviously, as it uses `ConcurrentDictionary` under the covers). That is, even if you know in advance you have X items, you can't make `MemoryCache` consume memory only for X items. This is true even if you use the size limit for this, as that only limits item count, not capacity. – Jeroen Mostert Jun 03 '19 at 14:33

This seems to be related to the distribution of the values (either an optimization of the locking mechanism or a characteristic of the underlying table structure). I tried your code with a set of integers (0-7) with processor count = 8. Then there are just 8 buckets.

dict.AddOrUpdate(index, index, (key, value) => value);

But when multiplying the key by 2, there are 17 buckets after the 5th of the 8 insertions.
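
The experiment described above can be sketched like this (assuming a concurrency level of 8 and an initial capacity of 8; the exact bucket counts depend on the runtime version):

```csharp
using System;
using System.Collections.Concurrent;

class DistributionTest
{
    static void Main()
    {
        // Keys 0-7 hash into distinct consecutive buckets, so the
        // table stays at its initial size (8 buckets in this test).
        var dense = new ConcurrentDictionary<int, int>(8, 8);
        for (int index = 0; index < 8; index++)
            dense.AddOrUpdate(index, index, (key, value) => value);

        // Doubling the keys (0, 2, 4, ...) makes pairs of them land in
        // the same bucket, which triggered a grow to 17 buckets midway
        // through the insertions in the test above.
        var sparse = new ConcurrentDictionary<int, int>(8, 8);
        for (int index = 0; index < 8; index++)
            sparse.AddOrUpdate(index * 2, index, (key, value) => value);

        Console.WriteLine("{0} / {1} items", dense.Count, sparse.Count);
    }
}
```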

Patrick Hofman