
I have some code to which I added a nested dictionary of the following format:

Dictionary<string, Dictionary<string, Dictionary<string, float>>>

After doing so, I noticed the memory usage of my application shot up SIGNIFICANTLY. These dictionaries are keyed on strings that are often repeated, and there are many of these dictionaries, on the order of tens of thousands.

To address this problem, I hypothesized that the repeated strings were eating up a significant amount of memory. My solution was to hash the strings and use an integer instead (I would keep one copy of a rainbow table so I could reverse the hash when necessary):

Dictionary<int, Dictionary<int, Dictionary<int, float>>>
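Roughly what I did (a sketch with illustrative names, not my exact code):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the key-hashing approach: string keys are replaced by their
// hash codes, and a reverse-lookup table lets me map a hash back to the
// original string when necessary.
// Note: GetHashCode is not collision-free, so two different strings could
// in principle map to the same int key.
var reverseLookup = new Dictionary<int, string>();

int HashKey(string key)
{
    int h = key.GetHashCode();
    reverseLookup[h] = key;   // remember how to reverse the hash later
    return h;
}

var data = new Dictionary<int, Dictionary<int, Dictionary<int, float>>>();
data[HashKey("outer")] = new Dictionary<int, Dictionary<int, float>>
{
    [HashKey("middle")] = new Dictionary<int, float>
    {
        [HashKey("inner")] = 1.0f
    }
};

Console.WriteLine(reverseLookup[HashKey("outer")]); // prints "outer"
```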

So I went to a memory profiler to see what kind of size reduction I could get. To my shock, I found that the string version was actually smaller in size (both normal and inclusive).

This doesn't make intuitive sense to me. Even if the compiler were smart enough to store only one copy of the string and use a reference, I would think that reference would be a pointer, which is double the size of an int. I also didn't use String.Intern anywhere, so I don't know how this could have been accomplished (also, is String.Intern even the right method here?).
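For reference, this is the String.Intern behavior I was wondering about (not something I actually used in my code):

```csharp
using System;

// String.Intern returns the single pooled copy of a string's contents.
// Literals are interned automatically by the compiler/runtime; strings
// built at runtime are separate objects until you intern them.
string a = new string(new[] { 'k', 'e', 'y' }); // distinct runtime instance
string b = "key";                               // literal, interned automatically

Console.WriteLine(ReferenceEquals(a, b));       // False: two separate objects

string c = string.Intern(a);                    // returns the pooled instance
Console.WriteLine(ReferenceEquals(b, c));       // True: same pooled object
```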

I'm very confused as to what's happening under the hood; any help would be appreciated.

sedavidw
    Nothing special is being done under the hood. Either your interning strategy did not remove a lot of redundancy or there's a bug. You need to post code. – usr Jun 01 '16 at 20:54
  • Maybe you can use a flat `Dictionary` keyed on a struct tuple of the three strings to get rid of the nesting. `Dictionary` is not a small type. Use a struct tuple. – usr Jun 01 '16 at 20:54
    "shot up SIGNIFICANTLY." - compared to what? (Note that http://stackoverflow.com/questions/20430157/memory-usage-of-dictionary-in-c-sharp may be used as duplicate target for your question unless you clarify how it is not) – Alexei Levenkov Jun 01 '16 at 21:02
    Using dictionaries of dictionaries is usually just a sign you should create a new type. How many dictionaries are there in total? – Anders Forsgren Jun 01 '16 at 21:03
  • @AlexeiLevenkov compared to when the dictionary didn't exist on the objects at all. I have a class (which I have many instances of) that I added this dictionary to. Adding this new object added GB of memory to my application – sedavidw Jun 01 '16 at 21:03
  • A correct answer could well depend on information you have not given, for example, how are you measuring memory consumption, what other things in your code could affect memory consumption unrelated to these dictionaries, how are you creating all these dictionaries (default size), how many items are in each of these dictionaries, was your int version *really* the same, etc. – hatchet - done with SOverflow Jun 01 '16 at 21:13
  • So how was the information you expect to store in the dictionary stored before? Simply adding a new object to a class instantiated enough times would add extra memory... but that does not look like a real comparison to justify the claim "shot up" to me. – Alexei Levenkov Jun 01 '16 at 21:20
  • Why are you surprised that converting from string to int made a smaller dictionary? If this is running in the 32-bit runtime, that's going to halve the storage used for the dictionaries' keys. And I don't know what your "rainbow table" is, but if it's some kind of lookup that converts your hash back to a string, then you have by definition implemented a kind of string interning to eliminate duplicate strings. I'm not at all surprised that what you did reduced memory consumption. – Jim Mischel Jun 01 '16 at 21:25
  • Is it also possible that the dictionaries are holding references to objects that were previously being garbage collected? Do you really have tens of thousands of _these dictionaries_ or tens of thousands of items _within the dictionary_? – D Stanley Jun 01 '16 at 21:26
  • @hatchet I used the profiler on VS2015 to get a snapshot of the heap to determine memory by types. These are the only objects of these types (either string or int keyed) so I know that's where the data is coming from. Both dictionaries are the same in the number of keys at each level but the string version has representative keys and the int version has the GetHashCode of those strings (the floats are the same) – sedavidw Jun 01 '16 at 21:26
  • @DStanley I actually have tens of thousands of these dictionaries. They exist on an object that I have approximately 50K instances of; each dict has 8 keys at the first level, 2 at the second, and 10 at the third – sedavidw Jun 01 '16 at 21:28
  • @JimMischel "Why are you surprised that converting from string to int made a smaller dictionary?" -- The results were described as *opposite* to that -- "To my shock I actually found that the **string storage was actually smaller** in size". – Snixtor Jun 01 '16 at 22:07
  • @Snixtor Thanks, I misinterpreted his statement. The overhead is likely due to the table he uses to convert hash values back to strings. There are many things about the implementations that the OP isn't telling us. It's impossible to say where the memory is going without a full picture. – Jim Mischel Jun 01 '16 at 23:12

1 Answer


If your keys and values are objects, there's approximately 20 bytes of overhead for each element of a dictionary, plus several more bytes per dictionary instance. This is in addition to the space consumed by the keys and values themselves. If you have value types as keys and values, then it's 12 bytes plus the space consumed by the key and value for each item in the dictionary. This assumes the number of elements equals the internal dictionary capacity; typically there is more capacity than elements, so there is wasted space.
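You can get a rough feel for the per-entry overhead yourself. A sketch (the measured figure will vary with runtime version, bitness, and GC state, so treat it as an estimate only):

```csharp
using System;
using System.Collections.Generic;

// Rough measurement of per-entry memory cost for Dictionary<string, float>.
const int N = 1_000_000;
var keys = new string[N];
for (int i = 0; i < N; i++) keys[i] = i.ToString(); // allocate keys up front

long before = GC.GetTotalMemory(forceFullCollection: true);
var dict = new Dictionary<string, float>(N);        // pre-size to avoid resize noise
for (int i = 0; i < N; i++) dict[keys[i]] = i;
long after = GC.GetTotalMemory(forceFullCollection: true);

// This includes the dictionary's internal arrays but not the key strings,
// since those were allocated before the first measurement.
Console.WriteLine($"~{(after - before) / (double)N:F1} bytes per entry");
```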

The wasted space will generally be a higher relative percentage if you have lots of dictionaries with a small number of elements than if you have one dictionary with many elements. Going by your comment, your dictionaries with 8 elements will have a capacity of 11, those with 2 elements a capacity of 3, and those with 10 a capacity of 11.
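Those capacities come from the dictionary rounding its requested size up to a prime. On runtimes that expose `EnsureCapacity` (.NET Core 2.1+), calling it with 0 reports the current internal capacity, so you can observe this directly; a sketch, with the exact numbers depending on the implementation's prime table:

```csharp
using System;
using System.Collections.Generic;

// The classic Dictionary implementation sizes its internal arrays to the
// smallest suitable prime >= the requested capacity (3, 7, 11, 17, ...).
// EnsureCapacity(0) is a no-op that returns the current capacity.
var d8  = new Dictionary<string, float>(8);
var d2  = new Dictionary<string, float>(2);
var d10 = new Dictionary<string, float>(10);

Console.WriteLine(d8.EnsureCapacity(0));   // e.g. 11 (smallest prime >= 8)
Console.WriteLine(d2.EnsureCapacity(0));   // e.g. 3
Console.WriteLine(d10.EnsureCapacity(0));  // e.g. 11
```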

If I understand your nesting counts, a single top-level dictionary tree represents 184 dictionary elements. But if we count unused capacity, it's closer to 200 slots as far as space consumption goes. 200 * 20 = 4,000 bytes of overhead for each top-level dictionary. How many of those do you have? You say tens of thousands of them, spread across about 50K objects. Every 10,000 of them will consume about 38 MB of dictionary overhead, before counting the objects stored in the dictionaries.
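The arithmetic above, spelled out using the counts from your comments (8 keys at the first level, 2 at the second, 10 at the third) and the ~20-byte-per-slot estimate:

```csharp
using System;

// One top-level tree: 1 dictionary of 8, 8 dictionaries of 2, 16 of 10.
int elements = 8 + 8 * 2 + 8 * 2 * 10;      // 184 entries in total
// Capacities round up to primes: 11, 3, and 11 respectively.
int capacity = 11 + 8 * 3 + 8 * 2 * 11;     // ~211 slots including unused ones
long perTree = capacity * 20;               // ~4,200 bytes of overhead per tree
long per10k  = perTree * 10_000;            // ~42 million bytes per 10,000 trees

Console.WriteLine($"{elements} entries, ~{perTree} bytes per tree, " +
                  $"~{per10k / (1024 * 1024)} MB per 10,000 trees");
```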

A possible explanation of why your attempt to make it smaller failed: replacing an object-reference key with an int key doesn't change the per-element dictionary overhead at all, and you're adding the storage of your new reverse-lookup collection of hash codes. So unless there were a lot of duplicated string instances among your keys to begin with, there was little to save, and the extra table tips the balance the other way.