What is the most memory efficient method of storing a large number of Strings in a map?

Question

I want to store huge amounts of Strings in a Map<String, MagicObject>, so that the MagicObjects can be accessed quickly. There are so many entries to this Map that memory is becoming a bottleneck. Assuming the MagicObjects can't be optimized, what is the most efficient type of map I could use for this situation? I am currently using the following:

gnu.trove.map.hash.TCustomHashMap<byte[], MagicObject>

I would be amazed if another map would suddenly use way less memory, but I'm not that familiar with optimizing apps for memory usage. — Wesley De Keirsmaeker, Jun 15 '16 at 14:12
You don't change the JVM memory model by switching data structures. — duffymo, Jun 15 '16 at 14:17
@duffymo actually you can save memory depending on the type used: http://java-performance.info/memory-consumption-of-java-data-types-2/ (table at the end) — dognose, Jun 15 '16 at 14:26
Thank you, @dognose. My statement is still correct: You didn't change the JVM memory model. "Huge amounts of Strings" will still take up what they will. But I was unaware of this library. Is Trove open source or a purchased library? Do you have an association with it? (Full disclosure and transparency.) — duffymo, Jun 15 '16 at 14:29
Yes ofc. you can not optimize the Usage of a String - But the usage of the wrapping Datatype can be optimized (there are also differences between for example ArrayList and LinkedList - obviously). Dunno about Trove, Never used it. — dognose, Jun 15 '16 at 14:33
You don't even tell us what Map implementation you are using. HashMap is very efficient, and using String objects as the key of a HashMap is such a common use case that both HashMap and String.hashCode will have been implemented to give good performance together. So I doubt your assertion that your Map has poor performance. You have probably misinterpreted something. — Raedwald, Jun 15 '16 at 15:48
What about using a Database? I know that is not the thing you are asking for, but this kind of looks like you try to solve a symptom of a problem and not the root of the problem. — Ortwin Angermeier, Apr 04 '18 at 21:23
Database access is simply too slow. Imagine a near realtime application that is queried Hundreds of times per second, must always reply in less than 10ms and for each request has to look up between 50 and 10000 map entries. — Andreas Hartmann, Apr 05 '18 at 05:10

Leon · Answer 1 · 2018-04-05T09:19:32.543

4

If your keys are long enough and have a lot of long enough common prefixes then you can save memory by using a trie (prefix tree) data structure. Answers to this question point to a a couple of Java implementations of trie.

edited Apr 05 '18 at 09:19

answered Jun 15 '16 at 15:19

Leon

31,443
4
72
97

score 1 · Answer 2 · answered Jun 15 '16 at 15:34

1

To open mind, consider Huffman coding to compress your strings first before put in map, as long as your strings are fixed(number and content of string don't change).

answered Jun 15 '16 at 15:34

lulyon

6,707
7
32
49

score -1 · Answer 3 · answered Apr 04 '18 at 20:54

I'm a little late to this party but this question came up in a related search and piqued my interest. I don't usually answer Java questions.

There are so many entries to this Map that memory is becoming a bottleneck.

I doubt it.

For the storage of strings in memory to become a bottleneck you need an awfully large number of unique strings[1]. To put things into perspective, I recently worked with a 1.8m word dictionary (1.8m unique english words) and they took up around 1.6MB in RAM at runtime.

If you used every word in the dictionary as a key you'll still only use 1.6MB of RAM[2] to store the keys, hence memory cannot be your bottleneck.

What I suspect you are experiencing is the O(n^2) performance of string matching. By this I mean that as more keys are added performance slows down exponentially[3]. This is unavoidable if you are using strings are keys.

If you want to speed things up a bit, store each key into a hashtable that doesn't store duplicates and use the hash key as the key to your map.

NOTES:

[1] I'm assuming the strings are all unique or else you would not attempt to use them as a key into a map.

[2] Even if Java uses 2 bytes per character, it still only comes to 3.2MB of memory, total.

[3] It slows down even more if you choose the wrong data structure, such as an unbalanced binary tree, to store your values. I don't know how map stores values internally, but an unbalanced binary tree will have O(2^n) performance - pretty much the worst performance you can find.

Memory is becoming a bottleneck in the sense that the applications memory consumption is in the hundreds of gigabytes, most of which is associated with said map - we are indeed talking about many many millions of entries, though clearly the values of the map are also taking a fair share of the memory, it's not just the Strings. Regarding your suggestion, I did try that, but it later turned out that the Koloboke map implementation with a custom hash algorithm provided the best mix of runtime performance and memory consumption. — Andreas Hartmann, Apr 18 '18 at 13:57
Something is wrong about your sentence *I recently worked with a 1.8m word dictionary (1.8m unique english words) and they took up around 1.6MB in RAM at runtime*. Either the words were not loaded in RAM all at the same time, or they were packed in a data structure with some sort of compression. In any case uniquely referencing any element from a set with 1.8m items requires a handle at least 3 bytes in size, therefore if all those words were used as keys in a map the absolute minimum for memory usage would be at 5.4MB. — Leon, Feb 21 '19 at 14:12

What is the most memory efficient method of storing a large number of Strings in a map?

3 Answers3