4

I have a pretty large google Multimap<String,String> and was looking into ways to reduce the memory usage. In all of the examples I can find people are doing something like:

Multimaps.newSetMultimap(
TDecorators.wrap(new TIntObjectHashMap<Collection<Integer>>()),
new Supplier<Set<Integer>>() {
public Set<Integer> get() {
  return TDecorators.wrap(new TIntHashSet());
}
});

which works for a Multimap <Integer,Integer>, is it possible to use Trove to wrap a <String,String>?

Incase anyone is interested in the future I went with http://code.google.com/p/jdbm2/ to write the hash map to the filesystem.

Tombart
  • 30,520
  • 16
  • 123
  • 136
chrstahl89
  • 580
  • 10
  • 21
  • What is a *google Multimap< String,String >*? You mean Guava's `Multimap`? – Bruno Reis Mar 22 '13 at 20:18
  • TIntObjectHashMap appears to require int keys, but it doesn't look like there is any requirement for values to be Integers. Could you use a `Multimap`, keying on [`String.hashCode()`](http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#hashCode()), instead? – femtoRgon Mar 22 '13 at 20:26
  • Can you tell us more about your application? Immutable collections, including multimaps, are significantly more memory-efficient than mutable collections. Alternately, depending on the sorts of strings you have, it might e.g. be more efficient to store them in a UTF-8 `byte[]`. Other than those two suggestions, there's not likely to be any other options except maybe a database on disk. – Louis Wasserman Mar 22 '13 at 20:36
  • I'm storing in the Map a timestamp and then a list of keywords. Then I rely on Collections.frequency to count how many times each keyword is used per time period. I checked out the ImmutableMultimaps via your suggestion (thank you) so hopefully that will help some, hard to tell without going into work....and guava are "several of Google's core libraries" I don't really see a problem calling it googles. – chrstahl89 Mar 23 '13 at 02:16
  • You should maybe checkout Radix-tree. It's very similar to compressing. http://en.wikipedia.org/wiki/Radix_tree – claj Jan 20 '14 at 09:26

4 Answers4

6

Guava's Multimaps are backed by standard JDK Collections which aren't optimized for memory usage. For example, ArrayListMultimap<K, V> is backed by HashMap<K, ArrayList<V>> and HashMultimap<K, V> is backed by HashMap<K, HashSet<V>>.

Eclipse Collections (formerly GS Collections) has Multimaps backed by its own container types, UnifiedMap and UnifiedSet. UnifiedMap uses half the memory of HashMap and UnifiedSet uses a quarter the memory of HashSet. The memory benefits you'll see will depend on whether you use a FastListMultimap or a UnifiedSetMultimap.

More detailed memory comparisons are available here.

Note: I am a committer for Eclipse Collections.

Donald Raab
  • 6,458
  • 2
  • 36
  • 44
Craig P. Motlin
  • 26,452
  • 17
  • 99
  • 126
3

You could look at memory efficient variant of hash maps, such as this one: https://code.google.com/p/sparsehash/

If your value strings are long enough, compression could be an option. You could also look into disk backed solutions such as Ehcache, depending on your access statistics.

Enno Shioji
  • 26,542
  • 13
  • 70
  • 109
0

Trove4j doesn't contain hashmap for string-to-string.

See http://trove4j.sourceforge.net/javadocs/gnu/trove/map/hash/package-summary.html

anstarovoyt
  • 7,956
  • 2
  • 27
  • 35
0

An approach I use is to use Map<String,Collection<String>> where the values start out as ArrayList<String> and get promoted to HashSet<String> when the bucket hits some threshold, say 32 elements.

I have found this saves a lot of memory for small buckets.

Will
  • 73,905
  • 40
  • 169
  • 246