46

Scala has a TrieMap collection.

What is a TrieMap and what is its advantages/disadvantages compared to a HashMap?

Pinch
  • 2,768
  • 3
  • 28
  • 44
  • Or maybe: http://stackoverflow.com/questions/18660769/best-practices-for-mixing-in-scala-concurrent-map – Ben Reich Apr 07 '15 at 19:41
  • 3
    Definitely not a duplicate. Your biggest advantage is that Scala TrieMaps provide efficient, consistent iterators that capture all the elements in the trie at one point in time. – axel22 Aug 01 '16 at 13:20

2 Answers2

57

A Scala TrieMap is a trie-based concurrent scalable map implementation. Unlike normal trie maps, a Scala TrieMap has an efficient, non-blocking, O(1) time snapshot operation (and a slightly optimized readOnlySnapshot) operation.

Absolute performance of a TrieMap is slightly below the JDK8 ConcurrentHashMap, but the advantage is that it provides consistent iterators, something that concurrent data structures typically do not have. This means that you can capture all the elements in the trie at one point in time (performance numbers and analysis here). You should use the TrieMap if you need to capture all the elements at once (e.g. to list all its elements in the UI, or to consistently analyze them).

axel22
  • 32,045
  • 9
  • 125
  • 137
  • 1
    Slightly unrelated, but there is a open-source port of the scale TrieMap to Java available: https://github.com/romix/java-concurrent-hash-trie-map. – Pinch Aug 01 '16 at 22:04
  • @axel22 trying to get you attention here to explain how to use snapshot properly: https://stackoverflow.com/questions/66897678/semantics-of-readonlysnapshot Basically I am trying to understand how is ```O(1) time snapshot operation``` interpreted. – cpchung Apr 01 '21 at 17:46
  • The new snapshot (i.e. clone) of the `TrieMap` will be created in a constant amount of time (i.e. in the same amount of time independent of how many keys you stored). Later, when you try to access some of the keys in that snapshot or in the original `TrieMap`, the relevant parts of the `TrieMap` will be lazily duplicated. – axel22 Apr 04 '21 at 10:29
  • @axel22 ```the relevant parts of the TrieMap will be lazily duplicated. ``` Does that mean when the key is accessed from the clone/snapshot, it is no longer `O(1) get()` as in hashmap? What will be the run time complexity for this `lazy duplication`? the height of the underlying tree like O(lgN)? – cpchung Apr 11 '21 at 02:18
  • What will be the run time complexity for this `lazy duplication` before the actual `get()`? and what will be the run time complexity for the `get()` after the `lazy duplication`? – cpchung Apr 11 '21 at 02:27
  • 1
    The `get` is never `O(1)` in the standard Scala TrieMap - it's always expected `O(log n)`, but with a low constant factor (because the nodes are very wide). This is the case both with and without lazy duplication. The similar data structure that has expected `O(1)` operations is called cache-trie: https://dl.acm.org/doi/abs/10.1145/3178487.3178498 (but this is not implemented in the Scala stdlib, it's available on github as a standalone repo) If you added snapshots to cache-tries, the `get` before duplication would be `O(1)`, but after the duplication, it would briefly be `O(log n)`. – axel22 Apr 12 '21 at 15:16
23

TrieMaps are Maps using trie data structure which are essentially shallow trees. For example, If you have a 32 bit hash you break it up into sections for example 4 times 8 and at every level of the tree you branch to up to 256 sub trees. Obviously this gives O(1) performance due to the fixe size of hash(assuming few collisions).

A trie structure can be made immutable efficently, reusing the structure of a trie to create a new trie with an added or removed element. The relative performance in time/memory affect on GC depend greatly on implementation and load so rather then attempt a generic answer I would say run a benchmark. Though for a single thread with no immutability requirement a classic hashmap will normally produce better average performance and inferior worst case performance.

As a side note I will mention since the trieMap also uses a hash it is also a hashmap so I would recommend calling it a trie backed hashmap vs array backed hashmap.

Jason Webb
  • 7,938
  • 9
  • 40
  • 49
Meir Maor
  • 1,200
  • 8
  • 18