16

In a program I'm working on, I develop a large "thread tree" (at most k children per node), where each thread makes some modifications to a hash table inherited from its parent. Is there a way to implement a hash table that is somewhat "persistent" (in the sense of http://en.wikipedia.org/wiki/Persistent_data_structure) ?

Namely, is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?

Don Stewart
  • 137,316
  • 36
  • 365
  • 468
ManRow
  • 161
  • 1
  • 4
  • 1
    In that wikipedia article there are at least 5 different kinds of persistence; which kind of persistence are you looking for? – Roy T. Feb 01 '11 at 08:49
  • Full persistence, just updated the original post – ManRow Feb 01 '11 at 09:03
  • If I understood your problem well: if you want each child to inherit the hash table from its parent, you could just make a deep copy of the hash table every time a child is created/added. –  Feb 01 '11 at 09:04
  • @Muggen: that's an O(n) operation for every childbirth. A persistent data structure takes this down to O(1) at the expense of O(lg n) for the lookup, insert and delete operations. – Fred Foo Feb 01 '11 at 11:21
  • Are you in Java, or C++? – Puppy May 30 '11 at 15:28

4 Answers

5

"As space-efficient as an ordinary hash-table" is a rather vague specification, since "ordinary" may mean linked or probed depending on who you ask. I don't think anyone has designed easy-to-understand persistent hash tables yet.

The easiest way to get a persistent key-value map with the complexity that you want is to use a persistent binary search tree. Lookup is the familiar algorithm from ephemeral (non-persistent) BSTs. Insert, however, changes, and becomes something like (pseudo-Java):

// overwrites the value for k if it's already in the tree;
// inserting into an empty subtree creates a fresh leaf
Node insert(Node node, Key k, Value v)
{
    if (node == null)
        return new Node(k, v, null, null);
    else if (k < node.key)
        return new Node(node.key, node.val, insert(node.left, k, v), node.right);
    else if (k > node.key)
        return new Node(node.key, node.val, node.left, insert(node.right, k, v));
    else
        return new Node(k, v, node.left, node.right);
}

Note that the insert routine returns a new tree, which may seem inefficient, but it only copies the nodes along the path it traverses; every other subtree is shared between the old and new versions. The path has expected length O(lg n), so an insert performs O(lg n) allocations on average. This is about as space-efficient as it gets.
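To make the sharing concrete, here is a minimal, self-contained version of the same idea (the class and field names are mine, not from the answer). After an insert, the old version remains fully usable, and every subtree off the insertion path is shared between versions:

```java
// Minimal sketch of a persistent BST with int keys and String values.
// insert() never mutates an existing node; it copies only the search path.
final class PBST {
    static final class Node {
        final int key; final String val; final Node left, right;
        Node(int key, String val, Node left, Node right) {
            this.key = key; this.val = val; this.left = left; this.right = right;
        }
    }

    // Returns a new tree; all subtrees off the search path are shared.
    static Node insert(Node node, int k, String v) {
        if (node == null) return new Node(k, v, null, null);
        if (k < node.key)
            return new Node(node.key, node.val, insert(node.left, k, v), node.right);
        if (k > node.key)
            return new Node(node.key, node.val, node.left, insert(node.right, k, v));
        return new Node(k, v, node.left, node.right); // overwrite value for k
    }

    static String lookup(Node node, int k) {
        if (node == null) return null;
        if (k < node.key) return lookup(node.left, k);
        if (k > node.key) return lookup(node.right, k);
        return node.val;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(null, 2, "two"), 1, "one");
        Node v2 = insert(v1, 2, "TWO");         // v1 is untouched
        System.out.println(lookup(v1, 2));      // two
        System.out.println(lookup(v2, 2));      // TWO
        System.out.println(v1.left == v2.left); // true: left subtree is shared
    }
}
```

Each thread in the tree from the question could hold its own root reference; creating a child is then O(1), and modifications cost O(lg n) time and space per operation.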

To get worst-case O(lg n) behavior, use a red-black tree. See also the literature on data structures in functional programming, e.g. the works of Okasaki.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • 1
    +1 for mentioning Okasaki's work; if you can track down a copy of his book on purely functional data structures you'll have access to all sorts of fun solutions to this problem. – templatetypedef Feb 01 '11 at 17:59
2

Is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?

Indeed there is. Many ways.

E.g. in Haskell, the simple Data.Map, a size-balanced binary tree (a tree of bounded balance) as described by:

  • Stephen Adams, "Efficient sets: a balancing act", Journal of Functional Programming 3(4):553-562, October 1993, http://www.swiss.ai.mit.edu/~adams/BB/.
  • J. Nievergelt and E.M. Reingold, "Binary search trees of bounded balance", SIAM journal of computing 2(1), March 1973.

provides the following API, satisfying your criteria:

insert :: Ord k => k -> a -> Map k a -> Map k a   -- O(log n)
lookup :: Ord k => k -> Map k a -> Maybe a        -- O(log n)
delete :: Ord k => k -> Map k a -> Map k a        -- O(log n)

while being fully persistent. Space use is O(n).

For better constant factors, try e.g. the Data.HashMap data structure, which has the same overall complexity.

Alternative structures include:

  • persistent tries, which have improved space use over hash tables, as key storage is dense.
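As one illustration of the path-copying idea behind such structures (a sketch of my own, not code from this answer), a persistent binary trie keyed on non-negative integers copies only the nodes along a key's bit path on each insert, so every update costs O(w) time and space for w-bit keys while all older versions remain intact:

```java
// Sketch of a persistent binary trie on non-negative int keys.
// A key's bits (least significant first) select the path; insert copies
// only that path, sharing all other subtrees with the previous version.
final class PTrie {
    final String val;      // value stored at this node, or null
    final PTrie zero, one; // children for next key bit 0 / 1
    PTrie(String val, PTrie zero, PTrie one) {
        this.val = val; this.zero = zero; this.one = one;
    }

    static PTrie insert(PTrie t, int key, String v) {
        PTrie z = t == null ? null : t.zero;
        PTrie o = t == null ? null : t.one;
        if (key == 0)
            return new PTrie(v, z, o);          // store value here
        String here = t == null ? null : t.val;
        return (key & 1) == 0
            ? new PTrie(here, insert(z, key >>> 1, v), o)
            : new PTrie(here, z, insert(o, key >>> 1, v));
    }

    static String lookup(PTrie t, int key) {
        if (t == null) return null;
        if (key == 0) return t.val;
        return (key & 1) == 0 ? lookup(t.zero, key >>> 1)
                              : lookup(t.one, key >>> 1);
    }
}
```

Because keys with a common low-order bit prefix share trie nodes, key storage is dense in exactly the sense the bullet above describes.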
Don Stewart
  • 137,316
  • 36
  • 365
  • 468
  • Since insert and delete work using path copying, I thought each insert can cost Omega(lg n) space, so that maintaining k versions can cost Omega(n + k lg n) space. – jbapple Oct 15 '11 at 21:10
1

Clojure has implemented a whole set of persistent data structures, such as hash maps. It's open source, so maybe you should take a look?

http://clojure.org/data_structures

http://code.google.com/p/clojure/source/browse/trunk/src/jvm/clojure/lang/PersistentHashMap.java

Martin Probst
  • 9,497
  • 6
  • 31
  • 33
1

Namely, is there a way to implement a key-value pairing with at least O(log n) lookup, insertion, and deletion that is fully persistent, but is as "space-efficient" (worst-case) as an ordinary hash-table?

Yes. Section 5 of Driscoll et al.'s "Making Data Structures Persistent" shows a technique for making red-black trees fully persistent with O(lg n) time per operation and O(1) (amortized) space per insertion or deletion.

Their data structure is not confluently persistent. For more on persistence, see Kaplan's survey on the topic.

jbapple
  • 3,297
  • 1
  • 24
  • 38
  • 1
    This paper is fine for theoretical considerations, but is there some actual C code out there that implements persistent red black trees? Seems like all the libraries I find (e.g. GNU libavl) are *not* persistent, and understanding red black trees is hard enough without tacking on persistence... – Michael Oct 22 '14 at 17:41
  • Use Google with the keywords: persistent red-black driscoll – jbapple Oct 23 '14 at 00:37