1

This is a question about a map datastructure which stays memory efficient under heavy non-destructive manipulation.

Context

I am writing a little program whereby I generate "states" (which are just terms holding data) in a predicate iterate/2 according to some logic that can be disregarded for the purpose of the present discussion.

iterate/2 recursively invokes itself in a tail-recursive manner. At each invocation, iterate/2 generates more states and has to efficiently check whether any state has previously been seen.

The number of states "previously seen" can be considerable (a few tens of thousands). However, we are allowed to "forget about" (a fat percentage of) previously seen entries from time to time. This is a good thing as we can hope to keep overall memory requirements within bounds.

For efficient execution of a "previously seen" predicate, a "map" interface backed by an appropriate implementation seems to be a good choice (because one can implement lookup in maps efficiently).

Suppose we have such a map datastructure.

  • For any newly generated state, compute a hash value (by whatever means) from said state, and insert the pair (hash,state) as (key,value) into the map.
  • To check whether a state has already been previously seen, compute its hash value and check whether an entry corresponding to that hash value exists in the map. (For this exercise, we disregard the possibility of hash collisions)

The ever-different map datastructure is then passed between invocations of iterate.

This being a logic programming language, we want to sustain the general idea of working in the timeless mathematical space of all possible by-definition-immutable structures, where we move from one structure to the next like a monkey swinging from a vine, instead of continually mutating a time-varying data structure as we do in imperative programming. We just hope that the garbage collector is faster than we are!

The map-related predicates that we offer could be:

map_new(?Map)  

...to unify Map with the empty map.

map_get(+Map,+Key,?Val)  

...to unify Val with the value associated to key Key in map Map or fail if Map contains no such Key (by allowing Key to be a variable, one could implement enumeration via backtracking).

map_put(+Map,+Key,?Val,?OldVal,?NewMap)

...to unify map NewMap with the map obtained by inserting the (Key,Val) pair into map Map. If map Map already contains an entry with key Key, the value of said entry is unified with OldVal.

map_rem(+Map,+Key,?OldVal,?NewMap) 

...to unify map NewMap with the map obtained by removing the entry with key Key from map Map. The value associated to key Key in map Map is unified with OldVal. Fails if map Map does not contain key Key.

And we would use the above like this:

iterate(Solution) :-
   map_new(Map),
   iterate(Map,Solution).

iterate(M,S) :-
   gen_fresh_states(ListFresh),
   gen_stale_states(ListStale),
   add_fresh_states(M,ListFresh,MM),
   del_stale_states(MM,ListStale,MMM),
   ( termination_check(MMM,S) -> true ; iterate(MMM,S) ).

Where

 add_fresh_states(M,ListFresh,MM),
 del_stale_states(MM,ListStale,MMM),

iterate over the lists-of-states ListFresh and ListStale in the expected way, iteratively calling map_put/5 or map_rem/4.

And -> ; is the ->/2 predicate.

Problem

At this point, we need to choose a good implementation for the map. We do not want a structure that changes excessively whenever a single item is removed or added, so that the underlying implementation can keep copy operations and generally heap space low by referencing substructures (e.g. subtrees) of map M whenever MM is constructed from it during a map_put/5 or map_rem/4.

(For example, i hear that Clojure uses efficient implementations in that sense for some of its 4[immutable maps], but I haven't looked into this closely.)

In SWI Prolog, we have AVL-tree-backed implementation of assoc. This is excellent for fast lookup. Unfortunately, AVL trees may change heavily whenever a tree node is inserted or removed. There may well not be much in common between the actual on-heap layout of maps M, MM, MMM, or any map in between: The heap is likely to fill up rapidly. The implementation of assoc is in Prolog btw. (/usr/lib64/swipl-7.2.3/library/assoc.pl), there is no special sauce that goes into it.

Am I mistaken in thinking AVL tree-based maps won't be up to the task?

Alternatively, are there any map implementations (not necessarily in SWI Prolog) that are able to reuse substructure efficiently?

I suppose one can already alleviate heap usage by using "!" in between the subgoals of iterate/2. This will tell the runtime that backtracking will not occur, and thus gives it license to destroy any on-heap structures that won't ever be used again. For example, a "!" after the call to del_state_states/3 potentially makes MM eligible for garbage collection (I have no idea whether that really works though).

false
  • 10,264
  • 13
  • 101
  • 209
David Tonhofer
  • 14,559
  • 5
  • 55
  • 51

1 Answers1

2

Am I mistaken in thinking AVL tree-based maps won't be up to the task?

In all likelihood: Yes.

In my experience, library(assoc) is an excellent choice for many tasks that require access as you describe. I recommend you give it a try!

In practice, the logarithmic overhead is often very much acceptable, and you can also re-build the maps from scratch every so often to remove unneeded elements.

Note also that many other balanced tree structures (Red-Black trees etc.) also admit a natural and pure Prolog implementation, and if AVL trees turn out to not work well for you, I recommend you check out some of the pure alternatives.

Chris Okasaki's PhD thesis Purely Functional Data Structures is well worth a look in such situations.

mat
  • 40,498
  • 3
  • 51
  • 78
  • Ok, lets try that while keeping an eye on memory usage [as in "Prolog memory issues"](http://stackoverflow.com/questions/15811951/prolog-memory-issues). Has there ever been tooling for Prolog implementations (WAM-based or otherwise) as exists for the [Oracle JVM](http://docs.oracle.com/javase/8/docs/technotes/guides/visualvm/monitor_tab.html)? – David Tonhofer Dec 12 '16 at 22:54
  • 1
    Given that Prolog predates the JVM by a few decades, chances are high [that you find such tooling for Prolog too](http://eu.swi-prolog.org/profile.html). – mat Dec 12 '16 at 22:56
  • 1
    @David: Even more so, since some of the creators of the JVM had worked on Prolog before. – false Dec 13 '16 at 00:31
  • 1
    In fact, "Prolog" occurs 17 times in the [Java Virtual Machine Specification](https://docs.oracle.com/javase/specs/jvms/se7/jvms7.pdf), and Prolog code is prevalent in the document. See for example around page 159 and many following. – mat Dec 13 '16 at 00:42