6

To quote the SICStus Prolog manual:

The usual mathematical theory behind Logic Programming forbids the creation of cyclic terms, dictating that an occurs-check should be done each time a variable is unified with a term. Unfortunately, an occurs-check would be so expensive as to render Prolog impractical as a programming language.

However, I ran these benchmarks (The Prolog ones) and saw fairly minor differences (less than 20%) in SWI Prolog between the occurs check (OC) being on and off:

OC is off: :- set_prolog_flag(occurs_check, false). in .swiplrc (restarted)

?- run_interleaved(10).
% 768,486,984 inferences, 91.483 CPU in 91.483 seconds (100% CPU, 8400298 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.453  0.059
browse              0.395  0.000
chat_parser         0.693  0.000
crypt               0.481  0.000
fast_mu             0.628  0.000
flatten             0.584  0.000
meta_qsort          0.457  0.000
mu                  0.523  0.000
nreverse            0.406  0.000
poly_10             0.512  0.000
prover              0.625  0.000
qsort               0.574  0.000
queens_8            0.473  0.000
query               0.494  0.000
reducer             0.595  0.000
sendmore            0.619  0.000
simple_analyzer     0.620  0.000
tak                 0.486  0.000
zebra               0.529  0.000
           average  0.534  0.003
true.

OC is on: :- set_prolog_flag(occurs_check, true). in .swiplrc (restarted)

?- run_interleaved(10).
% 853,189,814 inferences, 105.545 CPU in 105.580 seconds (100% CPU, 8083669 Lips)
true.

?- run(1).
'Program'            Time     GC
================================
boyer               0.572  0.060
browse              0.618  0.000
chat_parser         0.753  0.000
crypt               0.480  0.000
fast_mu             0.684  0.000
flatten             0.767  0.000
meta_qsort          0.659  0.000
mu                  0.607  0.000
nreverse            0.547  0.000
poly_10             0.541  0.000
prover              0.705  0.000
qsort               0.660  0.000
queens_8            0.491  0.000
query               0.492  0.000
reducer             0.867  0.000
sendmore            0.629  0.000
simple_analyzer     0.757  0.000
tak                 0.550  0.000
zebra               0.663  0.000
           average  0.634  0.003
true.

Are these benchmarks not representative of real-world usage? (I remember reading somewhere that they were specifically chosen to be "fairly representative") Is SWI Prolog implementing some optimization tricks, that SICStus people aren't aware of, that make the cost of OC minor? If so, are they published? (I found a bunch of papers about this from the '90s, but I don't know if they are state-of-the-art)

MWB
  • 11,740
  • 6
  • 46
  • 91
  • 1
    While I do not have a direct answer to your question, we've had the pleasure of getting pretty technical with Jan Wielemaker (creator of SWI Prolog) over on the SWI-Prolog Discourse (https://swi-prolog.discourse.group/) :-) He's rather active so I can recommend reaching out there as well. – Phillip van Heerden Jan 06 '21 at 17:20
  • 1
    This is cross posted [here](https://swi-prolog.discourse.group/t/what-occurs-check-optimizations-is-swi-prolog-using/3461) – Guy Coder Jan 06 '21 at 19:33
  • @MostowskiCollapse My version is 7.6.4 (Linux). `current_prolog_flag(occurs_check, X)` agrees with whatever's in `~/.swiplrc`. – MWB Jan 09 '21 at 20:25
  • @MostowskiCollapse *"Try this: ?- freeze(A,true),freeze(B,true),A-B=s(A)-n.. It will succeed."* I see an error message in 7.6.4 (OC set to "error"). – MWB Jan 09 '21 at 20:48
  • @MostowskiCollapse That succeeds. I wouldn't get hung up on 15% vs 40% (I'm using a Xeon from 6 years ago also). 40% is still not the difference between "practical" and "impractical" language. – MWB Jan 09 '21 at 22:43
  • @oguz, why did you remove the tag [tag:occurs-check] in so many posts? That is a sensible tag. – false Feb 03 '21 at 09:11
  • @false It doesn't have an excerpt and wiki, and was created and added to ~10 questions by the same user. Besides it doesn't mean anything in isolation, it is only meaningful when used with the prolog tag. – oguz ismail Feb 03 '21 at 09:27
  • 1
    @oguzismail: What you are doing here is vandalizing a small tag. And no, it is not "only meaningful" to Prolog. It is related to [tag:unification] which appears in many areas. – false Feb 03 '21 at 09:33
  • @false Not vandalizing, rather getting rid of it. At least add a tag excerpt, so that no one else attempts to delete the tag again. – oguz ismail Feb 03 '21 at 09:36
  • Now I got a simple test case where occurs check goes +100% for SWI-Prolog, see revised answer. –  Feb 21 '21 at 15:29

2 Answers2

5

The major optimization makes unification of local variables a constant operation.

Many abstract machines like the PLM, ZIP, WAM, VAM provide a special case for logic variables that cannot be subterm of some structure (called local variables). Unification with such variables does not require any occurs-check at all and thus can be constant. In this manner large terms can be "passed back" without an extra occurs-check required.

Without this optimization, the handling of grammars (for parsing a given list) gets an overhead quadratic in the number of tokens. Every time the "input list" is handed back (so, graphically speaking, every time you are crossing a comma after a non-terminal in the grammar body), the remaining input list needs to be scanned for the occurrence of that local variable. (It's better than quadratic in the number of characters, since regular expressions are mostly encoded tail-recursively).

This optimization was introduced 2007 in 5.6.39. It is surprising that your measurements show overheads even in cases like tak, where not a single structure is built at all. As far as I can remember, occurs-check unification in SWI 5.6.39 ran a tiny bit faster than rational tree unification (for simple cases) as (at that time) no extra setup was needed.

There is still ample room for many further occurs-check optimizations. But those will only happen, if people do use this feature. As for SWI, not much happened in the last 13 years.

But think-of-it: The very first Prolog, called Prolog 0 did support the occurs-check by default. But from Prolog I ("Marseille Prolog") on, only excuses (such as those you cite) were made. And at least, the standard did not rule out occurs-check unification as default by only defining NSTO executions and requiring unify_with_occurs_check/2 and acyclic_term/1. And now, Prologs like SWI, Tau, and Scryer provide it optionally via a flag.

A further optimization into the same direction is Joachim Beer's NEW_UNBOUND tag which avoids additionally occurs-checks of some heap-variables at the expense of a more complex scheme. See The Occur-Check Problem Revisited. JLP 5(3) 1988. And LNCS 404.

false
  • 10,264
  • 13
  • 101
  • 209
  • 1
    That flag (later with its name shortened) was proposed for standardization by ISO back in **2009** in the context of WG17 work on the Core standard revision. Hopefully more Prolog systems will eventually implement it. – Paulo Moura Feb 03 '21 at 15:48
  • 1
    @PauloMoura: There is quite some reservation against a flag as it is pretty error prone if code that relies on occurs-check unification and code that relies on rational trees are mixed. One such example is CHR in SWI, which (in its current implementation since 10 years or so) relies on creating infinite trees and thus cannot be run together with that flag turned on. A possible (incomplete) solution might be `call_with_occurs_check/1`. – false Feb 03 '21 at 17:35
  • 1
    I'm aware of those issues. Coinduction is another example of code that breaks with such a flag turned on. But the implementation of the flag would/should allow to turn it on just for the computations that require the occurs check (for example, using a `setup_call_cleanup/3` goal). This would useful notably for testing. – Paulo Moura Feb 03 '21 at 17:47
  • 1
    Unfortunately `setup_call_cleanup/3` for this purpose is too coarse. One would need effectively to maintain that flag for each and every invocation. But that means to impose some general overhead. And, there is not a single implementation to show that it is not that difficult. – false Feb 03 '21 at 17:53
  • Is this "major optimization" separate from [the hashtable one](https://stackoverflow.com/questions/65655281/what-is-a-simple-worst-case-for-occurs-check-in-prolog#comment116102785_65655281)? Do modern Prologs use the latter? – MWB Feb 04 '21 at 06:22
  • 1
    @MaxB: It has nothing to do with those cases. And seriously, difference lists are a major and not a "major" point of Prolog. Those other bad cases for occurs-check which have been mentioned in various posts all rely on the tree vs graph representation of terms where a naive traversal gets an exponential overhead. In early LISP parlance, these are blam lists. There, hashtables **might** be of interest. But not for a simple traversal, there, marking is much cheaper. And in general, factoring (basically the same problem as minimization of finite automata) will always speed up the occurs-check. – false Feb 04 '21 at 08:54
  • SWI-Prolog is especially stupid with pattern matching: https://github.com/SWI-Prolog/swipl-devel/issues/789 –  Feb 22 '21 at 12:57
3

Here is a test case where the occurs check doubles the time to execute a query. Take this code here, to compute a negation normal form. Since the (=)/2 is at the end of the rule, the visited compounds becomes quadratic:

/* Variant 1 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, norm(A, C), norm(B, D), R = or(C,D).
norm((A*B), R) :- !, norm(A, C), norm(B, D), R = and(C,D).
Etc..

We can compare with this variant where the (=)/2 is done first while the compound is not yet instantiated:

/* Variant 2 */
norm(A, R) :- var(A), !, R = pos(A).
norm((A+B), R) :- !, R = or(C,D), norm(A, C), norm(B, D).
norm((A*B), R) :- !, R = and(C,D), norm(A, C), norm(B, D).
Etc..

Here are some measurements for SWI-Prolog 8.3.19. For variant 1 setting the occurs check flag to true doubles the time needed to convert some propositional formulas from the principia mathematica:

/* Variant 1 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.469 CPU in 0.468 seconds (100% CPU, 7778133 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.984 CPU in 0.983 seconds (100% CPU, 6650921 Lips)
true.

On the other hand, moving (=)/2 to the front changes the picture favorably:

/* Variant 2 */
/* occurs_check=false */
?- time((between(1,1000,_),test,fail;true)).
% 3,646,000 inferences, 0.453 CPU in 0.456 seconds (99% CPU, 8046345 Lips)
true.

/* occurs_check=true */
?- time((between(1,1000,_),test,fail;true)).
% 6,547,000 inferences, 0.703 CPU in 0.688 seconds (102% CPU, 9311289 Lips)
true.

Open Source (pages no longer available, nor on archive.org):

Negation Normal Form, Non-Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm-pl

Negation Normal Form, Tail Recursive
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-norm2-pl

193 propositional logic test cases from Principia.
https://gist.github.com/jburse/7705ace654af0df6f4fdd12eee80aaec#file-principia-pl

false
  • 10,264
  • 13
  • 101
  • 209