I think there are two hard parts to this:
- a very efficient map data structure
- an efficient data structure for keeping track of what to recompute
The union-find data structure itself is already implemented in Prolog but hidden behind the somewhat obscure name of "variables". If you have a way of associating terms with variables representing their union-find equivalence classes, then:
- the
find
operation is the lookup of the class variable
- the test whether two classes are the same is
Class1 == Class2
- the
union
operation is Class1 = Class2
So find
is a bottleneck. In imperative languages the lookup is typically constant time, using something like foo.equivalence_class
or equivalence_classes[foo.id]
. In Prolog we don't have an equivalent (near-)constant-time mapping, in general. But since you only seem to be interested in variables, SWI-Prolog's attributed variables do fit the bill!
We can compute the union-find equivalence classes for a list of terms like this:
compute_classes([]).
compute_classes([Term | Terms]) :-
term_variables(Term, Variables),
variables_class(Variables, _NewEquivalenceClass),
compute_classes(Terms).
variables_class([], _EquivalenceClass).
variables_class([Var | Vars], EquivalenceClass) :-
( get_attr(Var, equivalence_class, ExistingEquivalenceClass)
-> ExistingEquivalenceClass = EquivalenceClass
; put_attr(Var, equivalence_class, EquivalenceClass) ),
variables_class(Vars, EquivalenceClass).
Using your example:
?- compute_classes([X+Y, Y+Z, T]).
put_attr(X, equivalence_class, _2772),
put_attr(Y, equivalence_class, _2772),
put_attr(Z, equivalence_class, _2772),
put_attr(T, equivalence_class, _2814).
We can kind of see that X
, Y
, and Z
all share an equivalence class, and T
is in a separate one.
Some utilities:
var_class(Var, Class) :-
get_attr(Var, equivalence_class, Class).
var_var_sameclass(Var1, Var2) :-
var_class(Var1, Class1),
var_class(Var2, Class2),
Class1 == Class2.
var_var_union(Var1, Var2) :-
var_class(Var1, Class1),
var_class(Var2, Class2),
Class1 = Class2.
Continuing the example:
?- compute_classes([X+Y, Y+Z, T]), var_class(X, ClassX), var_class(Y, ClassY), var_class(T, ClassT).
ClassX = ClassY,
put_attr(X, equivalence_class, ClassY),
put_attr(Y, equivalence_class, ClassY),
put_attr(Z, equivalence_class, ClassY),
put_attr(T, equivalence_class, ClassT).
?- compute_classes([X+Y, Y+Z, T]), var_var_sameclass(X, Y).
put_attr(X, equivalence_class, _3436),
put_attr(Y, equivalence_class, _3436),
put_attr(Z, equivalence_class, _3436),
put_attr(T, equivalence_class, _3478).
?- compute_classes([X+Y, Y+Z, T]), var_var_sameclass(X, T).
false.
?- compute_classes([X+Y, Y+Z, T]), var_var_union(Z, T), var_var_sameclass(X, T).
put_attr(X, equivalence_class, _3502),
put_attr(Y, equivalence_class, _3502),
put_attr(Z, equivalence_class, _3502),
put_attr(T, equivalence_class, _3502).
That is, X
and Y
really are in the same class, while X
and T
are not. If we union the classes for Z
and T
, then suddently X
and T
are in the same class.
Killing variables is where it gets more tedious. The idea here (as you suggested in the question) is to only recompute "affected" parts of the input. I think this can be done by associating a set of affected terms with each equivalence class. I will use lists here, but I do not suggest that lists are a good choice in practice.
Computing equivalence classes for a list of terms, and a "watchlist" for each equivalence class:
compute_classes_and_watchlists(Terms) :-
compute_classes(Terms),
maplist(compute_watchlist, Terms).
compute_watchlist(Term) :-
term_variables(Term, [RepresentativeVariable | _OtherVars]),
var_class(RepresentativeVariable, Class),
( get_attr(Class, class_watchlist, Watchlist)
-> true
; Watchlist = [] ),
put_attr(Class, class_watchlist, [Term | Watchlist]).
For example:
?- compute_classes_and_watchlists([X+Y, Y+Z, T]).
put_attr(X, equivalence_class, _2932),
put_attr(_2932, class_watchlist, [Y+Z, X+Y]),
put_attr(Y, equivalence_class, _2932),
put_attr(Z, equivalence_class, _2932),
put_attr(T, equivalence_class, _3012),
put_attr(_3012, class_watchlist, [T]).
So if you were to kill any of X
, Y
, or Z
, the watchlist for their class _2932
would tell you that you would need to recompute equivalence classes for the terms Y+Z
and X+Y
(but nothing else).
Killing itself grabs the killed variable's class and its watchlist (which it "returns") and clears the equivalence class for each variable in that class:
kill_var(Var, TermsToRecompute) :-
var_class(Var, Class),
get_attr(Class, class_watchlist, TermsToRecompute),
del_attr(Class, class_watchlist),
maplist(clear_class, TermsToRecompute).
clear_class(Term) :-
term_variables(Term, [RepresentativeVariable | _OtherVars]),
del_attr(RepresentativeVariable, equivalence_class).
Killing only makes sense if you immediately (a) bind the killed variable to a ground term, and (b) recompute the equivalence classes for the affected terms. In your example:
?- compute_classes_and_watchlists([X+Y, Y+Z, T]), kill_var(Y, TermsToRecompute), Y = y_is_now_bound, compute_classes_and_watchlists(TermsToRecompute).
Y = y_is_now_bound,
TermsToRecompute = [y_is_now_bound+Z, X+y_is_now_bound],
put_attr(X, equivalence_class, _4640),
put_attr(_4640, class_watchlist, [X+y_is_now_bound]),
put_attr(Z, equivalence_class, _4674),
put_attr(_4674, class_watchlist, [y_is_now_bound+Z]),
put_attr(T, equivalence_class, _4708),
put_attr(_4708, class_watchlist, [T]).
This is getting hard to read, but the point is that after killing and binding Y
and recomputing the union-find structure over the affected terms, X
and Z
are now in separate equivalence classes. T
is just sitting around unaffected.
All of this kind of assumes that you won't backtrack across a union, although my reading of SWI's attributed var docs is that this wouldn't even do anything terribly wrong. Since put_attr
is backtrackable, with some more care this might turn out to be a fairly flexible implementation in which backtracking simply splits classes apart again. Backtracking across a kill could also be made to work, I think.
TODOs:
- in the watchlist case, the definition of
var_var_union
must be changed to merge the watchlists for the two classes (if they are distinct); for actual lists this would be append
, but some actual set or more specialized data structure would be better, especially if you expect to have some sort of "stack-like" behavior where the next variable to be killed is most likely one on which you did a recent union
operation
- especially with the watchlist approach, users must be prevented from accidentally unifying
equivalence_class
terms; this could be done by using some sort of class(<unique_id>, NakedClassVariable)
instead of just the naked variable to represent an equivalence class
- the repeated calls to
term_variables
could probably be be optimized somehow -- in addition to a watchlist of terms, you could also keep a watchlist of the variables of interest
All in all, this is not quite production ready code, but it might give you some ideas.