Threading state/context in DCG while parsing text

Question

How to pass State (and change it when I need) around while parsing text !?

https://www.metalevel.at/prolog/dcg

the example is doing counting..

Don't know how I'm supposed to pass the initial State. Do I have to do it as call-parameter or as a first element in the list ? Whenever I want to use the State, do I have to include state(S) as a first goal ?

======

Let say i have to parse the string "add item 5".

If the state is "list" , let say it should print "5=>list"

if "set" then "5=>set"

or may be more complex string "new list. add 5" ... where parsing "new list" should set the State=list, so that parsing the next part is aware of the context and the interpretation is "add 5 to a list" vs "add 5 to a set".

No need to use those examples, I'm just trying to figure when to use semi-context notation and how to pass the context as a parameter in the first/top rule.

As you can see I'm confused, but this was the only example on the internet I could find and prolog docs as always are dense, terse and not very helpful ;( with no examples.

To clarify :

sent([X,Y],Ctx) -->  make(X,Ctx), add(Y,Ctx).
make(V,Ctx) --> [make,V], {say(">make ~w |ctx: ~w", [V,Ctx])}.
add(V,_Ctx) --> [add,V], {say(">add ~w", [V])}.

In this example I'm passing context at every level, even that I dont use it in add(). I can remove it for add(), but if there is subrule of I have to keep it.

I understood that using State threading need declaring Ctx only in the top rule and whenever it is used

score 5 · Answer 1 · answered Jan 17 '21 at 13:49

DCGs are an approach to parsing that gained some interest because of their tight and lighweight integration into their host language - Prolog, of course. So lightweight, indeed, that there is literally nothing specific to parsing embodied in a DCG clause, just the (fairly) efficient pattern matching that is made possible by difference lists.

If you use a DCG for parsing, you cannot thread the state among different clauses, because the difference list is used to match the tokens.

So, use the traditional way, manually passing the state around in arguments, or switch to extended DCG - for instance, pack(edcg).

"If you use a DCG for parsing, you cannot thread the state among different clauses, because the difference list is used to match the tokens." - I have never seen this statement anywhere in the online Prolog resources. I suspected this was the case but it's nice to see it written down so clearly. — matt eisenberg, Dec 24 '21 at 19:52

David Tonhofer · Answer 2 · 2021-01-17T14:29:53.267

Maybe this example helps. I will leave the semicontext trick out for now, as a discussion for later.

We want to count the occurences of a and b and c present in an atom. Any other characters are lumped into a dropped category (but also counted).

The "state" is thus the current counts for a,b,c,dropped. The state shall be implement using a map, in this case an SWI-Prolog dict.

Three ways of doing that:

With foldl/4
With phrase/3
Using vanilla Prolog

With the code below:

?-
count_with_foldl(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2}.

?- 
count_with_phrase(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2} ;
false.

?- 
count_with_recursion(abgdbabba,DictWithCounts).
DictWithCounts = counts{a:3,b:4,dropped:2}.

Code:

% ===
% Morph DictIn to DictOut so that:
% Only for Keys [a,b,c]:
% If Key exists in DictIn, DictOut is DictIn with the count for Key incremented
% If Key notexists in DictIn, DictOut is DictIn with a new entry Key with count 1
% ===

inc_for_key(Key,DictIn,DictOut) :-
   memberchk(Key,[a,b,c]),
   !,
   add_it(Key,DictIn,DictOut).
   
inc_for_key(Key,DictIn,DictOut) :-
   \+memberchk(Key,[a,b,c]),
   add_it(dropped,DictIn,DictOut).

add_it(Key,DictIn,DictOut) :-
   (get_dict(Key,DictIn,Count) -> succ(Count,CountNew) ; CountNew=1),
   put_dict(Key,DictIn,CountNew,DictOut).

% ===
% Using foldl to count
% ===

count_with_foldl(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   foldl(inc_for_key,Chars,counts{},DictWithCounts).

% ===
% Using a DCG to count
% ===

dcg_count(Dict,Dict)      --> [].
dcg_count(DictIn,DictOut) --> [C], { inc_for_key(C,DictIn,Dict2) }, dcg_count(Dict2,DictOut).

count_with_phrase(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   phrase(dcg_count(counts{},DictWithCounts),Chars).

% ===
% Using standard Prolog to count
% ===

count_with_recursion(Atom,DictWithCounts) :-
   atom_chars(Atom,Chars),
   count_rec(Chars,counts{},DictWithCounts).
   
count_rec([],Dict,Dict).
count_rec([C|Cs],DictIn,DictOut) :- inc_for_key(C,DictIn,Dict2), count_rec(Cs,Dict2,DictOut).

With plunit tests:

:- begin_tests(counting).

test("count using foldl/4, #1", true(DictWithCounts == counts{a:3,b:4,dropped:2})) :-
   count_with_foldl(abgdbabba,DictWithCounts).
   
test("count whith phrase/2, #1", [true(DictWithCounts == counts{a:3,b:4,dropped:2}),nondet]) :- 
   count_with_phrase(abgdbabba,DictWithCounts).

test("count whith recursion, #1", [true(DictWithCounts == counts{a:3,b:4,dropped:2})]) :- 
   count_with_recursion(abgdbabba,DictWithCounts).
   
:- end_tests(counting).

A single state variable in DCGs

In the above, the DCGs take an "input state" at argument position 1 which they develop into an "output state" which is eventually returned at argument position 2.

dcg_count(DictIn,DictOut)

This is completely analogous to functional programming whereby immutable structures are passed into functions and returned by functions. Here, the structures are "related" by the predicate, which is another way of looking at things (a way which sometimes bumps against realities of a machine that necessarily computes forward in time).

Note that above, the state variables are ground - no variables.

A single state variable is sufficient if that state variable is the name of an "empty cell" at the leaf of a larger structure. The larger structure "grows at its leaves", a DCG fills in the empty cell, but leaves another empty cell for the next call.

For example, here is a DCG which grows an "open list" at its end Fin. Fin is always an unbound variable that shall be filled in by the rule:

Replace tag by : in an atom:

collect(Fin) --> [],         { Fin = [] }.
collect(Fin) --> [t,a,g], !, collect(Fin2), { Fin = [':'|Fin2] }.
collect(Fin) --> [C],        collect(Fin2), { Fin = [C|Fin2] }.

collect(Atom,Collected) :-
   atom_chars(Atom,Chars),
   phrase(collect(Collected),Chars).

?- collect(xytagyx,Out).
Out = [x,y,:,y,x] ;
false.

The DCG code is traditionally written more compactly (this way also makes it amenable to tail-call optimization, which is not be disregarded):

collect([])        --> [].
collect([':'|Fin]) --> [t,a,g], !, collect(Fin).
collect([C|Fin])   --> [C],        collect(Fin).

collect(Atom,Collected) :-
   atom_chars(Atom,Chars),
   phrase(collect(Collected),Chars).

In this case, every DCG call only see the empty cell at the far end of the open list, whereas Collectecd denotes its start:

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :   ~empty cell~ <--- Fin

So, when encountering y the rule

collect([C|Fin]) --> [C], collect(Fin).

just does its part on its Fin and grows the list by 1 listcell, but leaving it an "open list" for continuing appending:

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :    [|]
                     /   \
            C ----> y   ~empty cell~ <--- Fin

The rule

collect(Fin) --> [], { Fin = [] }.

transforms the open list into a proper list, setting the empty cell to []:

Collected ----> [|]        
               /   \       
              x    [|]     
                  /   \    
                 :    [|]
                     /   \
                    y    []

This is of course exactly what happens in the tree example of the DCG Primer:

tree_nodes(nil) --> [].
tree_nodes(node(Name, Left, Right)) -->
        tree_nodes(Left),
        [Name],
        tree_nodes(Right).

Here, the datastructure which grows at its leaves is not a list, but a binary tree.

I see that you are using the state i.e. both dicts in the rule header declaration !!! I was thinking (i see the most use) of State-threading as not to declare the State variable at every rule , isnt this the basic idea ... the same way DCG hides the diff-lists i thought threading "hides" the state i.e. u declare it only when u need to refer to it or update it ! in this case it is worse cause u specify two arguments in every rule !! or my interpretation is mistaken — sten, Jan 17 '21 at 07:17
@sten I have added an explainer on the "single state variable" case. Does it help? — David Tonhofer, Jan 17 '21 at 10:50
@DavidTonhofer Test assertions should not use unification. If there's a bug in the code that results in failure to bound the `DictWithCounts` variable, your tests will still pass. — Paulo Moura, Jan 17 '21 at 14:22

Threading state/context in DCG while parsing text

2 Answers2