How to write this DCG more elegantly?

Question

Playing around with DCGs and stubled upon the following problem:

I want to parse as well as produce an exact amount of spaces (" "). My trivial approach of simply doing this:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { N > 0, N is M+1 },
    " ",
    trivial_nat_space(M).

failed terribly, because of insufficient instantiation of N and M depending on whether i do

?- String="   ", my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])

or

?- Anz_Spaces=3,my_string_codes(String,Codes), phrase(trivial_nat_space(Anz_Spaces), Codes, [])

where (for convenience)

my_string_codes(S,C) :-
    when((ground(S);ground(C)), string_codes(S,C)).

searching for a nice solution to the problem I made a version that depends on self defined nats:

z.
s(z).
s(s(O)) :-
  s(O).

nat_num(S,C) :-
    when((ground(S);ground(C)),nat_num_(S,C)).
nat_num_(z,0) :- !.
nat_num_(s(N),X) :-
    nonvar(X),
    !,
    X > 0,
    Y is X-1,
    nat_num_(N,Y).
nat_num_(s(N),X) :-
    var(X),
    nat_num_(N,Y),
    X is Y+1.

n_space(z) --> [].
n_space(s(N)) -->
    " ",
    n_space(N).

which I would like to avoid because the additional encoding of the natural number is kind of already present in the builtin numbers.

and this:

nat_space(0) --> [].
nat_space(N) -->
    { var(N) },
    " ",
    nat_space(M),
    { N is M+1 }.
nat_space(M) -->
    { nonvar(M), M>0 },
    " ",
    { N is M-1 },
    nat_space(N).

which does work fine:

?- Anz_Spaces=3,my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
Anz_Spaces = 3,
String = "   ",
Codes = [32, 32, 32].

?- String="   ",my_string_codes(String,Codes), once(phrase(nat_space(Anz_Spaces), Codes, [])).
String = "   ",
Codes = [32, 32, 32],
Anz_Spaces = 3.

However the encoding of nat_spaces is (in my opinion) far from nice: it depends on meta-predicates to enforce a specific execution sequence, and (more seriously): if the parser were more complex than just " ", the logic would have to be defined in a seperate DCG predicate/rule resulting in the logic for a single parser/generator to be split into two definitions (the separated one and the one enforcing the correct execution sequence).

Is this the canonical/standard way of solving problems like this or is there a more general, elegant solution that I am missing right now?

Additional Info:

I am using SWI-Prolog version 8.3.9 for x86_64-linux with :- [library(dcg/basics)] and no additional arguments when starting the runtime. Nor do I set any settings in the file with the definitions.

Having spent much time with doing DCGs in Prolog your question has a lot of open ended things that need to be clarified. 1. Which Prolog? 2. Which version? 3. What are all of the Prolog flag settings? 4. Which modules were loaded? — Guy Coder, Aug 01 '21 at 12:57
Why is `parser/generator to be split into two definitions` such a bad idea? I spent years in pursuit of unified parser/generator and while I do know of some nice tricks to help, e.g. [CLP](https://www.swi-prolog.org/pldoc/man?section=clp), in most cases it is not worth the time or effort to peruse. I do use single code bases for combined parser/generator in some cases but those are the exception. — Guy Coder, Aug 01 '21 at 12:59
I would guess that maintainability is better when the relevant functionality stays together as far as possible, but that might only be me; the explicit decision for specific execution paths is what triggers me more. for the other questions: I added the answers at the bottom of the question. — Fabian Schneider, Aug 01 '21 at 13:06
If you continue on this path then what happens when you get to harder problems that out of the box CLP can not solve. Then you need [CHR](https://www.swi-prolog.org/pldoc/man?section=chr) and your code becomes so advanced that only you understand it and can maintain it. — Guy Coder, Aug 01 '21 at 13:07
Now knowing you are using SWI-Prolog the [flag](https://www.swi-prolog.org/pldoc/man?section=flags#flag:double_quotes) that becomes critical in knowing is `double_quotes`. I take it that by not stating you are using the default. `?- current_prolog_flag(double_quotes,Value). Value = string.` — Guy Coder, Aug 01 '21 at 13:12
I don't plan on adding an answer to `Is this the canonical/standard way of solving problems like this or is there a more general, elegant solution that I am missing right now?` because I view that as a subjective question. You will get many varying answers and be pulled in so many directions you could go nuts and walk away from Prolog with a great disdain. Enough said. — Guy Coder, Aug 01 '21 at 13:14
I am using the default flags yes. It is a subjective question and what I am looking for is different more elegant ways to solve it because right now I am out of ideas. And a posted more elegant (in terms of the mentioned shortcomings) will be accepted as an answer; i could also ask this on reddit but the facilities for such questions on SO are nicer IMO — Fabian Schneider, Aug 01 '21 at 13:21
What your question is missing is context. You note `Playing around with DCGs` but what is your long term goal? That could help a lot in giving you a direction to head. For now I would say that you should focus on learning DCGs to just get either a parser or generator working and along the way you will learn things that will fill in the knowledge you need to answer your question about a unified parser/generator. In other words you picked a really hard problem to focus on, instead move past this and revisit it latter when you have more experience. — Guy Coder, Aug 01 '21 at 13:28
Of interest: [abnf.pl](https://github.com/wouterbeek/prolog_library_collection/blob/master/prolog/abnf.pl) by Wouter Beek - Note this uses difference list which will not work with most DCG example code or libraries such as dcg_basics. — Guy Coder, Aug 01 '21 at 13:36
Of interest: [regex](https://www.swi-prolog.org/pack/list?p=regex) - Regular expression support for Prolog by Michael Hendricks — Guy Coder, Aug 01 '21 at 13:38

false · Answer 1 · 2022-08-11T14:15:04.560

Frankly, your original definition doesn't fail that terribly. No, it does not fail. For the most general query, it produces one solution,

?- phrase(trivial_nat_space(0), Cs).
   Cs = []                                    % pure perfect logic!
;  false.
?- phrase(trivial_nat_space(-1), Cs).
   false.                                     % it's right to be false!
?- phrase(trivial_nat_space(1), Cs).
   error(instantiation_error,(is)/2).         % er...
?- phrase(trivial_nat_space(N), Cs).          % most general query
   Cs = [], N = 0                             % again, pure logic 
;  error(instantiation_error,(is)/2).         % mmh...

... and otherwise an instantiation error. Instantiation errors are not the worst that can happen. They clearly and honestly state that more information (= instantiations) must be provided before we can continue. That is much better than to pretend everything is fine when it is not. Think of a clerk who asks for more information as producing an instantiation error. And then compare this to one that just fills out your IRS forms with some bold default assumptions¹.

To localize the reason for an instantiation error, we will use a failure-slice. So I will throw in some false goals and also an additional instantiation to make it even easier:

trivial_nat_space(0) --> [], {false}.
trivial_nat_space(N) --> {N = 1},
    { N > 0, N is M+1, false },
    " ",
    trivial_nat_space(M).

?- phrase(trivial_nat_space(1), Cs).
   error(instantiation_error,(is)/2).

This is a pretty disfunctional program! And still it produces an instantiation error. In order to fix your original program we have to modify something in the remaining visible part. In N is M+1 only the M can cause that error. In fact, it occurs here for the first time. We can replace it by M is N-1 which improves your program:

?- phrase(trivial_nat_space(1), Cs).
   Cs = " "                                % see section double quotes
;  false.
?- phrase(trivial_nat_space(2), Cs).
   Cs = "  "
;  false.
?- phrase(trivial_nat_space(N), Cs).
   Cs = [], N = 0
;  error(instantiation_error,(is)/2).      % still ...
?- phrase(trivial_nat_space(N), " ").
   error(instantiation_error,(is)/2).

Our program now works at least when the concrete number of spaces is known. Even better, we can also use arithmetic expressions!

?- phrase(trivial_nat_space(4*1), Cs).    % let's indent by four spaces
   Cs = "    "
;  false.
?- phrase(trivial_nat_space(4*2), Cs).    % ... twice ...
   Cs = "        "
;  false.
?- phrase(trivial_nat_space(4*0), Cs).    % ... none?
   false.
?- phrase(trivial_nat_space(0), Cs).
   Cs = []                                % here it works
;  false.

So N may be an arithmetic integer expression, and it works as expected, except for 0 which must be stated literally. That is not really a deep problem, no algebraic properties are violated. But elegant it is not. Let's remember that.

Back to the instantiation errors. To handle these cases as well we need some way to deal with this variable N. The easiest way is to use library(clpz) or its predecessor in SWI library(clpfd) as proposed in another answer. And yes, you can do such things manually, but thereby you are duplicating the work that has been invested into that library. It might make sense for performance reasons sometimes, but it will come at a hefty (bug ridden) price.

So let's look at @gusbro's solution and don't forget to add

:- use_module(library(clpz)).  % SICStus or Scryer
:- use_module(library(clpfd)). % SWI

?- phrase(trivial_nat_space(N), Cs).
   Cs = [], N = 0
;  Cs = " ", N = 1                     % finally, logic!
;  Cs = "  ", N = 2
;  Cs = "   ", N = 3
;  ... .
?- phrase(trivial_nat_space(N), "  ").
   N = 2
;  false.
?- N = 1+1, phrase(trivial_nat_space(N), "  ").
   N = 1+1                               % nice like is/2
;  false.
?-          phrase(trivial_nat_space(N), "  "), N = 1+1.
   false.                                % and out, why?

Everything is nice and dandy, up to the last query. So that extension with arithmetic expressions did not work out so nicely. Effectively it boils down to the following problem:

?- N = 1+1, N #= 2.
   N = 1+1.
?-          N #= 2, N = 1+1.
   false.

In the first query, we solve the integer-equation 1+1 #= 2 which succeeds, and in the second query, we solve N #= 2 which succeeds with N = 2 and then we try to solve 2 = 1+1 which fails.

In other words, that extension into general arithmetic expressions did not work so well for constraints. Before, instantiation errors hid the problem. Now we need to draw somehow the line. And violating commutativity as above is not a nice option².

The solution is to separate expression variables and integer variables explicitly and insist on fully instantiated expressions.

?- N = 1+1, #N #= 2.
   error(type_error(integer,1+1),must_be/2)
?-          #N #= 2, N = 1+1.
   false.
?- assertz(clpz:monotonic).
   true.
?-          N #= 2, N = 1+1.
   error(instantiation_error,instantiation_error(unknown(_102),1)).

So now @gusbro's program gets some slight modification:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { #N #> 0, #M #= #N-1 },
    " ",
    trivial_nat_space(M).

double_quotes

Since you want elegant code, consider to use as a single representation for text: lists of characters. In this manner you avoid all this converting code which will never be elegant. In some systems like Tau, Trealla, and Scryer, double quoted items are chars by default. In SWI proceed like so:

?- L = "ab", L = [_,_].
   false.

?- phrase("ab","ab").
   false.

?- set_prolog_flag(double_quotes, chars).
   true.

?- L = "ab", L = [_,_].
   L = [a, b].

?- phrase("ab","ab").
   true.

And with library(double_quotes)

?- L = "ab", L = [_,_].
   L = "ab".

Grammars

Finally, there is something to note about multi-directional grammar rules in general. Think of a predicate text_ast/2. For one Abstract Syntax Tree, there is an infinity of valid program texts which all differ by trivial paraphernalia like layout text. Therefore, this general relation must not terminate when only the AST is given. So you would need an extra parameter indicating whether the text should be canonical or not.

^{1 And in fact in DEC10 Prolog the default assumption for variables in arithmetic expressions was the value zero. ISO Prolog has defined instantiation errors for those situations.

2 In SICStus' native library clpfd, the same problem appears with ?- N = 1+1, call(N #= 2). instead.}

Even worse than ugly conversions, I was unable to separate my logic as cleanly with strings. I wanted to specify the DCG as `spaces//0` and separately constrain the length, but since strings can't be partially instantiated in SWI I was unable to do so without the `double_quotes` flag. — GeoffChurch, Feb 22 '22 at 00:44

score 3 · Accepted Answer · edited Aug 16 '21 at 08:22

3

For your specific example, you can use CLP(fd) to be able to use the DCG in both ways:

trivial_nat_space(0) --> [].
trivial_nat_space(N) -->
    { N #> 0, M #= N-1 },
    " ",
    trivial_nat_space(M).

In the following sample runs I will use backticks (`) to use coded strings:

?- phrase(trivial_nat_space(Anz_Spaces), `   `, []).
Anz_Spaces = 3 ;
false.

?- phrase(trivial_nat_space(3), Spaces, []).
Spaces = [32, 32, 32] ;
false.

?- phrase(trivial_nat_space(N), Spaces, []).
N = 0,
Spaces = [] ;
N = 1,
Spaces = [32] ;
N = 2,
Spaces = [32, 32] ;
N = 3,
Spaces = [32, 32, 32] 
 ...

edited Aug 16 '21 at 08:22

false

10,264
13
101
209

answered Aug 01 '21 at 18:45

gusbro

22,357
35
46

Yes someone other than the OP gave this an upvote. Nice to see that you used CLP(fd). – Guy Coder Aug 01 '21 at 19:29
1

One of the biggest problems I faced learning DCGs is that most examples don't say which Prolog, the Prolog version, which modules where loaded and the state of the flag settings. – Guy Coder Aug 01 '21 at 19:32
1

this is a very nice solution thank you; so using CLP(fd) solves the instantiation problem by propagating and solving the constraints instead of trying to unify the variables? also: the tip with the backticks is great, I did not know that – Fabian Schneider Aug 02 '21 at 05:25

score 2 · Answer 3 · answered Feb 22 '22 at 00:38

In this case we can avoid explicit arithmetic altogether, and let unification do the work:

:- set_prolog_flag(double_quotes, chars).

spaces --> "".
spaces --> " ", spaces.

n_spaces(N, Spaces) :-
    length(Spaces, N),
    phrase(spaces, Spaces).

?- n_spaces(N, S).
N = 0,
S = [] ;
N = 1,
S = [' '] ;
N = 2,
S = [' ', ' '] 

?- n_spaces(2, S).
S = [' ', ' '].

?- n_spaces(N, "  ").
N = 2.

We need the double_quotes flag here because (at least in SWI-Prolog 8.0.2) it seems that strings have to be either ground or var, unlike lists which can have variable entries/tails, so I don't think it's possible to use this unification technique with SWI-style strings:

?- string_length(String, Length).
ERROR: Arguments are not sufficiently instantiated
ERROR: In:
ERROR:    [8] string_length(_10886,_10888)
ERROR:    [7] <user>

How to write this DCG more elegantly?

3 Answers3

double_quotes

Grammars