Read a list of records and perform ongoing calculations about the previous records using Prolog

Question

This is a playground example inspired by a real task (much more complicated) that I once did. The basic flow is that there is the reading of records from a sequential file. Some of the records contain commands that require examining the previous records to calculate a value.

Undesirable about this solution is that it requires an extra list thus extra duplicate storage. That extra list is called REMEMBER in the following code. This example has a simple record structure just containing one data value so duplicating everything in the REMEMBER list is not a real issue. But please assume the real task involves a much more complicated record structure such that duplicating everything in the REMEMBER list is very undesirable.

I am inclined to use a doubly linked list however per discussion at this link Doubly Linked List in Prolog it seems that is not the Prolog way to do things. Therefore I am interested to see what the Prolog way of doing things should be.

/*
A file contains sequential records.
There are two types of record.
A data record provides a data value.
An average record provides a count and is a request for an average of the last count data values.
The parse dcg below parses a list from the data file.
The report dcg uses that list to generate a report.

After parse the list looks like this:

[value=5.9,value=4.7,value=7.5,average=3,value=9.0,value=1.1,value=8.3,average=5,value=7.1,value=1.3,value=6.7,value=9.9,value=0.5,value=0.3,value=1.5,value=0.2,average=7,value=2.2,value=7.8,value=2.5,value=4.5,value=2.4,value=9.7,average=4,value=5.2,value=8.5,value=2.2,value=8.0,value=0.7].

An example report looks like this:

[[count=3,total=18.1,average=6.033333333333333],[count=5,total=30.599999999999998,average=6.12],[count=7,total=20.400000000000002,average=2.9142857142857146],[count=4,total=19.1,average=4.775]].
*/

:- use_module(library(dcg/basics)).
:- use_module(library(readutil)).
:- use_module(library(clpfd)).
:- use_module(library(clpr)).

dospy
:-
spy(report),
spy(average),
leash(-all).

:- initialization main.

report(LIST)
-->
% the report starts with nothing to REMEMBER.
report(LIST,[]).

report([value=VALUE|LIST],REMEMBER)
-->
% value in the LIST goes into REMEMBER.
report(LIST,[value=VALUE|REMEMBER]).

report([average=COUNT|LIST],REMEMBER)
-->
% request for average in the LIST.
average(REMEMBER,COUNT),
report(LIST,REMEMBER).

report([],_REMEMBER)
% the LIST is empty so the report is done.
--> [].

average(REMEMBER,COUNT)
-->
% the average starts at 0 then accumulates for COUNT values from REMEMBER.
average(REMEMBER,COUNT,0,0.0).

average([value=VALUE|REMEMBER],COUNT,AT,TOTAL)
-->
% found needed value in the REMEMBER.
clpfd( AT #\= COUNT ),
clpfd( AT_NEXT #= AT + 1 ),
clpr( TOTAL_NEXT = TOTAL + VALUE ),
average(REMEMBER,COUNT,AT_NEXT,TOTAL_NEXT).

average(_REMEMBER,COUNT,COUNT,TOTAL)
-->
% all of the needed value have been seen so calculate and add to report. 
clpr( AVERAGE = TOTAL / COUNT ),
[[count=COUNT,total=TOTAL,average=AVERAGE]].

% now the part that does the parse of the data file.

parse(LIST) --> parse(data,LIST).
parse(LIST) --> parse(average,LIST).
parse(LIST) --> parse(end,LIST).

parse(data,[value=FLOAT|LIST])
-->
"data", whites, float(FLOAT), blanks, !,
parse(LIST).

parse(average,[average=COUNT|LIST])
-->
"average", whites, integer(COUNT), blanks, !,
parse(LIST).

parse(end,[])
-->
[].

clpr( CLPR )
-->
{ clpr:{ CLPR } }.

clpfd( CLPFD )
-->
{ CLPFD }.

main :-
system:( read_file_to_codes("doubly_motivation_data.txt",CODES,[]) ),
prolog:( phrase(parse(LIST),CODES) ),
system:( writeq(LIST),writeln(.) ),
prolog:( phrase(report(LIST),REPORT) ),
system:( writeq(REPORT),writeln(.) ).

/* doubly_motivation_data.txt
data 5.9
data 4.7
data 7.5
average 3
data 9.0
data 1.1
data 8.3
average 5
data 7.1
data 1.3
data 6.7
data 9.9
data 0.5
data 0.3
data 1.5
data 0.2
average 7
data 2.2
data 7.8
data 2.5
data 4.5
data 2.4
data 9.7
average 4
data 5.2
data 8.5
data 2.2
data 8.0
data 0.7
*/

Thanks for the edit @GuyCoder . I should have mentioned that I am using SWI-Prolog I think that the library(dcg/basics) is specific to SWI-Prolog. — S. Selfial, Jan 28 '19 at 17:49
Is there a reason you can't process the file backwards, from end to start? — Daniel Lyons, Jan 28 '19 at 18:29
Is using a [key-value pair](http://www.swi-prolog.org/pldoc/man?section=pairs) not allowed? Currently the way I see your problem is that you need to dynamically access the records and while Prolog likes to process list, there is no requirement that the data be in a list, thus I get your desire for doubly linked list. Also, I have never used key-value pairs with Prolog, but if this answer needs it, then I see no reason to avoid them. — Guy Coder, Jan 28 '19 at 18:41
Of interest: [dicts](http://www.swi-prolog.org/pldoc/man?section=dicts) - Never used this but has a [dicts_slice/3](http://www.swi-prolog.org/pldoc/doc_for?object=dicts_slice/3) which might work. — Guy Coder, Jan 28 '19 at 18:49
Of interest: [lists](http://www.swi-prolog.org/pldoc/man?section=lists) - Has [nth0/3](http://www.swi-prolog.org/pldoc/doc_for?object=nth0/3) which while not efficient, should work. — Guy Coder, Jan 28 '19 at 18:54
Since Paulo gave a Logtalk [answer](https://stackoverflow.com/a/54408859/1243762) that is very similar to an SWI-Prolog answer, I won't post it. The only real difference that I would make is to implement the Logtalk `take/3` with SWI-Prolog `nth0/3` which while inefficient should work. — Guy Coder, Jan 28 '19 at 19:47
Also of note is that you used constraints to do the math, but by using `{ }` with DCG, it is possible to use straight up Prolog with DCG and avoid the constraint libraries all together. — Guy Coder, Jan 28 '19 at 19:50
An interesting variation that should be be a lot faster than a straight up list would be to implement the data as a binary search tree using difference lis. The length of the list would have to be in the thousands and above to really notice a difference as anything less would be so fast with either way it wouldn't matter. — Guy Coder, Jan 28 '19 at 20:34
Of interest: [Difference Lists](http://users.utcluj.ro/~cameliav/lp/10_DifferenceLists.pdf) - Talks about binary tree as difference list. Gives some pictures, but most of the work is left as exercises. — Guy Coder, Jan 28 '19 at 20:48
Of interest: [Efficiency of Difference-List Programming](https://publishup.uni-potsdam.de/opus4-ubp/frontdoor/deliver/index/docId/3972/file/wlp09_S177_186.pdf) - More info on binary trees as difference list. — Guy Coder, Jan 28 '19 at 20:50
If you file is (say) 10,000 lines long, is there anything stopping the last line from being "average 10000" and forcing you to revisit all the data you've seen so far? If not, I don't think there's a force on earth that can save you from having to retain the whole content of the file throughout processing it. — Daniel Lyons, Jan 28 '19 at 23:48
@Daniel Lyons : Yes the intention was for it to be possible for a record that is "average request" to require perhaps all of the previous records that are "data values" . The desire expressed in the OP was not to save from having to keep all of the records in memory . The intent was to save from having to create a second data structure so that the average could be calculated . — S. Selfial, Jan 29 '19 at 02:39
Is the input a static file, or could it be dynamic like user input. If the input is static, then the file could be filtered into two files, one with just the data and one with the report locations. Then the new file would become the store where the index corresponds to the line in the file. I don't know of any Prolog library that can index into a text file and pull out a line. By using this, the data size could be much larger than available memory. — Guy Coder, Jan 29 '19 at 12:08
I'm trying to cook up a solution using the doubly-linked list from the other question but it may take me a few more hours (at work). — Daniel Lyons, Jan 29 '19 at 17:39
An inspiring question . Nice usage of the dcg . If You can get rid of the `!` , You are on Your way towards logical-purity . — Kintalken, Jan 30 '19 at 00:53
Also , a criticism , do not do for example `[average=3.1415]` . That is an inappropriate usage of the `=` symbol . `=` in Prolog is insistently consistent with mathematics . Also do not do this `[key1-3.1415,key2-0.11235813]` (sometimes seen in prolog code) , because an inappropriate use of `-` . With `-` , operands must have the same type , You cannot do this in math for example `12 apples - 4 oranges` . You can use json-like set notation `{key1:{3.1415},key2:{0.11235813}}` . Put `{` and `}` around values for consistency with mathematics and Knuth/Tangle . — Kintalken, Jan 30 '19 at 00:54

Paulo Moura · Answer 1 · 2019-03-20T10:47:22.907

Follows a Logtalk + SWI-Prolog solution, which doesn't require any materialization of double-linked lists. Only a stack, trivially implemented using a list is required:

------------ reports.lgt ------------
% load the required modules
:- use_module(library(dcg/basics), []).

% ensure desired interpretation of double-quoted text
:- set_prolog_flag(double_quotes, codes).

% optimize the generated code
:- set_logtalk_flag(optimize, on). 


:- object(reports).

    :- public(process/2).

    :- uses(list, [take/3]).
    :- uses(numberlist, [sum/2]).
    :- uses(reader, [file_to_codes/2]).

    :- use_module(dcg_basics, [blanks//0, whites//0, integer//1, float//1]).

    process(File, Report) :-
        file_to_codes(File, Codes),
        phrase(report(Report), Codes).

    report(Report) -->
        data(Value),
        report(Report, [Value]).

    report([], _) -->
        eos.
    report([Record| Records], Stack) -->
        average(Count),
        {compute_record(Count, Stack, Record)},
        report(Records, Stack).
    report(Records, Stack) -->
        data(Value),
        report(Records, [Value| Stack]).

    average(Count) -->
        "average", whites, integer(Count), blanks.

    data(Value) -->
        "data", whites, float(Value), blanks.

    compute_record(Count, Stack, r(Count,Total,Average)) :-
        take(Count, Stack, Values),
        sum(Values, Total),
        Average is Total / Count.

:- end_object.
-------------------------------------

Sample call using the data file in the question:

?- {library(types_loader), library(reader_loader), reports}.
...

?- reports::process('doubly_motivation_data.txt', Report).
Report = [r(3, 18.1, 6.033333333333334), r(5, 30.599999999999998, 6.119999999999999), r(7, 20.400000000000002, 2.9142857142857146), r(4, 19.1, 4.775)] .

As you noticed, I use a more sensible representation for the report than a list of lists. A bit more efficient solution can be coded by combining the take/3 and sum/2 calls into a custom predicate by avoiding the stack prefix with length Count being traversed twice and creating a temporary list of values. For example:

compute_record(Count, Stack, r(Count,Total,Average)) :-
    compute_record(0, Count, Stack, 0, Total),
    Average is Total / Count.

compute_record(Count, Count, _, Total, Total) :-
    !.
compute_record(Count0, Count, [Value| Stack], Total0, Total) :-
    Count1 is Count0 + 1,
    Total1 is Total0 + Value,
    compute_record(Count1, Count, Stack, Total1, Total).

From the sample data file, it seems that the file can end with a request to compute the average of all values in the file. Thus, the report//2 non-terminal must keep the whole stack until all the data file is processed.

@GuyCoder Note the `:- uses(list, [take/3]).` directive. There's no need of constraints to solve this problem. — Paulo Moura, Jan 28 '19 at 19:30
@GuyCoder `list` and `numberlist` are Logtalk library objects. You can consult their documentation at https://logtalk.org/library/index.html — Paulo Moura, Jan 28 '19 at 19:42
Of interest: Logtalk [take/3](https://logtalk.org/library/listp_0.html#listp-0-take-3) — Guy Coder, Jan 28 '19 at 19:44
Very nice code . But I note that it does not address the desire in the OP , which was to avoid the stack `REMEMBER` (just called `Stack` in your solution) , plus perhaps exacerbates the issue because using another temporary list `Values` for `take` and `sum` . Also , one criticism -> a mistake to use obscure name 'r' instead of 'record' . Somehow this example has coalesced this thought in my mind . It seems I should revisit logtalk . Not to get the instance based getter/setter capability that is my crutch . But rather because logtalk is (perhaps mostly) a great way to organize the code ? — S. Selfial, Jan 28 '19 at 22:10
@S.Selfial See the ending notes on my answer, which address the use of the stack and the temporary list. Also "instance based getter/setter capability" is an **imperative** pattern; Logtalk is **declarative**. — Paulo Moura, Jan 28 '19 at 22:19
@Paulo wrote See the ending notes on my answer" which were "A bit more efficient solution can be coded by combining the take/3 and sum/2 calls into a custom predicate" ... creating a custom predicate would not adress the issue , which is that the approach requires the creation of another temporary list structure . — S. Selfial, Jan 29 '19 at 07:24
@S.Selfial Update my answer with an alternative predicate to compute an individual record that combines the `take/3` and `sum/2` calls and avoids creating another temporary list. — Paulo Moura, Jan 29 '19 at 10:42

CapelliC · Answer 2 · 2019-01-29T01:00:57.417

I would try to exploit DCG semicontext, as explained on metalevel.at page. An OT example, I think is easy to grasp, is this answer (solving a zebra like puzzle in a DCG).

hint1 -->
  kind(brad, K), {dif(K, wheat)}, topping(brad, plain), size(walt, small).
hint2 -->
  size(P1, medium), size(P2, medium), {P1 \= P2},
  flavor(P1, hazelnut), topping(P2, peanut_butter).
...

Hints access the context sharing 'by magic':

kind(P, K) --> state([P, K, _, _, _]).
topping(P, T) --> state([P, _, T, _, _]).
...

the DCG must be called in this way, providing the relevant initial state:

bagels(Sol):- Sol =
    [[brad,_,_,_,_],
     [walt,_,_,_,_],
    ...],
  phrase((hint1, hint2, hint3, hint4, hint5, hint6), [state(Sol)], _).

Now, for your applicative case, this is near to useless (you already solved, just in a verbose way). For start, I don't get why you perform a 2 pass algorithm. Consider how concise could be the code, that yields the very same results what you posted (just displayed differently), in a single pass, making use of library(aggregate) to perform arithmetic. BTW, why clpfd, clpr can count as well... are you really interested in services from CLP for such a simple task?

cc_main :-
    %system:( read_file_to_codes("doubly_motivation_data.txt",CODES,[]) ),
    codes(CODES),
    tokenize_atom(CODES, Tks),
    phrase(cc_report([],Rep), Tks),
    maplist(writeln, Rep).

cc_report(_,[]) --> [].
cc_report(R,Re) -->
    [data,N],
    cc_report([N|R],Re).
cc_report(R,[ave(Ave)=sum(Sum)/C|Re]) -->
    [average,C],
    {aggregate_all((sum(V),count),(
         % no need for doubly linked lists, just peek from stack...
         nth1(I,R,V),I=<C
       ),(Sum,Count)),Ave is Sum/Count},
    cc_report(R,Re).

yields:

?- cc_main.
ave(6.033333333333334)=sum(18.1)/3
ave(6.119999999999999)=sum(30.599999999999998)/5
ave(2.9142857142857146)=sum(20.400000000000002)/7
ave(4.775)=sum(19.1)/4
true .

Anyway, swipl addons offer some useful material. See for instance edgc, an extension to handle many multiple accumulators while doing several visits of the concrete syntax tree of Acquarius Prolog compiler - developed back into 90s.

Thanks for the link to edgc. I often find my DCGs needing more than one threaded accumulator (implicit state). — Guy Coder, Jan 29 '19 at 12:04

score 1 · Accepted Answer · answered Jan 29 '19 at 23:39

So, for starters, I noticed that you can build results in either direction. In other words, the classic "int list" grammar is something like this:

intlist([]) --> [].
intlist([X|Xs]) --> integer(X), whites, intlist(Xs).

This works like so:

?- phrase(intlist(X), "1 23 45 9").
X = [1, 23, 45, 9] ;

But you can flip it around so it parses the list backwards like so:

rintlist([]) --> [].
rintlist([X|Xs]) --> rintlist(Xs), whites, integer(X).

This works, ish:

?- phrase(rintlist(X), "1 23 45 9").
X = [9, 45, 23, 1]

The problem with this is that putting the recursive call at the front, followed by something like "blanks" that can match empty lists is a recipe for a stack explosion. But, you can also parse things backwards by passing the "previous" state through the DCG itself:

rintlist(L) --> rintlist([], L).
rintlist(Prev, Prev) --> [].
rintlist(Prev, Last) --> integer(X), whites, rintlist([X|Prev], Last).

?- phrase(rintlist(X), "1 23 45 9").
X = [9, 45, 23, 1] .

Now, I think we can solve your problem nicely just from this; I wrote my solution and now see it is pretty similar to @PauloMoura's above, but here it is anyway:

commands(Report) --> record(data(V)), blanks, commands([V], _, Report).

commands(Prev, Prev, []) --> [].
commands(Prev, Last, Report) -->
    record(data(V)),
    blanks,
    commands([V|Prev], Last, Report).
commands(Prev, Last, [report(Count, Total, Avg)|Report]) -->
    record(average(N)),
    blanks,
    { calculate_average(N, Prev, Count, Total, Avg) },
    commands(Prev, Last, Report).

calculate_average(N, Prev, Count, Total, Avg) :-
    length(L, N),
    append(L, _, Prev),
    sumlist(L, Total),
    Avg is Total / N,
    Count = N.

This seems to give similar output to your example:

?- phrase_from_file(commands(C), 'mdata.txt'), write_canonical(C).
[report(3,18.1,6.033333333333334),
 report(5,30.599999999999998,6.119999999999999),
 report(7,20.400000000000002,2.9142857142857146),
 report(4,19.1,4.775)]

Now, expanding it to the doubly-linked list, let's first see what we would need to do to handle the "int list" grammar in a doubly-linked fashion. Much like this one, we have to pass a previous link forward into the recursive call, but making it a bit worse than this one, we need to fill in the "next" link in the previous variable we receive, with the current node. But because that link will be nil the first time, we have to have a bit of conditional logic to ignore that one. And I couldn't think of a sensible empty doubly-linked list, so I changed the base case to be [X] instead of []. So it gets a bit grotty.

% entry point (nil meaning there is no previous)
dlist(X) --> dlist(nil, X).

% base case: last integer
dlist(Prev, node(X, Prev, nil)) --> integer(X).
dlist(Prev, Last) -->
    integer(X), whites,
    {
     Prev = node(PV, PP, Cur)
     -> 
         Cur = node(X, node(PV, PP, Cur), _)
     ;
         Cur = node(X, Prev, _)
    },
    dlist(Cur, Last).

Note the self-reference in Cur = node(..., node(..., Cur), ...). This unification is what "ties the knot" as it were, between the previous link and this link. Let's try it:

?- phrase(dlist(L), "1 23 45 9").
L = node(9, _S2, nil), % where
    _S1 = node(1, nil, node(23, _S1, _S2)),
    _S2 = node(45, node(23, _S1, _S2), _71658)

A bit hard to read, but basically, 9 points to 45 points to 23 points to 1. We parsed it back-to-front and wound up with pointers in both directions.

What remains to be done at this point is to change the parser to emit records with these pointers instead, and write an averager that works this way. I couldn't quite get there doing the average in-place, so I wrote a helper to give me "up to N previous" from a doubly-linked list:

take_upto(N, DL, Result) :- take_upto(N, 0, DL, [], Result).
take_upto(N, N, _, Result, Result).
take_upto(_, _, nil, Result, Result).
take_upto(N, I, node(V, Prev, _), Rest, Result) :-
    I < N,
    succ(I, I1),
    take_upto(N, I1, Prev, [V|Rest], Result).

This works like so:

?- phrase(dlist(X), "1 2 3 4 5 6 7 8 9 10"), take_upto(5, X, L).
X = node(10, _S2, nil), % where
   ... [trimmed]
L = [6, 7, 8, 9, 10] .


?- phrase(dlist(X), "1 2 3 4 5 6 7"), take_upto(15, X, L).
X = node(7, _S2, nil), % where
   ... [trimmed]
L = [1, 2, 3, 4, 5, 6, 7] .

With this utility in place, we can finish this out:

commandsdl(Report) --> commandsdl(nil, _, Report).
commandsdl(Prev, Prev, []) --> [].
commandsdl(Prev, Last, Report) -->
    record(data(V)),
    blanks,
    {
     Prev = node(PV, PP, Cur)
     ->
         Cur = node(V, node(PV, PP, Cur), _)
     ;
         Cur = node(V, Prev, _)
    },
    commandsdl(Cur, Last, Report).
commandsdl(Prev, Last, [report(Count, Total, Avg)|Report]) -->
    record(average(N)),
    blanks,
    {
       calculate_average_dl(N, Prev, Count, Total, Avg)
    },
    commandsdl(Prev, Last, Report).

calculate_average_dl(N, Prev, Count, Total, Avg) :-
    take_upto(N, Prev, L),
    length(L, Count),
    sumlist(L, Total),
    Avg is Total / Count.

Overall, I'm pleased I was able to make this work, but in this formulation you really don't need the "next" pointers in your doubly-linked list, so I would be inclined to just go for the list implementation above (or perhaps Paulo's implementation if I were looking at Logtalk). Hopefully this illustrates how you could do this with doubly-linked lists, if your actual problem necessitates it despite your model not really needing it. Hope it helps!

Read a list of records and perform ongoing calculations about the previous records using Prolog

3 Answers3

Linked