Sentence parsing and matching in Prolog

Question

I'm trying to create a sentence parser in Prolog. I want the sentence to be parsed into three separate lists that will be matched with a suggested outcome.

For example, here's the code I have come up with so far...

This is the vocabulary which will be used to parse certain words:

det(the).
det(a).
det(an).
noun(cat).
noun(boy).
noun(girl).
noun(grandfather).
noun(person).
noun(mat).
noun(party).
noun(book).
noun(problem).
noun(father).
noun(student).
noun(golf).
noun(conversation).
noun(challenge).
noun(person).
noun(problem).
noun(sat).
verb(thrives).
verb(loves).
verb(likes).
prep(on).
prep(with).
adj(big).
adj(lonely).
adj(elderly).
adj(teenage).
adj(good).
adj(fat).

sentence(Sentence,sentence(Noun_Phrase,Verb_Phrase)):-
      np(Sentence,Noun_Phrase,Rem),
      vp(Rem,Verb_Phrase).

Code for parsing:

np([X|T],np(det(X),NP2),Rem):- /* Det NP2 */
    det(X),
    np2(T,NP2,Rem).
np(Sentence,Parse,Rem):- np2(Sentence,Parse,Rem). /* NP2 */
np(Sentence,np(NP,PP),Rem):- /* NP PP */
    np(Sentence,NP,Rem1),
    pp(Rem1,PP,Rem).

np2([H|T],np2(noun(H)),T):- noun(H). /* Noun */
np2([H|T],np2(adj(H),Rest),Rem):- adj(H),np2(T,Rest,Rem).

pp([H|T],pp(prep(H),Parse),Rem):- /* PP NP */
    prep(H),
    np(T,Parse,Rem).

vp([H| []], vp(verb(H))):- /* Verb */
    verb(H).

vp([H|T], vp(verb(H), Rem)):- /* VP PP */
    vp(H, Rem),
    pp(T, Rem, _).

vp([H|T], vp(verb(H), Rem)):- /* Verb NP */
    verb(H),
    np(T, Rem, _).

I then want to get the output from this with something like this...

?- sentence([a,very,young,boy,loves,a,manual,problem],Parse).

And then somehow suggest a present out of a fact like this...

present(construction_kit, subject(very,young,boy), verb(loves), object(manual,problem)).
present(a_golfing_sweater, subject(young,father), verb(loves), object(golf)).

As you can probably see, I would want to match this with the 'construction_kit' present.

What Tomas is asking is why you wouldn't use DCG to implement your grammar. DCG makes it much more transparent. *E.g.*, `sentence --> noun_phrase, verb_phrase.` — lurker, Dec 07 '17 at 20:51
I'm just starting with Prolog and I'm not really knowledgeable on other methods. Any help would be appreciated. — Joseph, Dec 07 '17 at 20:53
See [this](https://stackoverflow.com/a/10401614/772868) how to write this with a DCG. (DCGs are the very reason why Prolog exists!) — false, Dec 07 '17 at 21:28
The problem here has to do with your whole approach, which won't scale, rather than your specific question. I would strongly suggest you check out Ivan Bratko's book _Prolog Programming for Artificial Intelligence_ or Pereira's book _Prolog and Natural Language Analysis_, which both go over satisfying more complex queries by parsing into lambda calculus expressions and then evaluating them. — Daniel Lyons, Dec 07 '17 at 22:38
Well this is clearly some sort of limited context. Matching sentences to presents. E-commerce site? DCG is probably as suitable as anything else that has been proposed (not limited to Bratko and P&S, which are 30+ years old). — Tomas By, Dec 07 '17 at 22:51
@TomasBy DCGs are where to start, but we have a huge solution space here and a very underspecified question. And, by the way, age of the book has nothing to do with quality or importance, especially in Prolog where so little has changed. — Daniel Lyons, Dec 07 '17 at 22:52
Not i Prolog, but in natural language processing things have happened. — Tomas By, Dec 07 '17 at 22:54
@joseph you need to explain what you are asking for. This code should be written using DCG, which you can find explained in the manual. I see no other obvious problems. If you have the sentence parts -> presents data, then Bob's your uncle! — Tomas By, Dec 07 '17 at 22:59
@TomasBy a lot has happened in statistical NLP, which may be what the OP needs, but it doesn't pertain to Prolog or this question and is way more out of scope than recommending a few older books. — Daniel Lyons, Dec 07 '17 at 23:04
I still don't think lambda calculus is going to be an improvement. — Tomas By, Dec 07 '17 at 23:06
@TomasBy Lambda calculus is one well-studied way to create an intermediate representation between natural language queries and their resolution. There are probably other ways, but as shown in my answer, without _some sort_ of abstraction between the query and the resolution the resolution step is going to grow in complexity indefinitely and still be brittle. — Daniel Lyons, Dec 07 '17 at 23:38
There is an intermediate representation in your suggestion (`Noun`, `Verb`, `Object`). I don't think you have showed that the complexity will grow indefinitely. (But I agree the OP should probably consider statistics.) — Tomas By, Dec 07 '17 at 23:42
@TomasBy My answer is built off a simplified problem. You don't really think _any_ English sentence can be adequately parsed into a 3-tuple of atoms, do you? And since not, someone is going to have to pay the piper—in this case, the suggestion engine. — Daniel Lyons, Dec 07 '17 at 23:57
I think that depends on the application. It's pretty clear the OP was not aiming to fully represent the meaning of any sentence. With an intermediate representation, like your triple, the problem is split in two (or three). Maybe the grammar is enough as is, but much more complex queries are needed. Or maybe the query representation is enough, but the ambition is to handle any possible utterance. Or maybe it works fine. — Tomas By, Dec 08 '17 at 00:10
@TomasBy Having reconsulted the books, I have come around to your point of view. Lambda calculus winds up being useful because functions are a natural fit for creating bindings and, during parsing, they're handy for transporting information around the I.R. But I think now you're right, they not a panacea and might be overkill for this problem. — Daniel Lyons, Dec 14 '17 at 07:07

Daniel Lyons · Answer 1 · 2017-12-13T22:29:55.770

Let's try and trim this down to a small problem so we can see how we might approach it. Let's start with a simpler grammar.

sentence(NP, VP) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(Noun, Adj)) --> det, adjectives(Adj), noun(Noun).

det --> [D], { det(D) }.
det --> [].

noun(N) --> [N], { noun(N) }.

adjectives([]) --> [].
adjectives([A|As]) --> adjective(A), adjectives(As).
adjective(A) --> [A], { adj(A) }.

verb_phrase(vp(Verb, Noun)) --> verb(Verb), noun_phrase(Noun).

verb(V) --> [V], { verb(V) }.

This is not a complete grammar even for your problem, but it will parse some sentences:

?- phrase(sentence(NP,VP), [a,teenage,boy,loves,a,big,problem]).
NP = np(boy, [teenage]),
VP = vp(loves, np(problem, [big]))

Now we can make a present database. I'm just going to consider the "head" of the noun phrase for now.

present('construction kit', boy, loves, problem).

Now you can deploy a very simple query to try and match them up:

?- phrase(sentence(np(Noun,_), vp(Verb, np(Object, _))),
   [a,teenage,boy,loves,a,big,problem]), 
   present(Suggestion, Noun, Verb, Object).
Noun = boy,
Verb = loves,
Object = problem,
Suggestion = 'construction kit' ;

Now you can see immediately your biggest problem: if you try to match a more complex grammar, you're going to have to have a corresponding increase in the complexity of your query which turns that into a suggestion. The solution to this kind of problem, where two things appear to increase in complexity in concert, is always to create an abstraction between them, but that is out of scope for this question.

Edit It's worth noting that if you want to build a general recommendation system, this is not a modern approach. I would recommend the book Programming Collective Intelligence for a quick overview of several approaches, backed by fairly readable Python.

Edit Is it clear that making the grammar more complex will result in more complexity in matching presents?

For starters, I have omitted entirely your prepositional phrase production. I just didn't include that. If it were there, would it have bearing on the suggestion? Consider a sentence like "a teenage boy who loves to code loves a big problem"—the natural extension of my noun_phrase//1 to cover this case will miss out on an important detail of this child which might lead you to a different present than the construction set.

Supposing you want to handle that supplemental information, you will have to add more patterns to the first argument to phrase/3. In fact, you'll probably accept whatever comes out and then dispatch to some kind of suggestion system.

One thing @TomasBy gets right is that we are using a very simple intermediate representation here, which is just (noun,verb,object). But if we make the grammar more complex, we will have to capture and handle more of it in this representation which will raise the complexity of the suggestion system. I don't see how to extend this simple representation to handle prepositional phrases, for instance, unless you just throw them away.

Converting the DCG back to ordinary Prolog

This is straightforward to do, but Prolog will do it for you. You can simply load the DCG above and use listing/0 to get the answer.

| ?- listing.

% file: user

sentence(A, B, C, D) :-
    noun_phrase(A, C, E),
    verb_phrase(B, E, D).

noun_phrase(np(A, B), C, D) :-
    det(C, E),
    adjectives(B, E, F),
    noun(A, F, D).

det([A|B], C) :-
    det(A),
    B = C.
det(A, A).

noun(A, [A|B], C) :-
    noun(A),
    B = C.

adjectives([], A, A).
adjectives([A|B], C, D) :-
    adjective(A, C, E),
    adjectives(B, E, D).

adjective(A, [A|B], C) :-
    adj(A),
    B = C.

verb_phrase(vp(A, B), C, D) :-
    verb(A, C, E),
    noun_phrase(B, E, D).

verb(A, [A|B], C) :-
    verb(A),
    B = C.

It is not true that more complex grammar must lead to more complex intermediate representation. The parser can produce multiple separate representations for one input, or it can just throw things away, which may be appropriate for e.g. adverbials. — Tomas By, Dec 08 '17 at 00:13
Is there any way you could change the DSG at the top to the format I was using? It's just I don't understand it at the moment. Ill get round to understanding it, but at the moment all my code is written in the format I asked the question with. Thanks. — Joseph, Dec 13 '17 at 22:18
Accept this answer and ask another question about it, and sure. — Daniel Lyons, Dec 13 '17 at 23:21

Sentence parsing and matching in Prolog

1 Answers1

Converting the DCG back to ordinary Prolog

Linked