word processing prolog

Question

I am trying to break a word into syllables in Prolog according to two rules:

Rule 1: vowel-consonant-vowel (break the word after the second vowel).
Rule 2: vowel-consonant-consonant-vowel (break the word between the two consonants). For example, calculator = cal-cula-tor.

I already have the following Prolog code; however, it only analyzes the first 3 or 4 letters of the word.

I need it to process and analyze the entire word.

    vowel(a).
    vowel(e).
    vowel(i).
    vowel(o).
    vowel(u).


    consonant(L):- not(vowel(L)).

    syllable(W, S, RW):- 
        atom_chars(W, [V1, C, V2|Tail]), 
        vowel(V1), 
        consonant(C), 
        vowel(V2), 
        !, 
        atomic_list_concat([V1, C, V2], S), 
        atomic_list_concat(Tail, RW).

    syllable(W, S, RW):- 
        atom_chars(W, [V1, C, C2, V2|Tail]), 
        vowel(V1), 
        consonant(C), 
        consonant(C2),
        vowel(V2), 
        !, 
        atomic_list_concat([V1, C, C2, V2], S), 
        atomic_list_concat(Tail, RW).

    syllable(W, W, _).

    break(W, B):- 
        syllable(W, B, ''), !.

    break(W, B):- 
        syllable(W, S, RW), 
        break(RW, B2), 
        atomic_list_concat([S, '-', B2], B).

I just noticed you have it spelled correctly in the code but not in the text... this is weird — , Oct 28 '16 at 21:27
@Boris You are noticing things that are pretty irrelevant .. consonant or constant, they both mean the same thing .. a letter that is not a vowel .. i have no idea why my spelling is bothering you so much .. — , Oct 28 '16 at 21:40
Because a consonant **is not** a constant. They both mean something **different** from each other, and esp. "constant" is a very common term in computer science. As for the code vs. text, it probably means you copied the code from someone without attribution ("i already have code", hmm), which is plagiarism, which is not nice. — , Oct 28 '16 at 21:46
Basically, for all I know, you are using Stackoverflow as a free homework-writing service, with some success. I cannot do anything about it except tell you that I notice. — , Oct 28 '16 at 21:57
Here is the question and answer where the code comes from: http://stackoverflow.com/questions/40163849/breaking-words-into-syllables-in-prolog Now at least it is listed as "linked" and others can see it, too. — , Oct 28 '16 at 22:23

score 3 · Answer 1 · answered Oct 28 '16 at 16:56

First, a setting that makes it much more convenient to specify lists of characters, and which I recommend you use in your code if you process text a lot:

:- set_prolog_flag(double_quotes, chars).

Second, the data, represented in such a way that the definitions can be used in all directions:

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

For example:

?- consonant(C).
dif(C, u),
dif(C, o),
dif(C, i),
dif(C, e),
dif(C, a).

whereas the version you posted incorrectly says that there is no consonant:

?- consonant(C).
false.
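
The difference comes from negation as failure: with an unbound argument, `\+ vowel(L)` first proves `vowel(L)` (with `L = a`) and therefore fails, whereas `dif/2` posts a constraint that is checked as soon as the variable becomes bound. Here is a minimal sketch of the comparison, assuming SWI-Prolog (or another system that provides `dif/2`); the names `consonant_naf/1` and `consonant_dif/1` are made up for this illustration only:

```prolog
:- use_module(library(apply)).   % maplist/2 (assumption: SWI-Prolog)

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

% Negation as failure: only meaningful when L is already bound.
consonant_naf(L) :- \+ vowel(L).

% Constraint-based: can be called before C is known.
consonant_dif(C) :- maplist(dif(C), [a,e,i,o,u]).

% ?- consonant_naf(l).          % succeeds
% ?- consonant_naf(C).          % fails, although consonants exist
% ?- consonant_dif(C), C = l.   % succeeds: l differs from every vowel
% ?- consonant_dif(C), C = a.   % fails: violates dif(C, a)
```

Note that both versions treat anything that is not one of the five vowels as a consonant, including digits and punctuation.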

The rules you outline are readily described in Prolog:

% rule 1: vowel-consonant-vowel (break after second vowel)
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
        vowel(V1), consonant(C), vowel(V2),
        reverse([V2,C,V1|Bs0], Bs).

% rule 2: vowel-consonant-consonant-vowel (break between the consonants)
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
        vowel(V1), consonant(C1), consonant(C2), vowel(V2),
        reverse([C1,V1|Bs0], Bs).

% alternative: no break at this position
rule([L|Ls], Bs0, Bs, Rest) :-
        rule(Ls, [L|Bs0], Bs, Rest).
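
To get a feeling for the arguments, rule/4 can also be called on its own. The sketch below repeats the definitions so that it loads as one file, assuming SWI-Prolog for `dif/2`, `maplist/2` and `reverse/2`; the sample queries are my own and not part of the original program:

```prolog
:- use_module(library(lists)).   % reverse/2
:- use_module(library(apply)).   % maplist/2

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

% rule(+Letters, +Skipped, -Syllable, -Rest)
rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
        vowel(V1), consonant(C), vowel(V2),
        reverse([V2,C,V1|Bs0], Bs).
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
        vowel(V1), consonant(C1), consonant(C2), vowel(V2),
        reverse([C1,V1|Bs0], Bs).
rule([L|Ls], Bs0, Bs, Rest) :-
        rule(Ls, [L|Bs0], Bs, Rest).

% ?- rule([c,a,l,c,u], [], Bs, Rest).
% only solution:  Bs = [c,a,l], Rest = [c,u]     (rule 2 applies at "alcu")
%
% ?- rule([u,l,a,t,o,r], [], Bs, Rest).
% first solution: Bs = [u,l,a], Rest = [t,o,r]   (rule 1 applies at "ula")
```

The second argument collects the letters that were skipped before a rule applied, and the fourth argument is what remains to be processed.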

Exercise: Why am I writing [V2,C,V1|_] instead of [V1,C,V2|...] in the call of reverse/2?

Now, it only remains to describe the list of resulting syllables. This is easy with DCG notation:

word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
        { rule([L|Ls], [], Bs, Rest) },
        word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].

Now the point: Since this program is completely pure and does not incorrectly commit prematurely, we can use it to show that there are also other admissible hyphenations:

?- phrase(word_breaks("calculator"), Hs).
Hs = [[c, a, l], [c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l], [c, u, l, a, t, o, r]] ;
Hs = [[c, a, l, c, u, l, a], [t, o, r]] ;
Hs = [[c, a, l, c, u, l, a, t, o], [r]] ;
Hs = [[c, a, l, c, u, l, a, t, o, r]].

In Prolog, it is good practice to retain the generality of your code so that you can readily observe alternative solutions. See logical-purity.
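
If you only want the hyphenated atom from the question (cal-cula-tor), one option is to keep the pure program as it is and commit to its first solution in a thin wrapper. The following is a sketch under assumptions: `hyphenate/2` and `chars_atom/2` are names I made up for this example, and `atomic_list_concat/3` is specific to SWI-Prolog. Committing with `once/1` of course gives up the alternative solutions shown above.

```prolog
:- use_module(library(lists)).   % reverse/2
:- use_module(library(apply)).   % maplist/2, maplist/3

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- maplist(dif(C), [a,e,i,o,u]).

rule([V1,C,V2|Rest], Bs0, Bs, Rest) :-
        vowel(V1), consonant(C), vowel(V2),
        reverse([V2,C,V1|Bs0], Bs).
rule([V1,C1,C2,V2|Rest], Bs0, Bs, [C2,V2|Rest]) :-
        vowel(V1), consonant(C1), consonant(C2), vowel(V2),
        reverse([C1,V1|Bs0], Bs).
rule([L|Ls], Bs0, Bs, Rest) :-
        rule(Ls, [L|Bs0], Bs, Rest).

word_breaks([]) --> [].
word_breaks([L|Ls]) --> [Bs],
        { rule([L|Ls], [], Bs, Rest) },
        word_breaks(Rest).
word_breaks([L|Ls]) --> [[L|Ls]].

% atom_chars/2 with swapped arguments, for use with maplist/3.
chars_atom(Chars, Atom) :- atom_chars(Atom, Chars).

% hyphenate(+Word, -Hyphenated): first solution only (hypothetical helper).
hyphenate(Word, Hyphenated) :-
        atom_chars(Word, Chars),
        once(phrase(word_breaks(Chars), Syllables)),
        maplist(chars_atom, Syllables, Atoms),
        atomic_list_concat(Atoms, '-', Hyphenated).

% ?- hyphenate(calculator, H).
% H = 'cal-cula-tor'.
```

Which solution comes first depends on the clause order of rule/4 and word_breaks//1, so the wrapper prefers the earliest possible break at each step.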

@Boris: By the first rule: "vowel-consonant-vowel (break after the second vowel)": `ato`: vowel-consonant-vowel, so after "o", we break! P.S.: I did not make these rules. — mat, Oct 28 '16 at 20:24
This doesn't sound right. I am quite certain that these rules are for breaking up into syllables: you cannot be left with a non-syllable at the end, right? And there is (and should be!) only one way to break into syllables (or choose possible hyphenation points). Or am I somehow confused? — , Oct 28 '16 at 20:30
@Boris: Unfortunately, I cannot answer your questions, but I definitely regard it as an advantage of Prolog that it can be easily used to show that the stated rules alone lead to ambiguous and even invalid hyphenation, as you correctly point out. Using impure predicates only overshadows this inherent problem of the rules themselves. I posted my solution mainly to show this advantage of Prolog. The same with many "Roman numerals" examples: Most of the rules as stated are ambiguous, but committing prematurely hides this inherent problem. I think it's nice that we can detect this with Prolog! — mat, Oct 28 '16 at 20:42
Don't underestimate the "rule-ness" of hyphenation! The TeXbook contains a great chapter about this topic, and pointers to quite surprising research papers that show how rule-based hyphenation actually is even across different languages. The database in that case is mostly the database of *exceptions* to the rules... It would surprise me though if 2 rules sufficed to capture all valid hyphenation patterns. — mat, Oct 28 '16 at 20:50
Yes, of course you are right about this, after thinking for a minute. It is indeed silly to use a dictionary for this, if only because you don't want your program to break when someone uses a new word that follows the basic conventions of the language but is missing from your dictionary. — , Oct 28 '16 at 21:25

score 2 · Answer 2 · 2016-11-02T01:53:54.520

I guess it's time for a DCG push back solution. The push back is used in the second rule of break//1. It reflects that we look at four characters but consume only two:

vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- \+ vowel(C).

break([V1,C,V2]) -->
   [V1,C,V2],
   {vowel(V1), consonant(C), vowel(V2)}.
break([V1,C1]), [C2,V2] -->
   [V1,C1,C2,V2],
   {vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.

syllables([L|R]) --> break(L), !, syllables(R).
syllables([[C|L]|R]) --> [C], syllables([L|R]).
syllables([[]]) --> [].

So the overall solution doesn't need extra predicates such as append/3 or reverse/2. We have also placed a cut to prune the search, which is possible because the second rule of syllables//1 acts as a catch-all for single characters.

Here are some example runs:

Jekejeke Prolog 2, Laufzeitbibliothek 1.1.6
(c) 1985-2016, XLOG Technologies GmbH, Schweiz

?- set_prolog_flag(double_quotes, chars).
Ja

?- phrase(syllables(R), "calculator").
R = [[c,a,l],[c,u,l,a],[t,o,r]] ;
Nein

?- phrase(syllables(R), "kitchensink").
R = [[k,i,t,c,h,e,n],[s,i,n,k]] ;
Nein

P.S.: In some older draft standards this DCG technique was called "right-hand-context", and instead of the verb "push back", the verb "prefixing" was used. In a newer draft standard this is called "semicontext", and instead of the verb "push back", the verb "restoring" is used.

https://www.complang.tuwien.ac.at/ulrich/iso-prolog/dcgs/dcgsdraft-2015-11-10.pdf
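
To see the push back in isolation, here is a small sketch, assuming a system whose DCG translation supports push back lists (SWI-Prolog does). `look_ahead//1` is a made-up name for this illustration; break//1 is copied from above:

```prolog
vowel(a). vowel(e). vowel(i). vowel(o). vowel(u).

consonant(C) :- \+ vowel(C).

% Reads one element and immediately pushes it back: a one-element look-ahead.
look_ahead(C), [C] --> [C].

break([V1,C,V2]) -->
   [V1,C,V2],
   {vowel(V1), consonant(C), vowel(V2)}.
% Reads four elements, then pushes the last two back onto the input.
break([V1,C1]), [C2,V2] -->
   [V1,C1,C2,V2],
   {vowel(V1), consonant(C1), consonant(C2), vowel(V2)}.

% ?- phrase(look_ahead(C), [a,b,c], Rest).
% C = a, Rest = [a,b,c]          % nothing was consumed
%
% ?- phrase(break(S), [a,l,c,u,l,a], Rest).
% S = [a,l], Rest = [c,u,l,a]    % four read, [c,u] restored
```

The three-argument phrase/3 makes the effect visible: the remainder still starts with the elements that were pushed back.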

score 1 · Answer 3 · coder · 2016-10-28T20:51:27.007

I think you could write it more simply. Here is my implementation:

syllable( Input, Final_Word):-
    atom_chars( Input, Char_list),
    (split(Char_list, Word)-> atom_chars( Final_Word, Word);
        Final_Word=Input).


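% Empty input: nothing left to split.
split([],[]).
% Rule 1: vowel-consonant-vowel, break after the second vowel.
split([X,Y,Z|T],[X,Y,Z,'-'|T1]):- 
                    vowel(X),\+vowel(Y),vowel(Z),
                    atom_chars( Input, T),
                    syllable(Input,T2),
                    atom_chars( T2, T1). 

% Rule 2: vowel-consonant-consonant-vowel, break between the consonants.
split([X,Y,Z,W|T],[X,Y,'-',Z|T1]):-
                    vowel(X),\+vowel(Y),\+vowel(Z),vowel(W),
                    atom_chars( Input, [W|T]),
                    syllable(Input,T2),
                    atom_chars( T2, T1).    

% Otherwise: copy a leading consonant and continue.
split([X|T],[X|T1]):- \+vowel(X),split(T,T1).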

split/2 inserts '-' wherever the rules you stated allow a break and returns the resulting character list to syllable/2, which converts it back to an atom with atom_chars/2. If the word cannot be split, the output is the input.

Example:

?- syllable(calculator,L).
L = 'cal-cula-tor'.

I don't understand why you wrote 'calculator = cal-cula-tor', since it doesn't follow the rules stated: "cal" is not vowel-consonant-vowel but consonant-vowel-consonant, and the same goes for the rest of the word...

edited Oct 28 '16 at 20:51

answered Oct 28 '16 at 16:33


"cal" is not vowel-consonant-vowel, but "alcu" is vowel-consonant-consonant-vowel... – mat Oct 28 '16 at 20:25
Yes, exactly, so it recognizes "alcu" and puts a "-" after that. It is not clear at all whether he wants the "-" before "alcu", if this is what you mean... so I'm not very sure what the correct output looks like... should it be 'c-alcu-l-ato-r'? – coder Oct 28 '16 at 20:32
@coder this was really helpful, but just as mat said, alcu is vowel-consonant-consonant-vowel, and since the rule says it must be broken between the two consonants, it should be al-cu .. – Oct 28 '16 at 20:36
calculator should be cal-cula-tor .. one split between the "al" and "cu" (vowel-consonant-consonant-vowel .. splitting between the consonants) and another split after the "ula" (vowel-consonant-vowel .. splitting after the second vowel) – Oct 28 '16 at 20:40
Edited the answer, though I don't understand your criteria because for example you could have cal-cul-ato-r since "ato" is valid... – coder Oct 28 '16 at 20:52
