I find that using the moo lexer makes my grammars simpler, and I generally spend less time fixing ambiguous grammars as a result.

I'm not an expert in designing grammars, but this is what I'd do:
**lexer.js**

- `word` will match a sequence of letters
- `comma` will match " , ", " ,", ", " and ","
- `space` will match a single space " "
- `period` will match a single period "."
- `nl` will match one or more newlines
```js
const moo = require('moo');

const lexer = moo.compile({
  word: /[a-zA-Z]+/,
  comma: / ?, ?/,
  space: / /,
  period: /\./,
  nl: { match: /\n+/, lineBreaks: true }
});

module.exports = lexer;
```
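If you want to sanity-check the token definitions before wiring the lexer into nearley, you can run it on its own. This is just a quick sketch using moo's `reset`/iterator API on the `lexer.js` above:

```js
const lexer = require('./lexer.js');

// Tokenize a small sample and print each token's type and raw text.
lexer.reset('After lunch , I went.\n');
for (const token of lexer) {
  console.log(token.type, JSON.stringify(token.value));
}
// word "After", space " ", word "lunch", comma " , ", word "I",
// space " ", word "went", period ".", nl "\n"
```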
**grammar.ne**

Here we say:

- A text has one or more sentences
- Newlines can occur before and after each sentence
- A sentence may start with a sequence of `%word` followed by either a `%comma` or a `%space`, and must finish with a `%word` followed by a `%period`

All the post-processing rules flatten the lists of tokens and extract `.value` from the tokens, so that we end up with lists of words.
```
@{% const lexer = require("./lexer.js"); %}

@lexer lexer

text -> %nl sentence:+ {% ([_, sentences]) => sentences %}

sentence -> seq:* %word %period %nl {% ([seq, w, p, n]) => [...seq, w.value] %}

seq -> (%word %space) {% ([[w]]) => w.value %}
     | (%word %comma) {% ([[w]]) => w.value %}
```
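Before the grammar can be required from Node, it has to be compiled with the nearley compiler, e.g. `nearleyc grammar.ne -o grammar.js`, which produces the `grammar.js` module used in the example further down.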
This grammar can parse the following text:
```
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
```
Example:
```js
const nearley = require('nearley');
const grammar = require('./grammar.js');

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

parser.feed(`
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
`);

if (parser.results.length > 1) throw new Error('grammar is ambiguous');

console.log(JSON.stringify(parser.results[0], null, 2));
```
Output:
```json
[
  [
    "After",
    "breakfast",
    "I",
    "went",
    "to",
    "work"
  ],
  [
    "After",
    "lunch",
    "I",
    "went",
    "to",
    "my",
    "desk"
  ],
  [
    "After",
    "the",
    "pub",
    "I",
    "went",
    "home"
  ],
  [
    "sleep"
  ]
]
```
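If the input does not match the grammar, `parser.feed` throws as soon as it reaches a token that no rule expects, so in a real program you would typically guard the call. A minimal sketch (the sample sentence is made up for illustration):

```js
const nearley = require('nearley');
const grammar = require('./grammar.js');

// nearley parsers are stateful, so use a fresh one per input.
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));

try {
  // This sentence has no period, so feed() throws when it reaches the
  // newline token that no rule expects at that point.
  parser.feed('\nI forgot the period\n');
} catch (err) {
  console.error(err.message);
}
```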