Parsing Markdown list format with ANTLR4

Question

I'm trying to parse Markdown text with ANTLR4. To keep things simple, I'm starting with list items only. I found a web page about this: http://www.cforcoding.com/2010/01/markdown-and-introduction-to-parsing.html

The grammar on that page looks fine to me, so I adapted it to ANTLR4 syntax like this:

grammar MarkDown;

listItem    : ORDERED inline NEWLINE
        | UNORDERED inline NEWLINE
        ;
inline      : (~ NEWLINE)+ ;
ORDERED     : DIGIT+ '.' (' ' | '\t')+ ;
UNORDERED   : ('*' | '-' | '+') (' ' | '\t')+ ;
DIGIT       : [0-9]+ ;

NEWLINE     : '\r'? '\n' ;

Example file:

1. abc
2. kljjkj
3. tree4545

But it doesn't work. The error messages are below:

line 1:3 token recognition error at: 'a'
line 1:4 token recognition error at: 'b'
line 1:5 token recognition error at: 'c'
line 1:6 extraneous input '\r\n' expecting {ORDERED, UNORDERED, DIGIT}
line 2:3 token recognition error at: 'k'
line 2:4 token recognition error at: 'l'
line 2:5 token recognition error at: 'j'
line 2:6 token recognition error at: 'j'
line 2:7 token recognition error at: 'k'
line 2:8 token recognition error at: 'j'
(listItem 1.  (inline \r\n 2. ) \r\n)

Could you help me fix this?

score 0 · Answer 1 · edited May 23 '17 at 11:50

Inside a parser rule, ~ negates tokens, not characters. So inline matches any token except NEWLINE, which in this grammar means ORDERED, UNORDERED, or DIGIT.

ANTLR complains about the input "abc", "kljjkj", ... because no lexer rule matches these characters.

Although the following Q&A is about ANTLR3, the same rules apply to ANTLR4: Negating inside lexer- and parser rules
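
One possible way to apply this, as a minimal sketch rather than a definitive fix: add a catch-all lexer rule so that every character becomes some token, which gives ~NEWLINE something to match. The CHAR rule, the document start rule, and turning DIGIT into a fragment are assumptions introduced here and are not part of the grammar in the question.

```antlr
grammar MarkDown;

// Hypothetical start rule (not in the question): zero or more list items.
document  : listItem* EOF ;

listItem  : (ORDERED | UNORDERED) inline NEWLINE ;

// In a parser rule, ~NEWLINE matches any single token other than NEWLINE.
inline    : (~NEWLINE)+ ;

ORDERED   : DIGIT+ '.' (' ' | '\t')+ ;
UNORDERED : ('*' | '-' | '+') (' ' | '\t')+ ;
NEWLINE   : '\r'? '\n' ;

// Assumed catch-all: any character not consumed by a rule above
// becomes its own one-character token, so "abc" lexes as three CHARs.
CHAR      : . ;

// As a fragment, DIGIT is only a helper for ORDERED and never a token itself.
fragment DIGIT : [0-9] ;
```

CHAR has to come after the other lexer rules: ANTLR prefers the longest match and, when lengths are equal, the rule defined first, so "1. " still becomes ORDERED and a lone '\n' still becomes NEWLINE. The sketch also assumes that every list item, including the last one, ends with a newline.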

answered Dec 19 '13 at 09:49 by Bart Kiers · edited May 23 '17 at 11:50 by Community

Thanks for the reply. Could you suggest a direct fix, so I can see whether I can follow it easily? – jmuok Dec 20 '13 at 02:58
