
I want my grammar to ignore whitespace and newlines so that they do not appear in the PEG.js output. Also, a literal within parentheses should be returned in a new array.

Grammar

start
  = 'a'? sep+ ('cat'/'dog') sep* '(' sep* stmt_list sep* ')'

stmt_list
  = exp: [a-zA-Z]+ { return new Array(exp.join('')) }

sep
  = [' '\t\r\n]

Test case

a dog( Harry )

Output

[
   "a",
   [
      " "
   ],
   "dog",
   [],
   "(",
   [
      " "
   ],
   [
       "Harry"
   ],
   [
      " "
   ],
   ")"
]

Output I want

[
   "a",
   "dog",
   [
      "Harry"
   ]
]
Matthias

1 Answer


You have to break up the grammar more, using more "non-terminals" (not sure if that's what you call them in a PEG):

start
  = article? animal stmt_list

article
  = article:'a' __ { return article; }

animal
  = animal:('cat'/'dog') _ { return animal; }

stmt_list
  = '(' _ exp:[a-zA-Z]+ _ ')' { return [ exp.join('') ]; }

// optional whitespace
_  = [ \t\r\n]*

// mandatory whitespace
__ = [ \t\r\n]+

Thanks for asking this question!

Edit: To increase readability, the grammar uses two whitespace productions: `_` (optional) and `__` (mandatory).
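To make the idea concrete without PEG.js, here is a minimal hand-written sketch of the same grammar in plain JavaScript. Each rule becomes a function that consumes its own trailing whitespace and returns only the meaningful value. All function names are hypothetical, and using `null` for a missing article is a choice made for this sketch, not necessarily what PEG.js returns.

```javascript
// Hand-written sketch of the grammar above; every name here is hypothetical.
function parse(input) {
  let pos = 0;

  // _ : optional whitespace, consumed and discarded
  function skipOptionalWs() {
    while (pos < input.length && " \t\r\n".includes(input[pos])) pos++;
  }

  // Consume a literal if it is next in the input
  function accept(text) {
    if (input.startsWith(text, pos)) {
      pos += text.length;
      return true;
    }
    return false;
  }

  function expect(text) {
    if (!accept(text)) {
      throw new SyntaxError('expected "' + text + '" at position ' + pos);
    }
  }

  // article = 'a' __   (mandatory whitespace; backtrack if it is missing)
  function article() {
    const start = pos;
    if (accept("a")) {
      const wsStart = pos;
      skipOptionalWs();
      if (pos > wsStart) return "a";
    }
    pos = start; // rule failed, so `article?` matches nothing
    return null; // assumption of this sketch: a missing article is null
  }

  // animal = ('cat' / 'dog') _
  function animal() {
    const name = accept("cat") ? "cat" : accept("dog") ? "dog" : null;
    if (name === null) {
      throw new SyntaxError('expected "cat" or "dog" at position ' + pos);
    }
    skipOptionalWs();
    return name;
  }

  // stmt_list = '(' _ [a-zA-Z]+ _ ')'
  function stmtList() {
    expect("(");
    skipOptionalWs();
    const start = pos;
    while (pos < input.length && /[a-zA-Z]/.test(input[pos])) pos++;
    if (pos === start) throw new SyntaxError("expected a name at position " + pos);
    const name = input.slice(start, pos);
    skipOptionalWs();
    expect(")");
    return [name];
  }

  // start = article? animal stmt_list
  const result = [article(), animal(), stmtList()];
  if (pos !== input.length) {
    throw new SyntaxError("unexpected input at position " + pos);
  }
  return result;
}

console.log(JSON.stringify(parse("a dog( Harry )"))); // ["a","dog",["Harry"]]
console.log(JSON.stringify(parse("dog( Harry )")));   // [null,"dog",["Harry"]]
```

Because the whitespace is consumed inside each rule, `adog( Harry )` is rejected: the article rule backtracks when no whitespace follows `a`, and the animal rule then fails on the leading `a`.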

Pointy
  • Thanks! There is just one thing: Try `dog( Harry )`. The article should be optional. Bringing the `?` from `'a'` (in article) to `article` (in start) will still return an empty string... Is this PEG.js related? – Matthias Nov 24 '11 at 14:14
  • I think maybe making the "start" rule have "article?" would help. Then, the "article" rule itself could be just `'a' sep*` – Pointy Nov 24 '11 at 14:17
  • Then it would also allow `adog( Harry )`. Anyway, maybe I am using the wrong tool... I have an xdot [grammar](http://www.graphviz.org/content/dot-language) (xdot is based on dot) that I want to parse and draw to a canvas. Do you know of any other time-saving approach to evaluating the grammar of the file (other than writing my own parser or using things like [canviz](http://code.google.com/p/canviz/), which does not have enough functionality)? – Matthias Nov 24 '11 at 14:26
  • With the original (as in my answer), " dog( Harry )" is parsed correctly. The "article" is returned in the result as an empty string, but it does parse. – Pointy Nov 24 '11 at 14:43
  • As to how to parse a `dot` file, the grammar is complicated enough that I think you need a "real" parser. It doesn't have to be PEG of course; you could write your own recursive descent parser in JavaScript or perhaps use something like [Jison](http://zaach.github.com/jison/docs/). I don't have experience with PEG parsing but I think it's fascinating. :-) – Pointy Nov 24 '11 at 14:46
  • Not `[' '\t\r\n]` but `[ \t\r\n]`. `[' '\t\r\n]` will also match `'`. – alsotang Jan 09 '14 at 09:13
  • There are some other ideas in this [published article on PEG](https://pdos.csail.mit.edu/~baford/packrat/popl04/peg-popl04.pdf#page=2) (See Figure 1) – Fuhrmanator Jan 25 '17 at 15:06
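
The character-class point raised in the comments can be checked quickly with JavaScript regular expressions. This assumes that PEG.js character classes treat a quote inside the brackets as an ordinary member, as regular expressions do. The variable names are hypothetical.

```javascript
// Class as written in the question: the quote characters are members of the class.
const withQuotes = /[' '\t\r\n]/;
// Corrected class: only space, tab, carriage return, and line feed.
const withoutQuotes = /[ \t\r\n]/;

console.log(withQuotes.test("'"));    // true
console.log(withoutQuotes.test("'")); // false
console.log(withoutQuotes.test(" ")); // true
```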