21

I'm trying to wrap my head around PEG by entering simple grammars into the PEG.js playground.

Example 1:

  • Input: "abcdef1234567ghijklmn8901opqrs"
  • Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]

  • Actual output: ["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]]

This example pretty much works, but can I get PEG.js to not nest the resulting array to a million levels? I assume the trick is to use concat() instead of join() somewhere, but I can't find the spot.

start
  = Text

Text
  = Numbers Text
  / Characters Text
  / EOF

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Characters
  = text: [a-z]+ {return text.join("")}

EOF
  = !.

Example 2:

Same problem and code as Example 1, but change the Characters rule to the following, which I expected would produce the same result.

Characters
  = text: (!Numbers .)+ {return text.join("")}

The resulting output is:

[",a,b,c,d,e,f", ["1234567", [",g,h,i,j,k,l,m,n", ["8901", [",o,p,q,r,s", ""]]]]]

Why do I get all these empty matches?

Example 3:

Last question. This doesn't work at all. How can I make it work? And for bonus points, any pointers on efficiency? For example, should I avoid recursion if possible?

I'd also appreciate a link to a good PEG tutorial. I've read (http://www.codeproject.com/KB/recipes/grammar_support_1.aspx), but as you can see I need more help ...

  • Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'
  • Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]
  • Actual output: "abcdefghijklmnop\"qrstuvwxyz\"abcdefg"
start
  = Words

Words
  = Quote
  / Text
  / EOF

Quote
  = quote: ('"' .* '"') Words {return quote.join("")}

Text
  = text: (!Quote . Words) {return text.join("")}

EOF
  = !.
hippietrail
  • 15,848
  • 18
  • 99
  • 158
Nick Evans
  • 211
  • 2
  • 3
  • 1
    FYI: This question refers to a *very* old version of pegjs. Newer versions have an entirely different syntax supporting lots of useful features like `text()` which references the complete text string that as been matched, as in: `digits = [0-9]+ { return text() }` – Rob Raisch Jan 26 '17 at 20:53

3 Answers3

21

I received a reply in the PEG.js Google Group that helped me onto the right track. I'm posting answers to all three problems in the hope that they can serve as a rudimentary tutorial for other PEG beginners like myself. Notice that no recursion is needed.

Example 1:

This is straightforward once you understand basic PEG idioms.

start
  = Text+

Text
  = Numbers
  / Characters

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Characters
  = text: [a-z]+ {return text.join("")}

Example 2:

The problem here is a peculiar design choice in the PEG.js parser generator for Peek expressions (&expr and !expr). Both peek ahead into the input stream without consuming any characters, so I incorrectly assumed that they didn't return anything. However, they both return an empty string. I hope the author of PEG.js changes this behavior, because (as far as I can tell) this is just unnecessary cruft that pollutes the output stream. Please correct me if I'm wrong about this!

Anyway, here is a workaround:

start
  = Text+

Text
  = Numbers
  / Words

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Words
  = text: Letter+ {return text.join("")}

Letter
  = !Numbers text: . {return text}

Example 3:

The problem is that an expression like ('"' .* '"') can never succeed. PEG is always greedy, so .* will consume the rest of the input stream and never see the second quote. Here is a solution (that incidentally needs the same Peek workaround as in Example 2).

start
  = Words+

Words
  = QuotedString
  / Text

QuotedString
  = '"' quote: NotQuote* '"' {return quote.join("")}

NotQuote
  = !'"' char: . {return char}

Text
  = text: NotQuote+ {return text.join("")}
chakrit
  • 61,017
  • 25
  • 133
  • 162
Nick Evans
  • 211
  • 1
  • 2
  • Typo in the text of Example 3: should read "... and never see the second quote". It appears you can't edit answers to your own questions in Stackoverflow (unless you're logged in I assume). – Nick Evans Sep 02 '10 at 12:10
  • @NickEvans you just need a bit more rep to be able to edit things. anyway I've fixed that for you. – chakrit Apr 19 '13 at 06:26
  • @NickEvans Please could you accept your answer? That would help people find this answer quicker on search engines. – Toothbrush May 16 '15 at 22:06
1

For current versions of pegjs, you might try:

Example One

Input: "abcdef1234567ghijklmn8901opqrs"

Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]

{
  /**
   * Deeply flatten an array.
   * @param  {Array} arr - array to flatten
   * @return {Array} - flattened array
   */
  const flatten = (arr) =>  Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}

start = result:string {
  console.log(JSON.stringify(result))
  return result
}

string = head:chars tail:( digits chars? )* {
  return flatten([head,tail])
}

chars = [a-z]+ {
  return text()
}

digits = $[0-9]+ {
  return text()
}

Example 2

Should be easy to deduce from the answer above.

Example 3

Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'

Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]

{
  /**
   * Deeply flatten an array.
   * @param  {Array} arr - array to flatten
   * @return {Array} - flattened array
   */
  const flatten = (arr) =>  Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}

start = result:string {
  console.log(JSON.stringify(result))
  return result
}

string = head:chars tail:quote_chars* {
  return flatten([head,tail])
}

quote_chars = DQUOTE chars:chars {
  return chars
}

chars = [a-z]+ {
  return text()
}

DQUOTE = '"'
Rob Raisch
  • 17,040
  • 4
  • 48
  • 58
-2

Example 1

start
  = alnums

alnums
  = alnums:(alphas / numbers) {
    return alnums;
  }

alphas
  = alphas:$(alpha+)

numbers
  = numbers:$(number+)

number
  = [0-9]

alpha
  = [a-zA-Z]

Example 2

ignore

Example 3

> 'abcdefghijklmnop"qrstuvwxyz"abcdefg'.split('"')
[ 'abcdefghijklmnop',
  'qrstuvwxyz',
  'abcdefg' ]
alsotang
  • 1,520
  • 1
  • 13
  • 15