Given the input <outer> Content <inner> Inner <single/> </inner> </outer>.

How would I write a grammar that parses the <single/> node along with the nodes that have a matching closing node?

Here's my current grammar, which was taken from here:

Content =
  (Element / Text)*

Element =
  startTag:StartTag content:Content endTag:EndTag {
    if (startTag != endTag) {
      throw new Error(
        "Expected </" + startTag + "> but </" + endTag + "> found."
      );
    }

    return {
      name:    startTag,
      content: content
    };
  }

StartTag =
  "<" name:TagName ">" { return name; }

EndTag =
  "</" name:TagName ">" { return name; }

TagName = chars:[a-z]+ { return chars.join(""); }
Text    = chars:[^<]+  { return chars.join(""); }

This only works with nodes that have a closing node.

I think the problem lies with the Text rule. So I've been experimenting with altering it to include a negative lookahead like:

Text    = chars:(!EndTag .)* EndTag { return chars.join(""); }

But that hasn't yielded anything successful yet.

Any ideas?

fusepilot
  • Gah - ya changed the tag delimiters on me, ya goofball! :) – TML Sep 28 '14 at 21:58
  • Oops, sorry about that! It was a typo in the first place. But it still doesn't seem to be working? http://peg.arcanis.fr/fwvT9/1/ – fusepilot Sep 28 '14 at 22:05

1 Answer

The way I did it was to make Element match either an "sTag" followed by Content and an "eTag", or a "selfTag" on its own; if it matches a "selfTag", there's no Content and no end tag:

Content =
  (Element / Text)*

Element =
  startTag:sTag content:Content endTag:eTag {
    if (startTag != endTag) {
      throw new Error(
        "Expected </" + startTag + "> but </" + endTag + "> found."
      );
    }

    return {
      name:    startTag,
      content: content
    };
  }
  / startTag:selfTag {
    return startTag;
  }

sTag =
  "<" name:TagName ">" { return name; }

selfTag =
  "<" name:TagName "/>" { return name; }

eTag =
  "</" name:TagName ">" { return name; }

TagName = chars:[a-z-]+ { return chars.join(""); }
Text    = chars:[^<]+  { return chars.join(""); }

Note that this answer requires you to use <single/> instead of <single> (that is, the / is required); that's the simplest way to signal to the PEG parser the difference between a dangling start tag and a "self-closing tag".
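
To make the ordered choice in Element easier to follow, here is a minimal hand-written sketch of the same rules in plain JavaScript. It is not what PEG.js generates, and the names parseMarkup, match, parseContent, and parseElement are made up for illustration. Like the grammar, it returns an object for a paired element and the bare tag name for a self-closing one.

```javascript
// Hand-written sketch mirroring the grammar above; not PEG.js output.
// All function names here are hypothetical.
function parseMarkup(input) {
  let pos = 0;

  // Sticky (y) regexes only match at lastIndex, like a PEG rule at `pos`.
  const S_TAG = /<([a-z-]+)>/y;      // sTag
  const SELF_TAG = /<([a-z-]+)\/>/y; // selfTag
  const E_TAG = /<\/([a-z-]+)>/y;    // eTag
  const TEXT = /([^<]+)/y;           // Text

  // Try a rule at the current position. On success, consume the match and
  // return its first capture group; otherwise return null and consume nothing.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null) return null;
    pos += m[0].length;
    return m[1];
  }

  // Content = (Element / Text)*
  function parseContent() {
    const items = [];
    for (;;) {
      const item = parseElement() ?? match(TEXT);
      if (item === null) return items;
      items.push(item);
    }
  }

  // Element = sTag Content eTag / selfTag
  function parseElement() {
    const start = pos;
    const name = match(S_TAG);
    if (name !== null) {
      const content = parseContent();
      const endName = match(E_TAG);
      if (endName !== null) {
        if (endName !== name) {
          throw new Error(
            "Expected </" + name + "> but </" + endName + "> found."
          );
        }
        return { name: name, content: content };
      }
      pos = start; // first alternative failed: backtrack
    }
    return match(SELF_TAG); // second alternative, or null if it fails too
  }

  const result = parseContent();
  if (pos !== input.length) {
    throw new Error("Unexpected input at offset " + pos + ".");
  }
  return result;
}

console.log(JSON.stringify(
  parseMarkup("<outer> Content <inner> Inner <single/> </inner> </outer>")
));
```

With <single> in place of <single/>, this sketch treats <single> as a start tag, reads up to </inner>, and throws "Expected </single> but </inner> found."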

TML
  • I get an error: `Expected but found.` It works without the `` node in there. http://peg.arcanis.fr/fwvT9/1/ – fusepilot Sep 28 '14 at 22:02
  • Ah, I see about the `` part now. It works now. Do you think it's impossible to do it without the closing `/`? – fusepilot Sep 28 '14 at 22:10
  • It would be a lot more difficult - you'd have to think carefully through how you distinguish between an unclosed start tag (which should rightfully throw an error) and a tag that need not be closed. – TML Sep 28 '14 at 22:12
  • I'm going to play around with getting it to work without the extra `/`. If no dice, I'll come back and change my question to use `<single/>` and mark your answer correct. Thanks. – fusepilot Sep 28 '14 at 22:18
  • [This](http://stackoverflow.com/questions/24659684/problems-with-an-ambiguous-grammar-and-peg-js-no-examples-found?rq=1) SO question and answer might be helpful to you, as what you're going to be dealing with will be an "ambiguous grammar". – TML Sep 28 '14 at 22:31
  • After a full day spent on this, I'm convinced this isn't currently possible with pegjs. There would need to be a construct added to it that would allow you to peek ahead and compare TagNames based on a variable sent from an earlier match. Something similar to this feature request https://github.com/dmajda/pegjs/issues/36#issuecomment-57104942. Since this doesn't seem possible now, I've modified my question. – fusepilot Sep 29 '14 at 04:09
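
The distinction raised in the comments, between an unclosed start tag and a tag that need not be closed, can be sketched in plain JavaScript if one assumes the tags that never take a closing tag are known in advance. That assumption is not part of the question, and this is not a PEG.js grammar; VOID_TAGS and parseWithVoidTags are hypothetical names. It does not address the harder case of deciding by peeking ahead for a matching end tag.

```javascript
// Assumption: the set of tags that never have a closing tag is fixed up front.
const VOID_TAGS = new Set(["single"]);

function parseWithVoidTags(input) {
  let pos = 0;
  const S_TAG = /<([a-z-]+)>/y;
  const E_TAG = /<\/([a-z-]+)>/y;
  const TEXT = /([^<]+)/y;

  // Match a sticky regex at `pos`; return capture group 1 or null.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m === null) return null;
    pos += m[0].length;
    return m[1];
  }

  function parseContent() {
    const items = [];
    for (;;) {
      const item = parseElement() ?? match(TEXT);
      if (item === null) return items;
      items.push(item);
    }
  }

  function parseElement() {
    const name = match(S_TAG);
    if (name === null) return null;
    // A known void tag ends immediately: no content, no end tag expected.
    if (VOID_TAGS.has(name)) return { name: name, content: [] };
    const content = parseContent();
    const endName = match(E_TAG);
    // Any other start tag must be closed, so a missing end tag is an error.
    if (endName === null) throw new Error("Unclosed <" + name + ">.");
    if (endName !== name) {
      throw new Error(
        "Expected </" + name + "> but </" + endName + "> found."
      );
    }
    return { name: name, content: content };
  }

  const result = parseContent();
  if (pos !== input.length) {
    throw new Error("Unexpected input at offset " + pos + ".");
  }
  return result;
}

console.log(JSON.stringify(
  parseWithVoidTags("<outer> Content <inner> Inner <single> </inner> </outer>")
));
```

Here <single> parses as an empty element because it is listed in VOID_TAGS, while an input such as "<outer> text" still fails with "Unclosed <outer>."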