
I'm trying to capture some text between parentheses that is followed by a semicolon at the end.

Example: (in here there can be 'anything' !"#¤);); any character is possible);

I've tried this:

Text
 = "(" text:(.*) ");" { return text.join(""); }

But it seems that (.*) swallows the last ); before the ");" literal gets a chance to match it, and I get the error:

Expected ");" or any character but end of input found

The problem is that the text itself can contain ");", so I want the outermost ); to decide where the line ends.

This regex \((.*)\); does what I want, but how can I do the same in PEG.js? I don't want to include the outer parentheses and semicolon in the result.

This seems like it should be quite easy if you know what you're doing =P

mottosson
  • I have. Couldn't find what I was looking for. Or maybe I didn't understand it. If you know where in the docs this is specified it would be appreciated if you could tell me where. – mottosson Sep 21 '16 at 06:56

1 Answer


So, the point is that a PEG is deterministic, while a regex is not: PEG repetition operators such as * are greedy and never backtrack, so once a PEG has accepted some input it won't give it back. We can still simulate the semantics you want, though. Since you say the regex \((.*)\); does what you want, we can translate it into a PEG.
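To make the failure concrete, here is a small hand-written JavaScript sketch that mimics how a PEG evaluates the question's rule `Text = "(" text:(.*) ");"`. The function name `parseNaive` is hypothetical and this is not the code PEG.js generates; it only illustrates that the greedy, non-backtracking `(.*)` leaves nothing for `");"` to match.

```javascript
// Hypothetical hand-written mirror of the question's rule
//   Text = "(" text:(.*) ");"
// illustrating PEG semantics; this is not PEG.js's generated parser.
function parseNaive(input) {
  let pos = 0;
  if (!input.startsWith("(", pos)) return null; // "(" literal
  pos += 1;
  const start = pos;
  // (.*) is greedy and possessive: it consumes every remaining character
  // and never hands any of them back.
  while (pos < input.length) pos += 1;
  const text = input.slice(start, pos);
  // ");" must now match at the end of the input, which is impossible, and
  // there is no backtracking into (.*), so the whole rule fails.
  if (!input.startsWith(");", pos)) return null;
  return text;
}

console.log(parseNaive("(abc);")); // null
```

Because the loop always stops at the end of the input, the final check can never succeed, which matches the "end of input found" error from the question.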

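What does this regex do? The greedy .* consumes every character up to the end of the input (or of the current line, since . does not match newlines by default), then backtracks one character at a time until it sees a );. In other words, it matches up to the last possible );.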
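The regex behavior described above can be checked directly in JavaScript. This sketch uses the question's example line; the lazy `.*?` version is added only for contrast and is not part of the original answer.

```javascript
// The question's example line, written as a template literal so that both
// quote characters can appear in it unescaped.
const line = `(in here there can be 'anything' !"#¤);); any character is possible);`;

// Greedy: .* runs to the end of the line, then backtracks to the LAST ");".
const greedy = /\((.*)\);/.exec(line)[1];

// Lazy (for contrast): .*? grows one character at a time and stops at the FIRST ");".
const lazy = /\((.*?)\);/.exec(line)[1];

// Without the s flag, "." does not match "\n", so the search stays on one line.
const multiLine = /\((.*)\);/.exec("(a););\n(b);")[1];

console.log(greedy);    // in here there can be 'anything' !"#¤);); any character is possible
console.log(lazy);      // in here there can be 'anything' !"#¤
console.log(multiLine); // a);
```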

To get this behavior in a PEG, we can use a lookahead and keep consuming if and only if there is still a ); ahead.

So, a solution is:

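Text
 = "(" text:TextUntilTerminator ");" { return text.join(""); }

// Consume one character at a time, but only while a ");" still lies ahead.
// Each element of x is [lookahead result, character], hence y[1].
TextUntilTerminator
 = x:(&HaveTerminatorAhead .)* { return x.map(y => y[1]); }

// One character, then anything up to the first ");", then that ");".
HaveTerminatorAhead
 = . (!");" .)* ");"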

At each position, the TextUntilTerminator non-terminal first checks that HaveTerminatorAhead matches, without consuming anything (that is what the & lookahead does), and then consumes a single character. It repeats this until the current character is the start of the final ); in the input.

The HaveTerminatorAhead non-terminal is simple: it requires one character (the one TextUntilTerminator is about to consume) and then guarantees that there is at least one ); after it. Inside it, the negative lookahead ! stops the loop at the first ); it sees instead of consuming it, which would reproduce your original problem.
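Another way to read HaveTerminatorAhead: at a position `pos`, it matches exactly when some `);` starts strictly after `pos`. The sketch below states that reading with `String.prototype.indexOf`; the equivalence is our own restatement of the rule, not something PEG.js provides, and the function name is hypothetical.

```javascript
// Our restatement of  HaveTerminatorAhead = . (!");" .)* ");"
// The leading "." skips the current character, so the check effectively
// asks: does some ");" start at index pos + 1 or later?
function haveTerminatorAhead(input, pos) {
  return input.indexOf(");", pos + 1) !== -1;
}

// Trace which characters TextUntilTerminator would consume in "(a););".
const input = "(a););";
for (let pos = 1; pos < input.length; pos++) {
  console.log(pos, JSON.stringify(input[pos]), haveTerminatorAhead(input, pos));
}
```

The check is true at positions 1 to 3 and false from position 4 on, where the final `);` begins, so the consumed text is `a);`. Read this way, each step also rescans the rest of the input, so on long inputs with few `);` the grammar can take quadratic time.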

This PEG, then, reproduces the behavior of the regex you suggested.
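To check that claim, here is a hand-written JavaScript sketch that mirrors the three rules with one function each (the names are hypothetical, and this is not the code PEG.js generates), then compares the result with the regex anchored as `^\((.*)\);$` on a few single-line inputs.

```javascript
// Hand-written mirror of the answer's grammar (a sketch, not PEG.js output).

// HaveTerminatorAhead = . (!");" .)* ");"   (true if the rule would match)
function haveTerminatorAhead(input, pos) {
  if (pos >= input.length) return false;               // "." needs one character
  let i = pos + 1;
  while (i < input.length && !input.startsWith(");", i)) i += 1; // (!");" .)*
  return input.startsWith(");", i);                    // ");"
}

// TextUntilTerminator = (&HaveTerminatorAhead .)*
function textUntilTerminator(input, pos) {
  let end = pos;
  // "&" only tests the rule; it never moves the position forward.
  while (haveTerminatorAhead(input, end)) end += 1;    // "." consumes one char
  return end;
}

// Text = "(" text:TextUntilTerminator ");"
// As a PEG.js start rule it must also consume the whole input.
function parseText(input) {
  if (!input.startsWith("(")) return null;
  const end = textUntilTerminator(input, 1);
  if (!input.startsWith(");", end) || end + 2 !== input.length) return null;
  return input.slice(1, end);
}

// Compare with the anchored greedy regex on some single-line inputs.
const samples = [
  `(in here there can be 'anything' !"#¤);); any character is possible);`,
  "(a););",
  "();",
  "(abc",
];
for (const s of samples) {
  const m = /^\((.*)\);$/.exec(s);
  console.log(JSON.stringify(parseText(s)), JSON.stringify(m ? m[1] : null));
}
```

The regex is anchored here because a PEG.js start rule has to consume the entire input. Note also that the PEG `.` matches any character, including newlines, while the regex `.` does not, so the two only agree on single-line input.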

paulotorrens
  • Nice, it works! And I learned some things too =) Thanks a lot! – mottosson Sep 21 '16 at 13:25
  • Note that, if you're using this on a bigger grammar, you probably should add another kind of lookahead to limit the checking. The code above assumes that you want the last `);` on the input, which might not be the desired behavior if you want to match things _beyond_ a `Text`. – paulotorrens Sep 21 '16 at 13:40
  • I will read files with multiple lines of this kind, but maybe I could just add a \n to the grammar to take this into account? – mottosson Sep 21 '16 at 14:20
  • I have trouble understanding how `HaveTerminatorAhead` works. Could you try explaining it in some other way? – raine Oct 12 '17 at 08:33
  • How can we modify this, so that it matches until first delimiter and not the last one? – Tornike Shavishvili Jul 29 '22 at 07:14
  • @TornikeShavishvili: since `TextUntilTerminator` keeps consuming while there is at least one delimiter ahead, you could change the `Text` rule to use `HaveTerminatorAhead` directly instead, so it will consume up to the first delimiter found. Minor changes will be needed, of course (such as removing `");"` from the `HaveTerminatorAhead` rule). – paulotorrens Jul 31 '22 at 01:44
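A sketch of the first-delimiter variant from the last comment, assuming the grammar becomes `Text = "(" chars:(!");" .)* ");"`. This is our reading of "use `HaveTerminatorAhead` directly", with its trailing `");"` moved back into `Text`. It is hand-written JavaScript with a hypothetical function name that mirrors that rule rather than calling PEG.js.

```javascript
// Hypothetical first-delimiter variant, mirroring the assumed grammar
//   Text = "(" chars:(!");" .)* ");"
// The loop stops at the FIRST ");" instead of the last one.
function parseTextFirst(input) {
  if (!input.startsWith("(")) return null;            // "("
  let pos = 1;
  while (pos < input.length && !input.startsWith(");", pos)) {
    pos += 1;                                         // (!");" .)*
  }
  if (!input.startsWith(");", pos)) return null;      // ");"
  // rest is the unconsumed input a larger grammar would go on to parse.
  return { text: input.slice(1, pos), rest: input.slice(pos + 2) };
}

console.log(parseTextFirst("(a);b);")); // { text: 'a', rest: 'b);' }
```

Returning the leftover input reflects the situation in the earlier comment about bigger grammars: stopping at the first `);` leaves the rest available for the following rules instead of scanning to the end of the input.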