To match a string that starts with dog and is followed by cat (without consuming cat), this works:

local lpeg = require 'lpeg'
local str1 = 'dogcat'
local patt1 = lpeg.C(lpeg.P('dog')) * #lpeg.P('cat')
print(lpeg.match(patt1, str1))

Output: dog

To match a string that starts with dog, followed by any sequence of characters, and then by cat (without consuming it), like the regex lookahead (dog.+?)(?=cat), I tried this:

local str2 = 'dog and cat'
local patt2 = lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1) * #lpeg.P("cat")
print(lpeg.match(patt2, str2))

I expected the result to be dog and, but it returns nil.

If I throw away the lookahead part (i.e., use the pattern lpeg.C(lpeg.P("dog") * lpeg.P(1) ^ 1)), it matches the whole string successfully. This means the lpeg.P(1) ^ 1 part matches any character sequence correctly, doesn't it?

How can I fix it?

Yu Hao

1 Answer

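lpeg.P(1) ^ 1 is greedy and LPeg does not backtrack into a repetition, so it consumes "cat" as well and the lookahead can never succeed. You need to exclude "cat" at each position the repetition can match: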

local patt2 = lpeg.C(lpeg.P"dog" * (lpeg.P(1)-lpeg.P"cat") ^ 1) * #lpeg.P"cat"
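To make the difference concrete, here is a minimal side-by-side sketch of the pattern from the question and the corrected one. The local aliases `P` and `C` and the names `greedy` and `fixed` are just shorthand introduced for this example; they are not part of the original code.

```lua
local lpeg = require 'lpeg'
local P, C = lpeg.P, lpeg.C

local str2 = 'dog and cat'

-- Pattern from the question: P(1)^1 consumes every remaining character,
-- so the lookahead #P"cat" is tested at the end of the subject and fails.
local greedy = C(P"dog" * P(1)^1) * #P"cat"

-- Corrected pattern: each step consumes one character only if "cat" does
-- not start at that position, so the repetition stops right before "cat".
local fixed = C(P"dog" * (P(1) - P"cat")^1) * #P"cat"

print(lpeg.match(greedy, str2))  -- nil
print(lpeg.match(fixed, str2))   -- dog and   (note the trailing space)
```

Note that the capture is "dog and " with a trailing space, which is also what the regex (dog.+?)(?=cat) would capture.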

I think it's appropriate to plug the debugger I've been working on (pegdebug), as it helps in cases like this. Here is the output it generates for the original lpeg-expression:

+   Exp 1   "d"
 +  Dog 1   "d"
 =  Dog 1-3 "dog"
 +  Separator   4   " "
 =  Separator   4-11    " and cat"
 +  Cat 12  ""
 -  Cat 12
-   Exp 1

You can see that the Separator expression "eats" all the remaining characters, including "cat", so there is nothing left to match against P"cat".
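The positions reported in the trace can also be checked without the debugger. When a pattern has no captures, lpeg.match returns the index of the first character after the match, so a quick sketch like the following shows where each version of the separator stops (the variable names `greedy_end` and `bounded_end` are made up for this illustration):

```lua
local lpeg = require 'lpeg'
local P = lpeg.P

local str2 = 'dog and cat'  -- 11 characters long

-- Without captures, lpeg.match returns the position just past the match.
local greedy_end  = lpeg.match(P"dog" * P(1)^1, str2)
local bounded_end = lpeg.match(P"dog" * (P(1) - P"cat")^1, str2)

print(greedy_end)   -- 12: past the end of the string, nothing left for "cat"
print(bounded_end)  -- 9: the position where "cat" starts
```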

The output for the modified expression looks like this:

+   Exp 1   "d"
 +  Dog 1   "d"
 =  Dog 1-3 "dog"
 +  Separator   4   " "
 =  Separator   4-8 " and "
 +  Cat 9   "c"
 =  Cat 9-11    "cat"
=   Exp 1-8 "dog and "
/   Dog 1   0   
/   Separator   4   0   
/   Exp 1   1   "dog and "

Here is the full script:

local lpeg = require 'lpeg' -- bind locally; newer LPeg/Lua versions do not set a global lpeg
local peg = require 'pegdebug'
local str2 = 'dog and cat'
local patt2 = lpeg.P(peg.trace { "Exp";
  Exp = lpeg.C(lpeg.V"Dog" * lpeg.V"Separator") * #lpeg.V"Cat";
  Cat = lpeg.P("cat");
  Dog = lpeg.P("dog");
  Separator = (lpeg.P(1) - lpeg.P("cat"))^1;
})
print(lpeg.match(patt2, str2))
Paul Kulchenko
  • Too bad `pegdebug` is not usable with [LPeg.re](http://www.inf.puc-rio.br/~roberto/lpeg/re.html) w/o source modification to `re.lua`. – wqw Jan 03 '16 at 19:47
  • It may be possible with some patching of `match` and other methods in `re`. Patches are welcome ;). – Paul Kulchenko Jan 04 '16 at 01:01
  • I'm using a modified version of Lpeg.re with couple of extensions to the original incl. built-in trace support, %1 args, higher-order -> fun'param' and %patt'param' constructs but there is no official repository to propose a PR with these. The modified version works with your `pegdebug.lua` very well. Here is [a gist you can see a diff](https://gist.github.com/wqweto/9624cca7e02ef03c36c7/revisions?diff=split) to the original. – wqw Jan 04 '16 at 16:28
  • Interesting; I think it should be possible to hack it in with some patching of `mm.P` access (judging by the entry point being `/ mm.P` in `Grammar` pattern). – Paul Kulchenko Jan 05 '16 at 03:40
  • Yes, I did it with a fairly small modification -- back-referencing capture group `T` and calling the trace function *before* wrapping the result in `mm.P`. In `compile` function `T` is passed next to `G` flag they already impl. – wqw Jan 05 '16 at 10:54