
I have a snippet from a config file in which I need to match the quoted string that follows a specific key, but only where it isn't commented out. Here's my current regex:

(?<!=#)test\.this\.regex\s+\"(.*?)\"

I feel like this should work? I read it like this:

(?<!=#) lookbehind to make sure it's not preceded by a #

test\.this\.regex\s+\"(.*?)\" matches test.this.regex "sup1"

Here is the config snippet

    test.this.regex "sup1" hi |sup1| # test.this.regex "sup3" hi |sup3|
# test.this.regex "sup2" do |sup2|
    test.this.regex "sup2" do |sup2|

But my regex matches all 4 occurrences, including the commented-out ones:

Match 1
1.  sup1
Match 2
1.  sup3
Match 3
1.  sup2
Match 4
1.  sup2
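
For reference, the result above can be reproduced with a short Ruby sketch. It assumes the three config lines are held in a single string (the `config` variable is illustrative, not from the original setup). Note that `(?<!=#)` is a negative lookbehind (`(?<!` ... `)`) whose body is the two-character literal `=#`. The sketch also tries an assumed variant with the `=` dropped, which still matches four times here because the character directly before `test` is a space, not `#`.

```ruby
# The three config lines from the question, joined into one string.
config = [
  '    test.this.regex "sup1" hi |sup1| # test.this.regex "sup3" hi |sup3|',
  '# test.this.regex "sup2" do |sup2|',
  '    test.this.regex "sup2" do |sup2|'
].join("\n")

# Pattern from the question: the lookbehind body is the two characters "=#".
original = /(?<!=#)test\.this\.regex\s+\"(.*?)\"/

# Assumed variant with the "=" removed, so the lookbehind body is just "#".
without_equals = /(?<!#)test\.this\.regex\s+\"(.*?)\"/

# scan returns one array per match (one element per capture group).
original_hits = config.scan(original).flatten
variant_hits  = config.scan(without_equals).flatten

p original_hits  # ["sup1", "sup3", "sup2", "sup2"]
p variant_hits   # ["sup1", "sup3", "sup2", "sup2"]
```

Both patterns only inspect a fixed number of characters immediately before `test`, so neither can see a `#` that is separated from the key by whitespace.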
bruchowski

2 Answers

0

You can use this PCRE regex:

/(?># *(*SKIP)(*FAIL)|(?:^|\s))test\.this\.regex\s+\"[^"]*\"/


  • (*FAIL) forces a match failure at that point; it behaves like an always-failing negative assertion and is a synonym for (?!)
  • (*SKIP) marks a point that the regex engine may not backtrack past when the subpattern fails later; the next match attempt starts from that point
  • (*SKIP)(*FAIL) together offer a neat way around the restriction that a lookbehind cannot be variable-length, which is what the regex above would otherwise need.

UPDATE: I'm not sure whether Ruby supports (*SKIP)(*FAIL), so here is an alternative version:

(?:# *test\.this\.regex\s+\"[^"]*\"|\b(test\.this\.regex\s+\"[^"]*\"))

Then keep only the matches where captured group #1 is non-empty.
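
A minimal Ruby sketch of how the alternation pattern above might be used, assuming the config from the question is held in one string (the `config` and `live` names are illustrative). The commented-out form is consumed by the first alternative, which has no capture group, so group #1 is `nil` for those matches and can be filtered out.

```ruby
# The three config lines from the question, joined into one string.
config = [
  '    test.this.regex "sup1" hi |sup1| # test.this.regex "sup3" hi |sup3|',
  '# test.this.regex "sup2" do |sup2|',
  '    test.this.regex "sup2" do |sup2|'
].join("\n")

# First alternative: "#", optional spaces, then the directive (no capture).
# Second alternative: the directive on its own, captured as group 1.
pattern = /(?:# *test\.this\.regex\s+\"[^"]*\"|\b(test\.this\.regex\s+\"[^"]*\"))/

# scan yields [group1] per match; group1 is nil for commented-out matches.
live = config.scan(pattern).flatten.compact

p live  # ["test.this.regex \"sup1\"", "test.this.regex \"sup2\""]
```

As written, group #1 holds the whole directive rather than only the quoted value, and the first alternative only recognises a `#` followed by nothing but spaces before the key.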


anubhava
  • This breaks if there is more than one space after the # though – bruchowski May 17 '14 at 05:28
  • Clarify this in your question and that's precisely the reason why I asked which language you use? – anubhava May 17 '14 at 05:32
  • I thought I did in the first sentence, sorry if you missed that; I also noted I was using Ruby's regex implementation. It should match if there is anything between the `#` and `test.this.regex`, since that means it's a comment and I want to disregard that match – bruchowski May 17 '14 at 05:34
  • See updated answer. Your question needs to be tagged with language/tool in addition to regex actually. – anubhava May 17 '14 at 05:35
  • I had this before, `(#)?.*test\.this\.regex\s+\"(.*)\"` but I was hoping I could match just the strings I wanted with only one capture group – bruchowski May 17 '14 at 05:43
  • `(#)?.*test\.this\.regex\s+\"(.*)\"` won't help since it will match both strings and you **cannot have** dynamic length in lookbehind. – anubhava May 17 '14 at 05:45
  • `cannot have dynamic length in lookbehind` that's the missing piece, thanks, I didn't know that – bruchowski May 17 '14 at 05:46
  • I believe other than .NET no other language supports dynamic length `lookbehinds`. – anubhava May 17 '14 at 05:48
0

If your question is embodied in the first sentence (and is not specifically about lookarounds), why not just use String#split with your regex, less the lookbehind?

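def doit(str)
  # Same pattern as in the question, minus the lookbehind.
  r = /test\.this\.regex\s+\"(.*?)\"/
  # Keep only the text before the first '#', then return capture group 1 (nil if no match).
  str.split('#').first[r,1]
end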

doit('test.this.regex "sup1" hi |sup1| # test.this.regex "sup3" hi |sup3|')
  #=> "sup1"
doit('# test.this.regex "sup2" do |sup2|')
  #=> nil
doit('test.this.regex "sup2" do |sup2|')
  #=> "sup2"
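
To apply the same idea to the whole snippet from the question, one option is to go line by line. This is only a sketch: the `config` and `values` names are illustrative, and it assumes a `#` never appears inside a quoted value, since everything after the first `#` on a line is discarded.

```ruby
def doit(str)
  r = /test\.this\.regex\s+\"(.*?)\"/
  # Text before the first '#', then capture group 1 (nil if there is no match).
  str.split('#').first[r, 1]
end

# The three config lines from the question, joined into one string.
config = [
  '    test.this.regex "sup1" hi |sup1| # test.this.regex "sup3" hi |sup3|',
  '# test.this.regex "sup2" do |sup2|',
  '    test.this.regex "sup2" do |sup2|'
].join("\n")

# Run doit on each line and drop the lines that produced no match.
values = config.each_line.map { |line| doit(line) }.compact

p values  # ["sup1", "sup2"]
```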
Cary Swoveland