
I am working on a regex pattern to extract strings that start with ' " ' and end with ' " '. The problem is that a string may also contain a ' " ' preceded by an escape character, like this: ' \" '. For example: "This is a \"Demo\" text". I know very little about the lookbehind operator. Is this possible with a single regex pattern?

Thanks

Barun
  • Are you using Java to parse Java? If not, what regex engine are you using? Here's a start: [`(?<!\\)"(?:[^\\]|\\.)*?"`](http://regex101.com/r/vM2dL8). Also, what have you tried? – HamZa Mar 11 '14 at 07:48
  • You should have a look [here](http://stackoverflow.com/questions/17043454/using-regexes-how-to-efficiently-match-strings-between-double-quotes-with-embed) – fge Mar 11 '14 at 08:54

1 Answer


It should work like this:

"(?:\\.|[^"])+"

without any lookahead or lookbehind. This does the following:

  1. Look for a ", consume it
  2. Check if the next 2 characters are a backslash followed by any character (this matches two backslashes \\, where the first escapes the second, as well as \"). If they are not, go to Step 3. If they are, consume those 2 characters and repeat Step 2.
  3. Check if the next character is not a ". If so, consume and go to step 2. If not (it IS a "), go to Step 4
  4. Consume the " which must be here
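
The steps above can be tried out with a minimal Java sketch. The class name `QuotedStringDemo` and the helper `extract` are made up for illustration and are not part of the answer; the pattern is the one shown above, written as a Java string literal (every regex backslash is doubled and every quote is escaped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
final class QuotedStringDemo {
    // The regex "(?:\\.|[^"])+" as a Java string literal.
    static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"])+\"");

    // Collects every match of the pattern, in order of appearance.
    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw text: say "This is a \"Demo\" text" and "more"
        String text = "say \"This is a \\\"Demo\\\" text\" and \"more\"";
        for (String s : extract(text)) {
            System.out.println(s);
        }
        // prints:
        // "This is a \"Demo\" text"
        // "more"
    }
}
```

Note that the `+` quantifier requires at least one character between the quotes, so an empty string `""` is not matched; replacing `+` with `*` would allow it.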

As HamZa pointed out, this regex will fail if a " appears outside of a string and is not intended to start one. In Java code, for example, this is the case if you have something like

Character c = '\"'

(" as a char) or

if (foo) { /* chosen "sometimes */ String g = "bar"; }

(random " inside a comment)
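
To make the comment case concrete, here is a small sketch that runs the same pattern over that line of Java source. The class name `StrayQuoteDemo` and the helper `firstMatch` are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing the false match caused by a stray quote.
final class StrayQuoteDemo {
    // Same regex as in the answer: "(?:\\.|[^"])+"
    static final Pattern QUOTED = Pattern.compile("\"(?:\\\\.|[^\"])+\"");

    // Returns the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "if (foo) { /* chosen \"sometimes */ String g = \"bar\"; }";
        // The quote inside the comment is taken as the start of a string,
        // so the match runs up to the opening quote of "bar".
        System.out.println(firstMatch(source));
        // prints: "sometimes */ String g = "
    }
}
```

The real string literal "bar" is never reported, because its opening quote was already consumed as the end of the false match.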

Mitja
  • Aww. Too late. Could someone explain to me if lookbehind (like in HamZa's comment) is necessary here or if my solution works, too? – Mitja Mar 11 '14 at 08:00
  • This would fail for `"This a \\\\" this shouldn't be matched \"`. Ok, it's an edge case, I know :) – HamZa Mar 11 '14 at 08:02
  • I just fixed that issue, since it came to my mind just as I posted it ;) – Mitja Mar 11 '14 at 08:04
  • @TheM You need the lookbehind since it will fail for `do \" not match this but "match this"`. Anyways, +1 – HamZa Mar 11 '14 at 08:06
  • Ah, okay, now I understand why that's needed. I thought of the text as Java (or some other) code, which wouldn't allow a `\"` to occur somewhere outside a string (except for `'\"'` of course, but that's a real edge case ;) ). Thank you for pointing that out! – Mitja Mar 11 '14 at 08:10
  • @TheM This pattern works like a charm. Would you please briefly explain how it works? – Barun Mar 11 '14 at 08:44
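
The lookbehind point raised in the comments can be checked with a short sketch that runs both patterns on HamZa's example input. The class name `LookbehindDemo` and the helper `firstMatch` are hypothetical. The lookbehind pattern is the one from HamZa's comment on the question, and it is shown here only for comparison, not as a complete solution (for instance, a real opening quote directly after an escaped backslash would also be rejected by the lookbehind).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two patterns from this thread.
final class LookbehindDemo {
    // Answer's pattern: "(?:\\.|[^"])+"
    static final Pattern PLAIN = Pattern.compile("\"(?:\\\\.|[^\"])+\"");
    // HamZa's pattern: (?<!\\)"(?:[^\\]|\\.)*?"
    static final Pattern WITH_LOOKBEHIND =
            Pattern.compile("(?<!\\\\)\"(?:[^\\\\]|\\\\.)*?\"");

    // Returns the first match of the given pattern, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Raw text: do \" not match this but "match this"
        String text = "do \\\" not match this but \"match this\"";
        System.out.println(firstMatch(PLAIN, text));
        // prints: " not match this but "
        System.out.println(firstMatch(WITH_LOOKBEHIND, text));
        // prints: "match this"
    }
}
```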