1

I currently have this regex:

"[^"]*"

I am testing it against this string (I am using http://regexpal.com/, so it has not been string-encoded yet!):

"This is a test \"Text File\"" "This is a test \"Text File\""

Currently it is matching

"This is a test \"
""
"This is a test \"
""
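
The behaviour described above can be reproduced outside regexpal. This is a minimal sketch in Python (an assumption; the question does not name a language at this point), relying only on the fact that this simple pattern behaves the same in Python's `re` module. The variable names are illustrative.

```python
import re

# The test string exactly as typed into the tester: the backslashes are literal.
text = r'"This is a test \"Text File\"" "This is a test \"Text File\""'

# The original pattern: a quote, any run of non-quote characters, a quote.
pattern = re.compile(r'"[^"]*"')

# [^"] happily consumes the backslash, so the escaped quote ends the match early.
matches = pattern.findall(text)
for m in matches:
    print(m)
```

Each escaped quote is treated as a terminator, which is why every quoted string is split into a truncated match plus a stray `""`.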

I would like it to have the following matches:

"This is a test \"Text File\""
"This is a test \"Text File\""

Basically, I want it to match something that starts with " and ends with ", ignoring any \" in the middle. What do I need to add to my regex to achieve this?

Thanks in advance

zzzzBov
Jake Rote
  • 3
    What language are you using? – HamZa Apr 23 '14 at 22:00
  • `(?:[^"]|\\")*` will match the example cases, however you're going to run into other issues once you have an escaped backslash followed by a quote character `"\\" "fail"` – zzzzBov Apr 23 '14 at 22:03
  • It seems to me like this is an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), and that you should use a parser. – zzzzBov Apr 23 '14 at 22:05
  • 1
    @JakeRote You're doomed, Dart doesn't support lookbehinds. [See this demo](http://regex101.com/r/mD4kQ4). Also please include the language next time, I was almost done posting an answer... – HamZa Apr 23 '14 at 22:05
  • **For all answers below** please do not rush to just post an answer. Test your regex thoroughly; your regexes fail horribly on [this input](http://regex101.com/r/gP2hT3) – HamZa Apr 23 '14 at 22:11
  • 1
    Duplicate question. See my answer for several versions of increasing efficiency: [Regex to ignore escaped quotes within quotes](http://stackoverflow.com/a/5696141/433790) – ridgerunner Apr 23 '14 at 22:22

3 Answers

1

The best way of doing this depends on the matching capabilities of your regex engine (engines vary in which features they support). For a bare-bones regex engine that does not support any kind of look-behind, this is what you want: "([^"]*\\")*[^"]*"

This will match a quote, followed by zero or more pairs of a non-quote sequence and a \" sequence, followed by a (possibly empty) non-quote sequence, and then the closing quote.
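
To check the pattern against the input from the question, here is a small sketch in Python (an assumption; the asker's language is Dart, but this pattern uses only syntax common to both engines). The variable names are illustrative.

```python
import re

# Input from the question, with literal backslashes.
text = r'"This is a test \"Text File\"" "This is a test \"Text File\""'

# The pattern from this answer, written as a raw string so that \\" reaches
# the regex engine as "literal backslash, then quote".
pattern = re.compile(r'"([^"]*\\")*[^"]*"')

# finditer + group(0) is used because findall would return only the
# contents of the capturing group, not the whole match.
matches = [m.group(0) for m in pattern.finditer(text)]
for m in matches:
    print(m)

# An empty quoted string is still accepted.
empty_ok = pattern.fullmatch('""') is not None
```

Both quoted strings come back whole, escaped quotes included.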

matt forsythe
  • 1
    The expression: `"([^"]*\\")*[^"]*"` does not work correctly when the closing quote is preceded by an escaped escape, e.g. _'out1 "in1\\" out2 "in2" out3.'_ The expression needed here is: [`"[^"\\]*(?:\\.[^"\\]*)*"`](http://stackoverflow.com/a/5696141/433790) – ridgerunner Apr 23 '14 at 23:14
  • @ridgerunner I'm not sure I understand your example. I believe the OP wants to ignore escaped quotes, so in your input string, the OP would want it to match `"in1\\" out2 "` and ignore the rest of the string because the remaining quote is unmatched. Am I understanding your example right? – matt forsythe Apr 24 '14 at 02:10
  • @ridgerunner Oh, I see what you are saying: you want the '\' portion of the `\"` string to be a part of ' \\ ', so that `\\"` is really a literal backslash followed by a closing quote. Good catch. – matt forsythe Apr 24 '14 at 02:27
  • I am assuming (possibly a bad move) that the OP wants to be able to (in addition to handling escaped quotes) encode a literal backslash into the string (which needs to be written as a double escape). The example I provided should result in two sub-strings: `in1\\` and `in2`. The first sub-string ends with one of these escaped-escapes, but your pattern does not handle this particular edge case correctly. (But as I said, maybe this is not a requirement (but I would think that it should be.)) – ridgerunner Apr 24 '14 at 02:33
  • @ridgerunner I agree, I misread your original comment. While handling escaped escapes may not be a "requirement", it would certainly be a good practice to handle it, even if you don't expect it in your input. – matt forsythe Apr 24 '14 at 02:56
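
The edge case discussed in these comments can be seen by running both patterns on ridgerunner's example. This is a sketch in Python (an assumption, chosen only because both patterns behave the same there); the names `simple` and `robust` are labels invented for this illustration.

```python
import re

# ridgerunner's example: the first quoted string ends with an escaped backslash.
text = r'out1 "in1\\" out2 "in2" out3.'

# Pattern from the answer: treats any backslash + quote as an escaped quote.
simple = re.compile(r'"([^"]*\\")*[^"]*"')

# Pattern from ridgerunner's comment: consumes escapes as backslash + any
# character, so an escaped backslash cannot "escape" the closing quote.
robust = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

simple_matches = [m.group(0) for m in simple.finditer(text)]
robust_matches = [m.group(0) for m in robust.finditer(text)]

print(simple_matches)
print(robust_matches)
```

The first pattern runs past the real closing quote and swallows ` out2 "`, while the second yields the two intended strings.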
1

(\\"|[^"])+

This will match \" as well as any character that is not ".

Christophe
  • +1 Very nice - I like this better than the accepted answer. (Which happens to be mine!) Shorter and easier to understand, but I would replace the `+` with a `*` so that it will match the empty string. i.e., if the user runs it on `""` (a string containing only two quotes) it should match successfully, but the matched area should be empty. – matt forsythe Apr 24 '14 at 02:24
  • I was reading the other comments, and a more accurate expression would be `(\\\\|\\"|[^"])*` to take into account edge cases (empty string and string ending with an escaped backslash `\\"`) – Christophe Apr 24 '14 at 04:38
  • Yes, and I agree with ridgerunner, but I always have this tension when using regular expressions of how to strike a good balance between the expression being bullet-proof vs. being clear and simple. When striking that balance, you of course have to consider the context of where and how the expression will be used. I like this answer because, of the answers that do not need to be 100% bullet-proof, it is the most clear and concise. – matt forsythe Apr 24 '14 at 14:36
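
A rough way to compare the two variants from this answer and its comments is sketched below in Python (an assumption). The answer's expressions cover only the body of the string, so the sketch wraps them in literal quotes; that wrapping and the names `basic` and `extended` are assumptions made for this illustration.

```python
import re

question_text = r'"This is a test \"Text File\"" "This is a test \"Text File\""'
edge_case = r'out1 "in1\\" out2 "in2" out3.'

# Body from the answer (with * as suggested in the comments), wrapped in quotes.
basic = re.compile(r'"(\\"|[^"])*"')

# Body from Christophe's follow-up comment, which also knows about \\ .
extended = re.compile(r'"(\\\\|\\"|[^"])*"')

def all_matches(pattern, text):
    # group(0) is the whole match; findall would return only the last group value.
    return [m.group(0) for m in pattern.finditer(text)]

basic_question = all_matches(basic, question_text)
basic_edge = all_matches(basic, edge_case)
extended_edge = all_matches(extended, edge_case)
empty_ok = extended.fullmatch('""') is not None
```

On the question's input both variants return the two full strings; only the extended one stops at the right quote when the string ends with an escaped backslash.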
0

Regex for Dart:

RegExp exp = new RegExp(r'(".*?"")');

http://regex101.com/r/hM5pI7

EXPLANATION:

Match the regular expression below and capture its match into backreference number 1 «(".*?"")»
   Match the character “"” literally «"»
   Match any single character that is not a line break character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the characters “""” literally «""»
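
As a quick check of what this pattern accepts, here is a sketch in Python (an assumption; the answer targets Dart, but the pattern uses only syntax common to both). The second input is a made-up example, not taken from the thread.

```python
import re

pattern = re.compile(r'(".*?"")')

# Input from the question: each quoted string happens to end with \"" .
question_text = r'"This is a test \"Text File\"" "This is a test \"Text File\""'
question_matches = [m.group(0) for m in pattern.finditer(question_text)]

# Hypothetical input with an ordinary quoted string and no escaped quote.
plain_text = 'say "hello" now'
plain_matches = [m.group(0) for m in pattern.finditer(plain_text)]

print(question_matches)
print(plain_matches)
```

The pattern works on the question's input because both strings end in two consecutive quotes; a quoted string that does not end that way is not matched at all.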
Pedro Lobito