6

thanks for looking,

I've had a terrible time trying to get the right search terms for this regex question. I need to ensure that quotes are already escaped in a string, otherwise the match should fail. (Most search results for this kind of question are just pages saying you need to escape quotes or how to escape quotes.)

Valid:

This is valid
This \"is Valid
This is al\"so Valid\"

Invalid:

This i"s invalid
This i"s inv"alid

The only thing I've managed to find so far is

((?:\\"|[^"])*)

This seems to match the first part of the following, but nothing after the escaped quote:

This is a \"test

Again, this should fail:

This is a \"test of " the emergency broadcast system

Thanks for any help, I hope this is even possible.

Bung
  • +1, interesting problem for which REs actually seem to be the right tool. – Fred Foo Jan 05 '12 at 17:19
  • @JosephSilber I'm actually not using a language, this regex will go inside a regex field used in a CMS that requires user input to match it :) – Bung Jan 05 '12 at 18:12
  • There's no such thing as `not using a language`. What language does the CMS use? – Joseph Silber Jan 05 '12 at 18:13
  • One of the traps to be aware of is that `a\\"b`, `a\\\\"b` are invalid, but that `a\\\"b` and `a\\\\\"b` are valid. That's because the even number of backslashes preceding the quote in the first two examples are tied up as an escape leaving the quote unescaped even though it is preceded by a backslash, while the odd number of backslashes preceding the quote in the second two examples are OK because the even number of backslashes quote the backslashes, leaving the odd backslash to escape the quote. – Jonathan Leffler Jan 05 '12 at 18:20
  • duplicate of [Regex for quoted string with escaping quotes](http://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes) – Bergi Jun 11 '13 at 13:01
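
The backslash-parity rule described in the comments above can also be checked without a regex. The following is a minimal Python sketch under that reading of the rule; the function name `quotes_are_escaped` is hypothetical and not part of the original question.

```python
def quotes_are_escaped(text: str) -> bool:
    """Return True if every double quote follows an odd-length run of backslashes.

    Assumption: a backslash escapes the next character, so a run of
    backslashes pairs up from the left and only an odd leftover escapes
    the quote. A trailing lone backslash is not checked here.
    """
    backslashes = 0  # length of the backslash run just before the current character
    for ch in text:
        if ch == "\\":
            backslashes += 1
            continue
        if ch == '"' and backslashes % 2 == 0:
            return False  # the quote is not escaped
        backslashes = 0  # any other character ends the run
    return True


if __name__ == "__main__":
    for sample in [r'a\\"b', r'a\\\\"b', r'a\\\"b', r'a\\\\\"b']:
        print(sample, "=>", quotes_are_escaped(sample))
```

The four samples in the usage loop are the ones from the comment: the first two have an even number of backslashes before the quote, the last two an odd number.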

5 Answers

6

In C#, this appears to work as you want:

string pattern = "^([^\"\\\\]*(\\\\.)?)*$";

Stripping out the escaping leaves you with:

^([^"\\]*(\\.)?)*$

which roughly translates into: start-of-string, (multi-chars-excluding-quote-or-backslash, optional-backslash-anychar)-repeated, end-of-string

It's the start-of-string and end-of-string markers that force the match to cover the complete text.
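
As a rough cross-check, here is the same unescaped pattern tried in Python against the examples from the question. This is only a sketch: the name `is_valid` is made up for illustration, and Python's `re` engine is assumed to behave like .NET for this pattern.

```python
import re

# The unescaped pattern from above, written as a Python raw string.
ESCAPED_ONLY = re.compile(r'^([^"\\]*(\\.)?)*$')


def is_valid(text: str) -> bool:
    """True when the anchored pattern matches the whole string."""
    return ESCAPED_ONLY.match(text) is not None


if __name__ == "__main__":
    samples = [
        'This is valid',
        r'This \"is Valid',
        r'This is al\"so Valid\"',
        'This i"s invalid',
        'This i"s inv"alid',
        r'This is a \"test of " the emergency broadcast system',
    ]
    for sample in samples:
        print(sample, "=>", is_valid(sample))
```

Because the group is a repeated group that itself contains a `*`, a backtracking engine may do a lot of work on long inputs that fail, so it is probably worth timing this on realistic input sizes before relying on it.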

adelphus
  • This has worked well for me, except for one case: `This is \" valid \`. I believe, though, that I'm unlikely to get input with a backslash at the end, so I may be able to let that fail. – Bung Jan 05 '12 at 18:38
  • Yes, it is designed to fail in the case of a single backslash at the end. The reason is that if the string supports escaped characters using a backslash, a single trailing backslash can never be valid. – adelphus Jan 05 '12 at 23:31
2

I don't know which language you use, but I would do it this way:

Write a regex that matches a quote not preceded by a backslash. It will fail to match

This is a \"test

and succeed on

This is a \"test of " the emergency broadcast system

For example:

.*(?<!\\)".*

Then negate the result: the string is valid only when this regex does not match. Hope this helps.

My test in Java looks like this:

    String pat = ".*(?<!\\\\)\".*";
    String s = "This is a \\\"test";
    System.out.println(!s.matches(pat));
    s = "This is a \\\"test of \" the emergency broadcast system";
    System.out.println(!s.matches(pat));
2

You want to use a negative lookbehind.

(?<!\\)"

This regex will match every quote that is not immediately preceded by a backslash.

If you run this regex against your sample string and it finds 1 or more matches, then the string is not valid.
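
A small Python sketch of that check follows; the helper name `has_unescaped_quote` is invented for illustration. It also shows one limitation worth knowing about: the lookbehind inspects only a single preceding character, so it cannot tell an escaping backslash from one that is itself escaped, which is the even/odd case raised in the comments on the question.

```python
import re

# A quote that is not immediately preceded by a backslash.
UNESCAPED_QUOTE = re.compile(r'(?<!\\)"')


def has_unescaped_quote(text: str) -> bool:
    """True when the lookbehind pattern finds at least one match."""
    return UNESCAPED_QUOTE.search(text) is not None


if __name__ == "__main__":
    print(has_unescaped_quote(r'This is a \"test'))
    print(has_unescaped_quote(r'This is a \"test of " the emergency broadcast system'))
    # Limitation: two backslashes then a quote. The first backslash escapes
    # the second, so the quote is really unescaped, but the lookbehind only
    # sees "a backslash before the quote" and reports no match.
    print(has_unescaped_quote(r'a\\"b'))
```

If escaped backslashes can appear in the input, a pattern that consumes each backslash together with the character after it avoids this problem.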

viggity
1

You need to match, repeatedly, either any character other than a backslash or a quote, or a backslash together with the character that follows it.

([^\\"]|\\.)*

This way, this will fail:

ab\\"c

This will succeed:

ab\\\"c

This will succeed:

ab\"c
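
The three cases above can be reproduced with a short Python sketch. It assumes the pattern has to cover the whole string, which is why `re.fullmatch` is used instead of adding anchors; the name `only_escaped_quotes` is hypothetical.

```python
import re

# Either one character that is neither a backslash nor a quote,
# or a backslash followed by any single character.
PATTERN = re.compile(r'(?:[^\\"]|\\.)*')


def only_escaped_quotes(text: str) -> bool:
    """True when the pattern consumes the entire string."""
    return PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in [r'ab\\"c', r'ab\\\"c', r'ab\"c']:
        print(sample, "=>", only_escaped_quotes(sample))
```

Since every backslash is consumed together with the character after it, pairs of backslashes are used up before the quote is reached, which is what makes the first case fail and the second succeed.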
Benoit
1

The regex you're looking for is:

/^(?:[^"]*(?:(?<=\\)"|))*$/

Explanation: [^"]* matches input until the first " is found or the end of input is reached. If a " is found, the lookbehind (?<=\\) makes sure that it is preceded by a \. This is repeated until the end of input is reached.

TESTING: Consider the following PHP code:

$arr=array('This is valid',
'This \"is Valid',
'This is al\"so Valid\"',
'This i"s invalid',
'This i"s inv"alid',
'This is a \"test',
'This is a \"test of " the emergency broadcast system - invalid');
foreach ($arr as $a) {
   echo "$a => ";
   if (preg_match('/^(?:[^"]*(?:(?<=\\\)"|))*$/', $a, $m))
      echo "matched [$m[0]]\n";
   else
      echo "didn't match\n";
}

OUTPUT:

This is valid => matched [This is valid]
This \"is Valid => matched [This \"is Valid]
This is al\"so Valid\" => matched [This is al\"so Valid\"]
This i"s invalid => didn't match
This i"s inv"alid => didn't match
This is a \"test => matched [This is a \"test]
This is a \"test of " the emergency broadcast system - invalid => didn't match
anubhava