35

I need a Perl regular expression to match a string. I'm assuming only double-quoted strings, that \" is a literal quote character and NOT the end of the string, and that \\ is a literal backslash character that should not escape a following quote character. If that's not clear, some examples:

"\""    # string is 1 character long, contains double quote
"\\"    # string is 1 character long, contains backslash
"\\\""  # string is 2 characters long, contains backslash and double quote
"\\\\"  # string is 2 characters long, contains two backslashes

I need a regular expression that can recognize all 4 of these possibilities, and all other simple variations on those possibilities, as valid strings. What I have now is:

/".*[^\\]"/

But that's not right - it won't match any of those except the first one. Can anyone give me a push in the right direction on how to handle this?

brian d foy
Chris Lutz

7 Answers

44

/"(?:[^\\"]|\\.)*"/

This is almost the same as Cal's answer, but has the advantage of matching strings containing escape codes such as \n.

The ?: makes the group non-capturing, so the contained expression is not saved for backreferences; it can be removed if you don't mind the extra capture group.

NOTE: as pointed out by Louis Semprini, this is limited to texts of about 32 KB due to a recursion limit built into Perl's regex engine (which unfortunately returns a failure silently when hit, instead of crashing loudly).
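
As a quick check, the pattern can be run against the four examples from the question. This is only a sketch: the `\A` and `\z` anchors, the variable names, and the two invalid samples at the end are assumptions added for illustration, not part of the answer.

```perl
use strict;
use warnings;

# Pattern from this answer. The \A and \z anchors are an addition for this
# sketch so that each sample must match from start to end.
my $string_re = qr/\A"(?:[^\\"]|\\.)*"\z/;

# A single-quoted heredoc keeps every backslash literal, so each line is
# exactly the text the regex sees. The last two lines are invalid strings.
my @samples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
"\\\"
"unterminated
END

my @results = map { $_ =~ $string_re ? 1 : 0 } @samples;

for my $i (0 .. $#samples) {
    printf "%-16s %s\n", $samples[$i], $results[$i] ? 'valid' : 'invalid';
}
```

The first four samples are reported as valid. The fifth ends in an escaped quote with no closing quote, and the sixth has no closing quote at all, so both are rejected.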

j_random_hacker
  • This answer is more correct. I tested a lot more strings and it works better than @Cal's for things like `"\"\'\""`. – Xeoncross Mar 30 '12 at 22:31
  • WARNING there is an insane silent failure with this regex if the string is >32k/64k due to a Perl bug from 2002 that has not yet been fixed in 2020!!! https://stackoverflow.com/a/26229500/1046167 – Louis Semprini May 08 '20 at 21:27
  • @LouisSemprini: Surprising, thanks! The reason (stack growth due to backtracking) is arguably sensible, at least for a backtracking RE implementation ([most REs can actually be searched for without backtracking using a DFA-based algorithm instead](https://swtch.com/~rsc/regexp/regexp1.html)), but I think this should definitely `die()` with a visible error. – j_random_hacker May 09 '20 at 15:10
26

How about this?

/"([^\\"]|\\\\|\\")*"/

Between the quotes, this matches zero or more of the following: a character that is neither a backslash nor a quote, OR two backslashes, OR a backslash followed by a quote.
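
To see what that alternation does and does not accept, here is a small sketch comparing it with the `\\.` variant from the other answers. The anchors, variable names, and sample strings are assumptions made up for this illustration.

```perl
use strict;
use warnings;

# Both patterns are taken from the answers on this page; the \A and \z
# anchors are added here so each sample must match in full.
my $quote_or_backslash = qr/\A"([^\\"]|\\\\|\\")*"\z/;   # this answer
my $any_escape         = qr/\A"(?:[^\\"]|\\.)*"\z/;      # the \\. variant

# Single-quoted heredoc: backslashes are literal, so the third sample
# contains a backslash followed by the letter n, not a newline.
my @samples = split /\n/, <<'END';
"a\"b"
"c\\d"
"line\nbreak"
END

my @narrow = map { $_ =~ $quote_or_backslash ? 1 : 0 } @samples;
my @broad  = map { $_ =~ $any_escape         ? 1 : 0 } @samples;

for my $i (0 .. $#samples) {
    printf "%-16s narrow=%d broad=%d\n", $samples[$i], $narrow[$i], $broad[$i];
}
```

Both patterns accept the escaped quote and the escaped backslash. Only the `\\.` variant accepts the third sample, because this answer's pattern allows a backslash to be followed only by another backslash or a quote.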

Cal
  • Paul: strings can be matched by regexes, however parenthesised expressions (and anything else that can nest arbitrarily deep) cannot. – j_random_hacker Jan 26 '09 at 22:10
  • This regex has false positives on strings such as """ – Leon Timmermans Jan 26 '09 at 22:46
  • Cal: I think you need to double all of those backslashes. (Maybe you already did, and SO stripped them out?) – j_random_hacker Jan 26 '09 at 22:48
  • It looks fine to me. In some languages doubled slashes are necessary, but not in Perl. – Leon Timmermans Jan 26 '09 at 23:50
  • fyi, i did double the backslashes and SO stripped them – Cal Jan 27 '09 at 04:28
  • You need to "code-ify" the regex: either enclose it in `backticks`, or indent it four spaces and leave empty lines above and below it. – Alan Moore Jan 27 '09 at 06:40
  • @Cal: yes, that's happened to me too. The `backticks` cures that, as Alan suggested. – j_random_hacker Jan 28 '09 at 04:26
  • @Leon: By coincidence, Cal's original regex **as displayed** (i.e. with no doubled backslashes) was *also* valid Perl syntax, although it didn't do what he wanted -- e.g. it let through """ as you pointed out. The double-backslashed version now on display doesn't have that problem. – j_random_hacker Jan 28 '09 at 04:29
  • WARNING there is an insane silent failure with this regex if the string is >32k/64k due to a Perl bug from 2002 that has not yet been fixed in 2020!!! https://stackoverflow.com/a/26229500/1046167 – Louis Semprini May 08 '20 at 21:27
9

A generic solution (matching all backslashed characters):

/ \A "               # Start of string and opening quote
  (?:                #  Start group
    [^\\"]           #   Anything but a backslash or a quote
    |                #  or
    \\.              #   Backslash and anything
  )*                 # End of group
  " \z               # Closing quote and end of string
  /xms
Leon Timmermans
  • Though you may want to omit the `\A` and/or `\z` -- they imply that there can be nothing preceding or trailing the double-quoted string. – j_random_hacker Jan 17 '10 at 10:42
5

See Text::Balanced. It's better than reinventing the wheel. Use gen_delimited_pat to see the resulting pattern and learn from it.
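
A minimal sketch of that suggestion follows. It assumes the default escape character (a backslash); the input text and variable names are made up for illustration. The generated pattern is printed rather than quoted here, since its exact form may vary between module versions.

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_delimited_pat);

# Ask Text::Balanced for a pattern (returned as a string) that matches text
# delimited by double quotes. The escape character defaults to a backslash.
my $pat = gen_delimited_pat(q{"});
print "generated pattern: $pat\n";

# Single-quoted heredoc, so the backslashes below are literal input text.
my $input = <<'END';
say "a \"quoted\" word" and "\\" here
END

# Collect every quoted string found in the input.
my @found = $input =~ /($pat)/g;
print "found: $_\n" for @found;
```

Printing `$pat` is the "learn from it" part: the generated expression can be compared with the hand-written ones in the other answers.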

Hynek -Pichi- Vychodil
4

Here's a very simple way:

/"(?:\\?.)*?"/

Just remember if you're embedding such a regex in a string to double the backslashes.
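
For instance, here is a sketch of the same pattern written once as a regex literal and once built from a double-quoted Perl string. The variable names and sample input are assumptions for illustration only.

```perl
use strict;
use warnings;

# The pattern as a regex literal: backslashes are written as in the answer.
my $literal = qr/"(?:\\?.)*?"/;

# The same pattern inside a double-quoted Perl string: every backslash is
# doubled, and the quote characters themselves must be escaped too.
my $from_string = "\"(?:\\\\?.)*?\"";
my $built       = qr/$from_string/;

# Single-quoted heredoc, so the backslashes in the input are literal.
my $input = <<'END';
say "hi \"there\"" now
END

my ($m1) = $input =~ /($literal)/;
my ($m2) = $input =~ /($built)/;

print "pattern text: $from_string\n";
print "literal matched: $m1\n";
print "string matched:  $m2\n";
```

In Perl itself, `qr//` usually avoids the need for doubling; the string form matters mostly when the pattern is stored in configuration or passed in from another language.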

Boann
2

Regexp::Common is another useful tool to be aware of. It contains regexps for many common cases, including quoted strings:

use Regexp::Common;

my $str = '" this is a \" quoted string"';
if ($str =~ $RE{quoted}) {
  # do something
}
Rob Van Dam
0

Try this piece of code: (\".+")

TheLostMind