
I use this regex to capture text between quotes:

\"\\(.*?\\)\"

How can I make it support escaped quotes (\")?

Ron

3 Answers


If you're working in Emacs, it is useful to use Emacs-specific tools to work with regular expressions. Specifically, I recommend using re-builder to build your expression in an interactive way.

But even those expressions become fairly write-only: a few days or weeks later they are very hard to read without excellent documentation (like the breakdown sln provided for the Perl expression).

I tend to build all my regexps in Emacs using the rx macro, which I think produces more readable code (though that might be up for debate). In your case, the expression would look something like:

(rx "\"" (group (0+ (or (1+ (not (any "\"" "\\"))) (seq "\\" anything)))) "\"")

If you evaluate this expression in Emacs, it produces:

"\"\\(\\(?:[^\"\\]+\\|\\\\\\(?:.\\|\n\\)\\)*\\)\""

Which, by itself, would be pretty difficult to understand. This is essentially the expression from sln's answer, escaped to work with Emacs; the one difference is that the rx form anything also matches a newline after the backslash.
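To see the rx version in action, here is a minimal sketch that builds the regexp once and then uses string-match and match-string to pull out the text between the first pair of quotes, escaped quotes included. The names my-quoted-rx-regexp and my-quoted-string-contents are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical name: the rx expression from this answer, built once.
;; `anything' also matches a newline, so a backslash-newline pair is accepted.
(defconst my-quoted-rx-regexp
  (rx "\"" (group (0+ (or (1+ (not (any "\"" "\\")))
                          (seq "\\" anything))))
      "\"")
  "Regexp for a double-quoted string that may contain backslash escapes.
Group 1 holds the text between the quotes.")

;; Hypothetical helper: return group 1 of the first match, or nil.
(defun my-quoted-string-contents (text)
  "Return the contents of the first quoted string in TEXT, or nil."
  (when (string-match my-quoted-rx-regexp text)
    (match-string 1 text)))
```

For example, (my-quoted-string-contents "say \"a \\\" b\" now") returns the text a \" b, with the escape sequence left in place.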

R. P. Dillon

Here's a regex that will do the job:

"\(\\[\\"]\|[^\\"]\)*"

It matches quotation marks which surround any number of a) a quotation mark or backslash preceded by a backslash, or b) any character other than a quotation mark or backslash.

Your question didn't specify that backslashes should also be escapable, but that seems like a reasonable assumption to me. If that's not correct, just replace the [\\"] with a single quotation mark.

Note that if you're going to put this expression into a string, for example to use it in Lisp code, you'll need to insert a backslash before each quotation mark, and then double up all of the other backslashes:

(setq my-quoted-string-regexp "\"\\(\\\\[\\\\\"]\\|[^\\\\\"]\\)*\"")
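A minimal sketch of using this regexp from Lisp. It assumes the variable name from the setq above; my-first-quoted-string is a hypothetical helper. Because the group is inside *, it only records its last repetition (a single character or escape), so the helper takes the whole match and strips the quotes instead of reading group 1:

```elisp
;; The regexp from this answer, as a Lisp string.  A backslash may
;; escape either a double quote or another backslash.
(defvar my-quoted-string-regexp "\"\\(\\\\[\\\\\"]\\|[^\\\\\"]\\)*\"")

;; Hypothetical helper.
(defun my-first-quoted-string (text)
  "Return the first quoted string in TEXT without its surrounding quotes.
Escape sequences are returned exactly as they appear in TEXT.
Return nil if TEXT contains no quoted string."
  (when (string-match my-quoted-string-regexp text)
    ;; Group 1 only holds the last repetition, so use the whole
    ;; match and drop the opening and closing quote characters.
    (substring (match-string 0 text) 1 -1)))
```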
Sean
Anything short of allowing 'escape + anything' is probably not a good idea. – Feb 07 '11 at 19:36

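In Perl (adjust for Emacs). This is about the fastest form of this regex, because [^"\\]+ consumes a whole run of ordinary characters at once instead of going through the alternation once per character: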
    ' " ( (?: [^"\\]+ | \\. )* ) " '   (expanded, i.e. with the /x modifier)

or, without the non-capturing group:

    ' " ( ( [^"\\]+ | \\. )* ) " '   (expanded)

or, compact:

    '"(([^"\\]+|\\.)*)"'

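For Emacs it has to be adjusted: capture groups are written with backslash-escaped parentheses, the pattern sits inside a double-quoted string (so its backslashes are doubled), and if non-capturing groups are unavailable, use the second form.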

Breakdown of the second form:

"                   # literal "
(                   # capture group 1
   (                #   capture group 2 (used only for grouping)
      [^"\\]+       #     greedy run of characters that are neither " nor \ , 1 or more times
    |               #   OR ...
      \\.           #     any character preceded by a \ , exactly once
   )*               #   end group 2, repeated 0 or more times
)                   # end group 1
"                   # literal "
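Translated to Emacs, the first (non-capturing) form might look like the following sketch. The names my-sln-quoted-regexp and my-collect-quoted are hypothetical; the helper uses the START argument of string-match to collect the contents of every quoted string in a piece of text:

```elisp
;; First Perl form translated to an Emacs Lisp string: ( ) become \\( \\),
;; (?: becomes \\(?:, | becomes \\|, and each Perl backslash is doubled
;; once more for the string.  Note that `.' does not match a newline in
;; Emacs, so a backslash followed by a line break is not accepted.
(defconst my-sln-quoted-regexp "\"\\(\\(?:[^\"\\\\]+\\|\\\\.\\)*\\)\""
  "Quoted string with backslash escapes; group 1 is the text inside.")

;; Hypothetical helper.
(defun my-collect-quoted (text)
  "Return a list of the contents of every quoted string in TEXT, in order."
  (let ((pos 0)
        (result '()))
    ;; Each match contains at least the two quotes, so POS always advances.
    (while (string-match my-sln-quoted-regexp text pos)
      (push (match-string 1 text) result)
      (setq pos (match-end 0)))
    (nreverse result)))
```

For example, (my-collect-quoted "a \"one\" b \"t\\\"wo\" c") returns a list of two strings: one and t\"wo.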