I use this regex to capture text between quotes:
\"\\(.*?\\)\"
How can I make it support escaping with \"?
If you're working in Emacs, it is useful to use Emacs-specific tools to work with regular expressions. Specifically, I recommend using re-builder
to build your expression in an interactive way.
But even those expressions become fairly write-only; they are very hard to look at in a few days or weeks and know what they mean without excellent documentation (like sln has provided for the Perl expression).
I tend to build all my regexps in Emacs using the rx
macro, which I think gives more readable code (though that might be up for debate). In your case, the expression would look something like:
(rx "\"" (group (0+ (or (1+ (not (any "\"" "\\"))) (seq "\\" anything)))) "\"")
If you evaluate this expression in Emacs, it produces:
"\"\\(\\(?:[^\"\\]+\\|\\\\\\(?:.\\|\n\\)\\)*\\)\""
On its own, that would be pretty difficult to understand. It is actually the same expression sln offered in the Perl answer, just escaped to work with Emacs.
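As a rough illustration of how the rx form can be used directly, here is a minimal sketch that stores the compiled regexp and pulls out capture group 1 with string-match. The names my-quoted-rx and my-first-quoted are made up for this example and are not part of Emacs.

```elisp
;; Sketch only: `my-quoted-rx' and `my-first-quoted' are hypothetical names.
(require 'rx)

(defconst my-quoted-rx
  (rx "\""
      (group (0+ (or (1+ (not (any "\"" "\\")))   ; run of ordinary characters
                     (seq "\\" anything))))       ; backslash plus any character
      "\"")
  "Regexp matching a double-quoted string that may contain backslash escapes.")

(defun my-first-quoted (text)
  "Return the contents of the first quoted string in TEXT, or nil if none."
  (when (string-match my-quoted-rx text)
    (match-string 1 text)))

;; The input below is the text:  say "a \"b\" c" now
(my-first-quoted "say \"a \\\"b\\\" c\" now")
```

The returned string still contains the backslashes; removing them is a separate step that the regexp does not perform.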
Here's a regex that will do the job:
"\(\\[\\"]\|[^\\"]\)*"
It matches quotation marks which surround any number of a) a quotation mark or backslash preceded by a backslash, or b) any character other than a quotation mark or backslash.
Your question didn't specify that backslashes should also be escapable, but that seems like a reasonable assumption to me. If that's not correct, just replace the [\\"]
with a single quotation mark.
Note that if you're going to put this expression into a string, for example to use it in Lisp code, you'll need to insert a backslash before each quotation mark, and then double up all of the other backslashes:
(setq my-quoted-string-regexp "\"\\(\\\\[\\\\\"]\\|[^\\\\\"]\\)*\"")
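To show the string form in use, here is a small sketch that collects every quoted string from a piece of text with re-search-forward. The helper name my-collect-quoted is hypothetical; the regexp is the one given above, unchanged.

```elisp
;; Sketch only: `my-collect-quoted' is a hypothetical helper name.
(defvar my-quoted-string-regexp
  "\"\\(\\\\[\\\\\"]\\|[^\\\\\"]\\)*\""
  "Regexp from the answer above, written as a Lisp string.")

(defun my-collect-quoted (text)
  "Return a list of every quoted string in TEXT, surrounding quotes included."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (found)
      ;; Each successful search leaves point just after the closing quote.
      (while (re-search-forward my-quoted-string-regexp nil t)
        (push (match-string 0) found))
      (nreverse found))))

;; The input below is the text:  x = "a\"b" and y = "c"
(my-collect-quoted "x = \"a\\\"b\" and y = \"c\"")
```

The sketch reads the whole match (group 0) because the group in this regexp sits under the *, so group 1 only holds the last repetition. To capture just the contents, one option is to wrap the repeated group in an additional \\( ... \\).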
In Perl syntax (adjust for Emacs). This should be about as fast as a regex for this job gets:
' " ( (?: [^"\\]+ | \\. )* ) " '   (expanded)
or ' " ( ( [^"\\]+ | \\. )* ) " '   (expanded)
or '"(([^"\\]+|\\.)*)"'   (compact)
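For Emacs the pattern has to be adjusted: grouping parentheses and the alternation bar are written with a backslash, and inside a double-quoted Lisp string every backslash and quotation mark must itself be escaped. If non-capturing groups are not available, use the second form, which has a plain capture group.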
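One possible hand translation of the first Perl form into an Emacs Lisp string is sketched below, assuming an Emacs that supports shy groups, \(?: ... \). The name my-perl-style-quoted-regexp is invented for this example.

```elisp
;; Sketch only: `my-perl-style-quoted-regexp' is a hypothetical name.
;; Perl pattern being translated:  "((?:[^"\\]+|\\.)*)"
(defconst my-perl-style-quoted-regexp
  (concat "\""            ; opening quote
          "\\("           ; capture group 1: the string contents
          "\\(?:"         ; shy (non-capturing) group
          "[^\"\\]+"      ; run of characters other than quote and backslash
          "\\|"           ; or
          "\\\\."         ; a backslash followed by any character except newline
          "\\)*"          ; shy group repeated zero or more times
          "\\)"           ; end of capture group 1
          "\"")           ; closing quote
  "Hand-written Emacs version of the Perl pattern.")

;; The input below is the text:  say "a \"b\" c" now
(let ((text "say \"a \\\"b\\\" c\" now"))
  (when (string-match my-perl-style-quoted-regexp text)
    (match-string 1 text)))
```

One difference from Perl's /s mode is worth keeping in mind: in Emacs, . never matches a newline, so a backslash immediately followed by a line break is not treated as an escape by this version.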
The equivalent breakdown:
"              # literal "
(              # capture group 1
  (            # group 2 (used only for grouping; can be non-capturing)
    [^"\\]+    # greedy match of anything that is neither " nor \ , 1 or more times
    |          # OR ..
    \\.        # any character preceded by a \ , 1 time
  )*           # end group 2 , done 0 or more times
)              # end group 1
"              # literal "