I'm writing a JavaScript regular expression that is intended to capture string literals in JavaScript source code, in every form the language allows. This is what I've come up with:
([\"\'])(.*?(?:(\\"|\\').*?\3.*?)*?)\1
Description: The expression captures the opening quotation mark (" or ') in capture group 1 and requires the same mark again at the end (\1), so that the match encloses the full string literal. Because the "body" of a string literal can contain substrings enclosed in escaped quotation marks (example: "ab\"cd\"ef"), I allow matched pairs of escaped single or double quotation marks to occur within the body. Capture group 3 matches the opening escaped quotation mark of each pair, and the backreference \3 matches the closing one. Capture group 2 holds the content of the string literal with the outer quotation marks removed (the mark used to enclose the string is in capture group 1). Note that I use (?:...) to make the repeated group non-capturing.
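To make the group numbering concrete, here is a minimal sketch that runs the pattern on the example "ab\"cd\"ef" and reads out each capture group. The names stringLiteralRe, sample, and groups are only illustrative, and String.raw (ES2015) is used purely so the backslashes in the sample survive as written; it is not part of the pattern.

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const stringLiteralRe = /([\"\'])(.*?(?:(\\"|\\').*?\3.*?)*?)\1/;

// String.raw keeps the backslashes, so `sample` holds the source text "ab\"cd\"ef".
const sample = String.raw`"ab\"cd\"ef"`;

const match = stringLiteralRe.exec(sample);

const groups = {
  whole: match[0],            // the full literal, including the outer quotes
  quote: match[1],            // group 1: the enclosing quotation mark
  body: match[2],             // group 2: the content without the outer quotes
  lastEscapedQuote: match[3], // group 3: the escaped quote from the last matched pair
};

console.log(groups);
```

Because group 3 sits inside a repeated group, it only keeps the value from the last pair that was matched, not every pair.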
I've tested the expression on the strings below and it seems to be working:
"abcdefg" // Simple string literal using ".."
'abcdefg' // Simple string literal using '..'
"a\"b\"c\"d\"e\'f\'g" // Escaped matched singles and doubles
"a\"b\"\"c\"\'d\'\'e\'fg" // Another variant
"\"ab\"\'cd\'ef\"\"\'\'g" // Zero length escaped sequences
"a'b'cd'ef'g" // Enclosed in doubles, singles in middle
'"ab"cd"e""f"g' // Enclosed in singles, doubles in middle
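The checks above can be reproduced with a short script along these lines. It is only a sketch: the names samples and results are illustrative, String.raw (ES2015) is assumed to be available so each sample keeps its backslashes exactly as typed, and "working" is taken to mean that the match covers the whole literal.

```javascript
// The pattern from the question, written as a JavaScript regex literal.
const stringLiteralRe = /([\"\'])(.*?(?:(\\"|\\').*?\3.*?)*?)\1/;

// The test strings listed above, as raw source text (backslashes preserved).
const samples = [
  String.raw`"abcdefg"`,
  String.raw`'abcdefg'`,
  String.raw`"a\"b\"c\"d\"e\'f\'g"`,
  String.raw`"a\"b\"\"c\"\'d\'\'e\'fg"`,
  String.raw`"\"ab\"\'cd\'ef\"\"\'\'g"`,
  String.raw`"a'b'cd'ef'g"`,
  String.raw`'"ab"cd"e""f"g'`,
];

// A sample passes only if the match spans the entire literal.
const results = samples.map((source) => {
  const m = stringLiteralRe.exec(source);
  return { source, matchedWhole: m !== null && m[0] === source };
});

for (const { source, matchedWhole } of results) {
  console.log(matchedWhole ? 'ok  ' : 'FAIL', source);
}
```

Comparing the match against the whole sample, rather than just checking for a non-null result, guards against the pattern silently stopping at an earlier quotation mark.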
My question is whether there are any other forms allowed in JavaScript that I need to consider. Note that single-quoted sequences inside a double-quoted string literal ("ab'cde'fg") and double-quoted sequences inside a single-quoted string literal ('ab"cde"fg') do not need to be handled separately (I think), since the pattern matches the enclosing outer quotation marks. I would also appreciate feedback on any potential cross-browser issues: are there browsers that don't support regular expressions at all, or that don't support the features I use here (such as capturing groups or the non-capturing syntax)?
Edit: I am attempting to capture escaped string literals embedded in a string literal. That makes this problem statement different from the one in regex-for-quoted-string-with-escaping-quotes