I have the following input:

download "http://google.com/index.html" "C:\Users\%username%\AppData\Roaming\test.zip"
keyboard "\"test\""

I need a regex that gives me the content between the quotes like

http://google.com/index.html
C:\Users\%username%\AppData\Roaming\test.zip
\"test\"

It's important that it does not treat the escaped quote \" as the end of the string.

Philipp Molitor

1 Answer

AutoIt's regex engine has some limitations, so you'll need to match the string including its surrounding quotes and then strip them. The easiest way to do that is with a capturing group, which means working with group 1 of the match result instead of the entire match:

"((?:[^\\"]|\\.)*)"

will match an entire quoted string and capture its contents, without the quotes, in group 1.
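
To see what "working with group 1" looks like in practice, here is a minimal sketch in Python rather than AutoIt. It assumes the pattern behaves the same in Python's `re` module as in AutoIt's PCRE-based engine, which should hold for the plain constructs used here. The helper name `extract_quoted` is made up for illustration.

```python
import re

# The pattern from this answer, as a raw string so the backslashes
# reach the regex engine unchanged.
QUOTED = re.compile(r'"((?:[^\\"]|\\.)*)"')

def extract_quoted(line):
    """Return the contents of every double-quoted string in `line`.

    With exactly one capturing group, findall() returns group 1 of each
    match, i.e. the text between the quotes without the quotes themselves.
    """
    return QUOTED.findall(line)

# The two input lines from the question.
lines = [
    r'download "http://google.com/index.html" "C:\Users\%username%\AppData\Roaming\test.zip"',
    r'keyboard "\"test\""',
]

for line in lines:
    for content in extract_quoted(line):
        print(content)
```

Note that every backslash is treated as the start of an escape, so a path that ends in a backslash just before its closing quote would not be matched the way you might expect.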

Explanation:

"         # Match "
(         # Match and capture into group 1:
 (?:      # Start of non-capturing group: Either...
  [^\\"]  # match one character that's neither a quote nor a backslash
 |        # or...
  \\.     # match an escaped character.
 )*       # Repeat as needed
)         # End of capturing group
"         # Match "
Tim Pietzcker
  • Uh, what limitations? – Jerry May 17 '14 at 12:32
  • @Jerry: Lookbehinds need to be fixed-length. That means I can't make sure I start my regex match at an unescaped quote. But you're right, my current regex doesn't make sure of that, either... – Tim Pietzcker May 17 '14 at 12:34
  • Yes, +1, but `/"[^\\"]*(?:\\.[^\\"]*)*"/` is much more efficient. See: [PHP: Regex to ignore escaped quotes within quotes](http://stackoverflow.com/a/5696141/433790) Haven't you read [MRE3](http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124 "Mastering Regular Expressions (3rd Edition)") yet? (If not, you are really missing out - especially considering your ginormous regex-fu! - Just sayin') – ridgerunner May 17 '14 at 14:34
  • Uh oh, I just spotted a problem: `"((?:[^\\"]+|\\.)*)"` experiences catastrophic backtracking when applied to non-matching strings such as `"12345678901234567890`. (The `+` on the `[^\\"]+` needs to either be made atomic (probably not supported by AutoIT), or removed entirely.) – ridgerunner May 17 '14 at 14:44
  • I see that the contents of the string are to be captured. In this case my recommended regex would be: `/"([^\\"]*(?:\\.[^\\"]*)*)"/` – ridgerunner May 17 '14 at 14:49
  • @ridgerunner: Sorry for not replying sooner, was AFK for a while. You're right that the "unrolled loop" technique is more efficient (of course I've read Friedl), but I find this simpler version easier to understand. But certainly the `+` needs to go. – Tim Pietzcker May 17 '14 at 16:13
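
The two variants discussed in these comments can be compared side by side. The sketch below again uses Python's `re` as a stand-in for AutoIt's engine, which is an assumption. The names `SIMPLE` and `UNROLLED` are illustrative only. The version with the extra `+` is deliberately left out, since that is the one reported to backtrack catastrophically on unterminated input.

```python
import re

# Simple alternation form from the answer (no `+` inside the group).
SIMPLE = re.compile(r'"((?:[^\\"]|\\.)*)"')

# "Unrolled loop" form suggested by ridgerunner, with a capturing group.
UNROLLED = re.compile(r'"([^\\"]*(?:\\.[^\\"]*)*)"')

samples = [
    r'download "http://google.com/index.html" "C:\Users\%username%\AppData\Roaming\test.zip"',
    r'keyboard "\"test\""',
    '"12345678901234567890',  # unterminated string: neither pattern should match
]

for text in samples:
    simple_hits = SIMPLE.findall(text)
    unrolled_hits = UNROLLED.findall(text)
    # Both patterns describe the same set of strings, so the captures agree.
    print(simple_hits == unrolled_hits, unrolled_hits)
```

Both forms capture the same contents. The unrolled form consumes runs of ordinary characters in one step instead of going through the alternation once per character, which is where its efficiency advantage comes from.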