I've memorized the following pattern and use it whenever I have to match text that contains escape characters, such as a file path or a string with escapes:
http://www.softec.lu/site/RegularExpressions/UnrollingTheLoop
And the pattern is:
normal* ( special normal* )*
Or, if there is a known start/end (such as a string having quotes on either side):
start normal* ( special normal* )* end
For a very basic example, let's say I want to capture the string '<string>', which can contain escapes. Using the pattern, I have:
start = '
normal = [^'\\] (anything except a quote or escape)
special = \\. (an escape and then any character)
end = '
Doing the substitutions, plus a trivial change for capturing groups, I have:
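As a minimal sketch of that substitution in Python (the variable names and the sample input are only for illustration, and the single capturing group around the body is an assumption about the "trivial change"), the four pieces can be assembled into the unrolled form like this:

```python
import re

# The four pieces, exactly as defined above.
start = r"'"
normal = r"[^'\\]"   # anything except a quote or a backslash
special = r"\\."     # a backslash followed by any character
end = r"'"

# start normal* ( special normal* )* end, capturing the body in group 1.
unrolled = re.compile(f"{start}({normal}*(?:{special}{normal}*)*){end}")

sample = r"say 'it\'s here' now"  # hypothetical input with an escaped quote
match = unrolled.search(sample)
print(unrolled.pattern)  # '([^'\\]*(?:\\.[^'\\]*)*)'
print(match.group(1))    # it\'s here
```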
My question is why that cannot just be shortened to:
start (normal | special )* end
For example:
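With the same `normal` and `special` pieces, the shortened form could be sketched in Python roughly as follows (again, the variable name and the sample input are hypothetical, chosen only to illustrate the idea):

```python
import re

# Same pieces as before: normal = [^'\\], special = \\.
# start ( normal | special )* end, capturing the body in group 1.
alternation = re.compile(r"'((?:[^'\\]|\\.)*)'")

sample = r"say 'it\'s here' now"  # hypothetical input with an escaped quote
match = alternation.search(sample)
print(match.group(1))  # it\'s here
```

On this input it captures the same text the `start normal* ( special normal* )* end` form would, which is what prompts the question.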
It seems much less repetitious to implement. What advantages, if any, does the first technique have over this simplified form?