You stumbled upon a case of catastrophic backtracking!
When you write `(\\w+(/|\\\\)?)+`, the separator group `(/|\\\\)?` is optional, so each repetition can match just `\\w+`: you are basically introducing the `(\\w+)+` pattern into your regex. This gives the regex engine the opportunity to match the same string in multiple ways (using either the inner or the outer `+`). The number of possible paths grows exponentially, and since the engine has to try all of them before declaring failure, it takes forever to return a value.
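To put numbers on "exponentially", here is a small sketch in Java (assumed to be the language of the question, given the doubled backslashes). It is a simplified model, not the engine's actual implementation, and the class and method names are hypothetical: it counts how many ways `(\\w+)+` can carve a run of n word characters into consecutive chunks, one chunk per iteration of the outer `+`. Each of the n-1 gaps between characters either is or isn't a chunk boundary, which gives 2^(n-1) splits.

```java
// Hypothetical illustration, not part of any regex library: a simplified
// model counting how many ways the nested quantifier (\w+)+ can split a run
// of n word characters into consecutive non-empty chunks (one chunk per
// iteration of the outer +). A failing match has to try every such split.
class NestedQuantifierPaths {

    // Number of ordered ways to split n characters into non-empty chunks.
    // Only meaningful for n >= 1, since (\w+)+ needs at least one iteration.
    // The recursion is exponential on purpose: it mirrors the backtracking.
    static long ways(int n) {
        if (n == 0) {
            return 1; // nothing left to consume: this split is complete
        }
        long total = 0;
        // The current iteration of \w+ takes k >= 1 characters;
        // the outer + then has to deal with the remaining n - k.
        for (int k = 1; k <= n; k++) {
            total += ways(n - k);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 20; n += 5) {
            System.out.println(n + " characters -> " + ways(n) + " splits");
        }
    }
}
```

Running it prints 16, 512, 16384 and 524288 splits for 5, 10, 15 and 20 characters: every extra character doubles the work.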
Also, a few general comments on your regex:
- `c:\\|` will match, literally, the string `c:|`
- `/|\\\\` is just `[/\\\\]`
- `(\s+)?` is just `\s*`
- `.` is a wildcard ("anything but a newline") that needs to be escaped when you mean a literal dot
- for the `c`/`C` variations, either use `[cC]` or make your whole regex case insensitive
- when you don't actually need to capture values, using non-capturing groups `(?:...)` relieves the engine of some work
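The remarks above can be checked quickly with `java.util.regex`. This is a sketch assuming Java; the class name, field names and sample inputs are made up for illustration. Keep in mind that a Java string literal is unescaped once by the compiler before the regex engine sees it, so `"c:\\|"` is the regex `c:\|` and `"[/\\\\]"` is the regex `[/\\]`.

```java
import java.util.regex.Pattern;

// Hypothetical checks of the general remarks, written as Java string literals.
class RegexRemarks {
    // c:\| : the pipe is escaped, so this is a literal "c:|", not an alternation
    static final Pattern LITERAL_PIPE = Pattern.compile("c:\\|");
    // Alternation vs. character class: both accept a single '/' or '\'
    static final Pattern SEP_ALTERNATION = Pattern.compile("/|\\\\");
    static final Pattern SEP_CLASS = Pattern.compile("[/\\\\]");
    // (\s+)? and \s* accept the same (possibly empty) runs of whitespace
    static final Pattern OPTIONAL_PLUS = Pattern.compile("(\\s+)?x");
    static final Pattern STAR = Pattern.compile("\\s*x");
    // Unescaped . is a wildcard; \. only matches a literal dot
    static final Pattern UNESCAPED_DOT = Pattern.compile("a.txt");
    static final Pattern ESCAPED_DOT = Pattern.compile("a\\.txt");
    // Drive letter in either case: character class or case-insensitive flag
    static final Pattern DRIVE_CLASS = Pattern.compile("[cC]:");
    static final Pattern DRIVE_FLAG = Pattern.compile("c:", Pattern.CASE_INSENSITIVE);
    // Capturing vs. non-capturing group
    static final Pattern CAPTURING = Pattern.compile("(\\w+)/");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:\\w+)/");

    // True if the pattern matches the whole input
    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(LITERAL_PIPE, "c:|"));    // true
        System.out.println(full(LITERAL_PIPE, "c:"));     // false
        for (String s : new String[] {"/", "\\", "x"}) {
            // both forms give the same verdict, so this prints true each time
            System.out.println(full(SEP_ALTERNATION, s) == full(SEP_CLASS, s));
        }
        System.out.println(full(OPTIONAL_PLUS, "  x") && full(STAR, "  x")); // true
        System.out.println(full(UNESCAPED_DOT, "a_txt")); // true: . matched '_'
        System.out.println(full(ESCAPED_DOT, "a_txt"));   // false
        System.out.println(full(DRIVE_CLASS, "C:") && full(DRIVE_FLAG, "C:")); // true
        System.out.println(CAPTURING.matcher("").groupCount());     // 1
        System.out.println(NON_CAPTURING.matcher("").groupCount()); // 0
    }
}
```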
Taking these into account, a regex in the spirit of your first attempt could be:
`\\s*(?:[cC]:[/\\\\])?(?:\\w+[/\\\\])*\\w+\\.[a-z]+`
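In `(?:\\w+[/\\\\])`, the character class `[/\\\\]` isn't optional anymore, so every repetition has to end with a separator and there is only one way to split the path into directories. This avoids the `(\\w+)+` pattern: see demo here.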
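As a rough stand-in for a demo, the regex suggested above can be exercised from Java, assuming (as the doubled backslashes suggest) that it is meant to live in a Java string literal. The class name, the `isPath` helper and the sample inputs are hypothetical. The last sample is the kind of long, non-matching input that hurts the nested-quantifier version; here it fails quickly because each directory chunk must end in a separator.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the suggested regex, copied as a Java literal.
class FilePathRegex {
    static final Pattern PATH =
        Pattern.compile("\\s*(?:[cC]:[/\\\\])?(?:\\w+[/\\\\])*\\w+\\.[a-z]+");

    // Whole-string match. Each (?:\w+[/\\]) chunk must end in a separator,
    // so a directory run can only be split one way: no exponential paths.
    static boolean isPath(String s) {
        return PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "c:/dir/sub/file.txt",       // true
            "C:\\dir\\file.log",         // true (Java literal for C:\dir\file.log)
            "  relative/path/readme.md", // true: leading spaces allowed by \s*
            "file.txt",                  // true
            "c:/dir/",                   // false: no file name
            "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!" // false, and fails fast
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isPath(s));
        }
    }
}
```

Note that `[a-z]+` only accepts lowercase extensions; the case-insensitive flag mentioned above would also cover names like `FILE.TXT`.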
For more information on catastrophic backtracking, I'd recommend the excellent (and fun!) article by Friedl on the subject in the Perl Journal.