I have a simple string:
$string = '--#--%--%2B--';
I want to percent-encode all characters (including the "lonely" %), except the - character and triplets of the form %xy, where x and y are hexadecimal digits. So I wrote the following two alternative patterns:
$pattern1 = '/(?:[\-]+|%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us';
$pattern2 = '/(?:[\-]+)(*SKIP)(*FAIL)|(?:%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us';
Please notice the use of (multiple) (*SKIP)(*FAIL) and of (?:).
The result of matching and replacing is the same for both patterns, and it is the correct one:
--%23--%25--%2B--
I would like to ask:
- Are the two patterns equivalent? If not, which one would be the proper one to use for URL-encoding? Could you please explain why, in a few words?
- Would you suggest other alternatives (involving backtracking control verbs), or are my patterns a good choice?
- Can I apply only one (?:) around the whole (chosen) pattern, even if the (multiple) (*SKIP)(*FAIL) will be inside it?
I know I am asking a lot by posing several questions at once. Please accept my apology! Thank you very much.
P.S.: I've tested with the following PHP code, where $patternX is either of the two patterns above:
$result = preg_replace_callback($patternX, function($matches) {
    return rawurlencode($matches[0]);
}, $string);
echo $result;
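For completeness, here is a minimal, self-contained sketch of that test which runs both patterns over the same input and compares the results. The $patterns and $results arrays are names I introduced only for this example; everything else is taken from the snippets above.

```php
<?php
// Input string from the question.
$string = '--#--%--%2B--';

// The two candidate patterns, keyed by a label (labels are illustrative only).
$patterns = [
    'pattern1' => '/(?:[\-]+|%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us',
    'pattern2' => '/(?:[\-]+)(*SKIP)(*FAIL)|(?:%[A-Fa-f0-9]{2})(*SKIP)(*FAIL)|./us',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    // Runs of "-" and valid %xy triplets are skipped by (*SKIP)(*FAIL);
    // every other single character is matched by "." and encoded.
    $results[$name] = preg_replace_callback($pattern, function ($matches) {
        return rawurlencode($matches[0]);
    }, $string);

    echo $name, ': ', $results[$name], PHP_EOL;
}

// Check whether both patterns produced the same output for this input.
var_dump($results['pattern1'] === $results['pattern2']);
```

For this one input, both patterns print the encoded string shown above and the comparison is true. That only shows they agree on this sample, not that they are equivalent in general.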