For practice, I'm creating my own PHP router. The router can take parameters, which are specified like this:

{i:variableName}

The i stands for the parameter type (in this case, integer) and variableName stands for the variable name.

A single routing URI could look like this:

/home/{i:id}-{s:noVar}/{m:varName}/{s:someOther}
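As a rough sketch of the placeholder syntax, the snippet below pulls every `{type:name}` placeholder out of this example route with `preg_match_all`. The pattern here is a simplified one chosen for illustration (a single ASCII letter as the type, an identifier-style name). It is not the regex the question builds below, and the variable names are illustrative only.

```php
<?php
// Assumed placeholder syntax: {name} or {t:name}, where t is a single
// ASCII letter (e.g. i = integer) and name is an identifier.
$route       = '/home/{i:id}-{s:noVar}/{m:varName}/{s:someOther}';
$placeholder = '~\{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)\}~';

preg_match_all($placeholder, $route, $m);

// With the default PREG_PATTERN_ORDER, $m['type'] and $m['name'] are
// parallel arrays with one entry per placeholder found.
foreach ($m['name'] as $i => $name) {
    echo $m['type'][$i], ' => ', $name, PHP_EOL;
}
```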

I've created the following regex pattern for this purpose:

[^{}]*({((?<type>\D):)?(?<name>[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]+)})[^{}]*

To prevent two parameters from being placed directly next to each other, I extended the regex with this piece:

[^{}]*

For example, I don't want to allow something like:

/home/{i:id}{s:noVar}/{m:varName}{s:someOther}

There must be at least one character between them.

I thought this piece of regex meant "do not include zero or more { or } characters".

When I run this regex on a route like /home/{i:id}{s:noVar}/{m:varName}/{s:someOther}, it still retrieves all the parameters, even the ones that are next to each other.
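A quick PHP check shows what is going on: `*` means "zero or more", so `[^{}]*` is also satisfied by an empty string and never forces a character between two blocks. The pattern below is a simplified, ASCII-only version of the question's regex (an assumption made to avoid the `_-ÿ` ranges); the surrounding `[^{}]*` parts are kept as in the original.

```php
<?php
// Simplified version of the question's pattern (assumption): optional
// [^{}]* on both sides of each {type:name} block.
$pattern = '~[^{}]*\{(?:(?<type>\D):)?(?<name>\w+)\}[^{}]*~';
$uri     = '/home/{i:id}{s:noVar}/{m:varName}/{s:someOther}';

// The character class with "*" matches the empty string, so nothing
// stops "}" from being immediately followed by "{".
$emptyOk = preg_match('~^[^{}]*$~', '');

preg_match_all($pattern, $uri, $m);
print_r($m['name']); // includes the adjacent id and noVar as well
```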

Why does this happen, and how can I make the regex retrieve only parameters that aren't next to each other?

Bas

I suggest matching 2 or more consecutive {...} blocks and discarding those matches, so that only the remaining (non-adjacent) {...} blocks are handled. Use the well-known PCRE (*SKIP)(*F) technique:

(?:{(?:[a-zA-Z]:)?[a-zA-Z_]\w*}){2,}(*SKIP)(*F)|{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)}

See the regex demo
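For reference, this is roughly how the pattern could be used from PHP with `preg_match_all`. The subject strings come from the question; the helper `paramNames` is a hypothetical name used only for illustration. Only standalone blocks produce matches.

```php
<?php
// (*SKIP)(*F) approach: runs of 2+ adjacent {...} blocks are matched by the
// first branch and then discarded; only standalone blocks reach the second.
$regex = '~(?:\{(?:[a-zA-Z]:)?[a-zA-Z_]\w*\}){2,}(*SKIP)(*F)'
       . '|\{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)\}~';

// Hypothetical helper: return the "name" captures for one URI.
function paramNames(string $regex, string $uri): array
{
    preg_match_all($regex, $uri, $m);
    return $m['name'] ?? [];
}

$separated = paramNames($regex, '/home/{i:id}-{s:noVar}/{m:varName}/{s:someOther}');
$adjacent  = paramNames($regex, '/home/{i:id}{s:noVar}/{m:varName}/{s:someOther}');

print_r($separated); // id, noVar, varName, someOther
print_r($adjacent);  // varName, someOther
```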

Explanation:

  • (?:{(?:[a-zA-Z]:)?[a-zA-Z_]\w*}){2,}(*SKIP)(*F) - The first of the two alternative branches. It matches two or more consecutive ({2,}) occurrences of the block pattern explained below (just without capture groups). The (*SKIP)(*F) verbs ((*F) is short for (*FAIL)) make the regex engine discard this match and resume searching right after it, so no block inside such a run can be matched by the second branch.
  • | - or match what we need:
  • { - an opening {
  • (?:(?<type>[a-zA-Z]):)? - an optional group matching an ASCII letter (captured into Group "type") and a :
  • (?<name>[a-zA-Z_]\w*) - Group "name" capturing an ASCII letter or _ ([a-zA-Z_]) followed by 0+ word chars (\w, i.e. the [a-zA-Z0-9_] range)
  • } - closing }
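To see what the two verbs do in isolation, here is a deliberately tiny toy pattern (not part of the router) that matches a lone `a` but refuses any `a` that belongs to a run of two or more:

```php
<?php
// First branch: a run of 2+ "a" characters. Once it matches, (*SKIP) moves
// the next start position to the end of the run and (*F) fails the attempt,
// so nothing inside the run is reported. Second branch: a lone "a".
$withSkip    = preg_match_all('~a{2,}(*SKIP)(*F)|a~', 'a aa a aaa', $m1);
$withoutSkip = preg_match_all('~a~', 'a aa a aaa', $m2);

echo $withSkip, PHP_EOL;    // lone "a" characters only
echo $withoutSkip, PHP_EOL; // every "a"
```

Keeping only (*F) without (*SKIP) would not be enough: after the first branch fails, the engine would simply try the second branch at the same position and match the first `a` of the run.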
Wiktor Stribiżew
  • This just got a lot more complicated, thanks for the explanation. – Bas Jul 22 '16 at 13:04
  • Once you understand what the pattern is matching, it won't be hard. – Wiktor Stribiżew Jul 22 '16 at 13:06
  • At first, there were `(?<!})` and `(?!{)`; where are they now? Is the part you wrote at the beginning of the full regex equivalent? – Bas Jul 22 '16 at 13:07
  • Read my explanation in the answer. We match consecutive `{}` blocks, then discard them, so the only blocks that get matched are standalone ones. If we use the lookbehind `(?<!})` with the lookahead `(?!{)`, we will also exclude cases where the `{}` block is merely preceded by `}` **OR** followed by `{`, and there is no guarantee that a whole block is there. We can check for a block *after* a block, but we cannot check for a block *before* a block. Certainly, you can use the lookaround approach *if* you are sure there will only be `{}` around *blocks*. – Wiktor Stribiżew Jul 22 '16 at 13:15
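A sketch comparing the skip-based pattern with the lookaround variant discussed in the comments. The lookaround version is reconstructed here as `(?<!\})` and `(?!\{)` wrapped around the same block pattern, which is an assumption about its exact form; the helper `names` and the `$stray` URI are hypothetical. On the question's URI both agree, but a literal `}` directly before a block makes the lookaround version drop a parameter that the skip version keeps.

```php
<?php
$block = '\{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)\}';

// Skip-based pattern from the answer.
$skip = '~(?:\{(?:[a-zA-Z]:)?[a-zA-Z_]\w*\}){2,}(*SKIP)(*F)|' . $block . '~';
// Assumed lookaround variant: reject a block preceded by "}" or followed by "{".
$look = '~(?<!\})' . $block . '(?!\{)~';

// Hypothetical helper: return the "name" captures for one URI.
function names(string $regex, string $uri): array
{
    preg_match_all($regex, $uri, $m);
    return $m['name'] ?? [];
}

$uri   = '/home/{i:id}{s:noVar}/{m:varName}/{s:someOther}';
$stray = '/x}{i:id}/y'; // a literal "}" directly before a block

print_r(names($skip, $uri));   // varName, someOther
print_r(names($look, $uri));   // varName, someOther
print_r(names($skip, $stray)); // id
print_r(names($look, $stray)); // empty
```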