Resolving a parameter pattern without a specific character at front and back

Question

For practice, I'm creating my own PHP router. This router can take parameters, which are specified like this:

{i:variableName}

The i stands for the parameter type (in this case integer) and variableName is the variable name.

A single routing URI could look like this:

/home/{i:id}-{s:noVar}/{m:varName}/{s:someOther}

I've created the following regex pattern for this purpose:

[^{}]*({((?<type>\D):)?(?<name>[a-zA-Z_-ÿ][a-zA-Z0-9_-ÿ]+)})[^{}]*

To prevent two parameters from sitting directly next to each other, so that there is always at least one character between them, I expanded the regex with this piece:

[^{}]*

For example, I should not be able to write something like:

/home/{i:id}{s:noVar}/{m:varName}{s:someOther}

There needs to be at least one character between them.

I thought this piece of regex would mean "do not include zero or more of the { or } characters".

When I run this regex on a pattern like /home/{i:id}{s:noVar}/{m:varName}/{s:someOther}, it still retrieves all the parameters, even the ones that are next to each other.

How is this possible and how can I make it so that the regex will only retrieve parameters that aren't next to each other?
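
The behaviour can be reproduced with a minimal sketch like the one below. It assumes PHP's preg_match_all and a simplified, ASCII-only name class in place of the `_-ÿ` range; the braces are escaped only for readability. The comments point out where `[^{}]*` is allowed to match nothing at all.

```php
<?php
// Simplified form of the pattern from the question (ASCII-only names are an assumption).
$pattern = '~[^{}]*(\{((?<type>\D):)?(?<name>[a-zA-Z_][a-zA-Z0-9_]+)\})[^{}]*~';

$uri = '/home/{i:id}{s:noVar}/{m:varName}/{s:someOther}';

preg_match_all($pattern, $uri, $matches);
$names = $matches['name'];

// "[^{}]*" means "zero or more characters other than { or }".
// Between "}" and "{" it simply matches the empty string, so adjacent
// parameters are still found.
print_r($names); // id, noVar, varName, someOther
```

Since `*` permits zero repetitions, the pattern never requires a separator; something has to actively reject the adjacent case.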

Enclose your first pattern with `(?<!})` and `(?!{)`. See https://regex101.com/r/fF5wQ3/1 – Wiktor Stribiżew, Jul 22 '16 at 10:39
@WiktorStribiżew What do you mean? I'm still a beginner on regular expressions. – Bas, Jul 22 '16 at 10:39
Are you aware that `_-ÿ` creates a range? Is that your intent? – Wiktor Stribiżew, Jul 22 '16 at 10:45
@WiktorStribiżew It's for a variable name, I basically googled it. I guess it's off topic now, but what exactly does it do? – Bas, Jul 22 '16 at 10:46
I do not understand one point: is the `i:` really optional? Check if [`(?:{(?:[a-zA-Z]:)?[a-zA-Z_]\w*}){2,}(*SKIP)(*F)|({(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)})`](https://regex101.com/r/fF5wQ3/2) works for you. – Wiktor Stribiżew, Jul 22 '16 at 10:50
Does my suggestion work for you? Do you have some more test cases? – Wiktor Stribiżew, Jul 22 '16 at 11:09
@WiktorStribiżew I'm still working with it. I don't really get it, to be honest; what does it do exactly? – Bas, Jul 22 '16 at 11:10
It ignores consecutive `{...}` substrings and only handles a `{...}` that is not adjacent to another one. – Wiktor Stribiżew, Jul 22 '16 at 11:15
@WiktorStribiżew That makes sense! Thank you, if it was an answer I'd accept it. – Bas, Jul 22 '16 at 11:18
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/118096/discussion-between-bas-and-wiktor-stribizew). – Bas, Jul 23 '16 at 15:33

score 1 · Accepted Answer · edited May 23 '17 at 11:51

I suggest matching two or more consecutive {...} blocks and ignoring those matches, then handling only the remaining (non-adjacent) {...} blocks. Use the well-known PCRE (*SKIP)(*F) technique:

(?:{(?:[a-zA-Z]:)?[a-zA-Z_]\w*}){2,}(*SKIP)(*F)|{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)}

See the regex demo

Explanation:

(?:{(?:[a-zA-Z]:)?[a-zA-Z_]\w*}){2,}(*SKIP)(*F) - The first of the two alternatives. It matches the block pattern explained below (just without capture groups) when it occurs two or more times in a row, which is what the {2,} quantifier means. The (*SKIP)(*F) verbs ((*F) is short for (*FAIL)) then make the regex engine discard this match and continue searching after it.
| - or match what we need:
{ - an open {
(?:(?<type>[a-zA-Z]):)? - an optional group matching an ASCII letter (captured into Group "type") and a :
(?<name>[a-zA-Z_]\w*) - Group "name" capturing an ASCII letter or _ (see [a-zA-Z_]) followed by 0+ word chars (from the [a-zA-Z0-9_] range)
} - closing }
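
The pattern explained above could be used from PHP roughly as follows. This is only a sketch: extractParams() is a hypothetical helper and not part of any router library, and the braces are escaped purely for readability, which does not change what the pattern matches.

```php
<?php
// Pattern from the answer, split over two lines; braces escaped for readability.
$pattern = '~(?:\{(?:[a-zA-Z]:)?[a-zA-Z_]\w*\}){2,}(*SKIP)(*F)'
         . '|\{(?:(?<type>[a-zA-Z]):)?(?<name>[a-zA-Z_]\w*)\}~';

// Hypothetical helper: returns [name => type] for every standalone {...} block.
function extractParams(string $pattern, string $uri): array
{
    preg_match_all($pattern, $uri, $matches, PREG_SET_ORDER);
    $params = [];
    foreach ($matches as $m) {
        // 'type' is an empty string when the optional "x:" prefix is absent.
        $params[$m['name']] = $m['type'];
    }
    return $params;
}

// All four parameters are separated by other characters, so all are returned.
print_r(extractParams($pattern, '/home/{i:id}-{s:noVar}/{m:varName}/{s:someOther}'));

// {i:id}{s:noVar} are adjacent, so both are skipped: only varName and someOther remain.
print_r(extractParams($pattern, '/home/{i:id}{s:noVar}/{m:varName}/{s:someOther}'));
```

Note that a run of adjacent blocks is dropped as a whole; if the router should instead report an error for such a route, that check would have to be added separately.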

answered Jul 22 '16 at 11:20 by Wiktor Stribiżew; edited May 23 '17 at 11:51 by Community

This just got a lot more complicated, thanks for the explanation. – Bas Jul 22 '16 at 13:04
Once you understand what the pattern is matching, it won't be hard. – Wiktor Stribiżew Jul 22 '16 at 13:06
At first, there was `(?<!})` and `(?!{)`, where are they now? Is the thing you wrote at the beginning of the full regex the same? – Bas Jul 22 '16 at 13:07
Read my explanation in the answer. We match consecutive `{}` blocks, then discard them. The only blocks matched are standalone ones. If we use the lookbehind `(?<!})` with the lookahead `(?!{)` we will also exclude cases where the `{}` block is just preceded with `}` **OR** followed with `{`. And there is no guarantee there is a whole block there. We can check for a block *after* a block, but we cannot check for a block *before* a block. Certainly, you can use the lookaround approach *if* you are sure there will only be `{}` around *blocks*. – Wiktor Stribiżew Jul 22 '16 at 13:15
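
The difference described in the last comment can be illustrated with a small comparison. This is a sketch under assumptions: names() is a hypothetical helper, the type letter is left uncaptured to keep the patterns short, and the second test URI with a stray `}` is made up to show the edge case.

```php
<?php
// (*SKIP)(*F) variant: discards runs of two or more whole {...} blocks.
$skipFail = '~(?:\{(?:[a-zA-Z]:)?[a-zA-Z_]\w*\}){2,}(*SKIP)(*F)'
          . '|\{(?:[a-zA-Z]:)?(?<name>[a-zA-Z_]\w*)\}~';

// Lookaround variant: only checks the single character on either side.
$lookaround = '~(?<!\})\{(?:[a-zA-Z]:)?(?<name>[a-zA-Z_]\w*)\}(?!\{)~';

// Hypothetical helper returning the captured parameter names.
function names(string $pattern, string $uri): array
{
    preg_match_all($pattern, $uri, $m);
    return $m['name'];
}

$adjacent = '/home/{i:id}{s:noVar}/{m:varName}';
$stray    = '/home/}{i:id}/{s:slug}'; // a lone "}" that is not part of a block

// Both variants agree when braces only ever appear around whole blocks.
print_r(names($skipFail, $adjacent));   // varName
print_r(names($lookaround, $adjacent)); // varName

// They differ when a stray brace touches a block.
print_r(names($skipFail, $stray));      // id, slug
print_r(names($lookaround, $stray));    // slug
```

The lookaround form is shorter, but it rejects {i:id} in the second URI merely because a `}` precedes it, even though no complete block is there.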
