I'm looking for a regex in PHP that matches a simple URL path made up of specific characters and nothing else.

My regex doesn't quite work (the 'gm' flags are only for testing; the production version should not use 'g', to keep the match exact):

/^\/[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?[A-Za-z0-9-]+\/?$/gm

URL path examples with comments:

#match: YES
/
/trip-001
/trip-001/
/trip-001/summer-2019
/trip-001/summer-2019/
/trip-001/summer-2019/ibiza-001/
/trip-001/summer-2019/ibiza-001/PICT-001

#match: NO
//
trip-001
trip-001/
trip-001/summer-2019
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001

//trip-001
trip-001//
//trip-001/summer-2019
//trip-001//summer-2019
trip-001//summer-2019
//trip-001/summer-2019/
//trip-001//summer-2019//
trip-001//summer-2019/
trip-001/summer-2019//
trip-001/summer-2019/
trip-001/summer-2019/ibiza-001/
//trip-001/summer-2019/ibiza-001/
//trip-001//summer-2019/ibiza-001/
//trip-001/summer-2019//ibiza-001/
//trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001//
trip-001/summer-2019/ibiza-001/
trip-001/summer-2019/ibiza-001/PICT-001
//trip-001/summer-2019/ibiza-001/PICT-001
# and similar

/trip-001/summer-2019/ibiza-001/PICT-001/
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001
/trip-001/summer-2019/ibiza-001/whatever-987/PICT001/

trip-001/summer-2019/ibiza-001/PICT-001/
trip-001/summer-2019/ibiza-001/whatever-987/PICT001
trip-001/summer-2019/ibiza-001/whatever-987/PICT001/

I have no idea how to make it work with {n}.

Only this character set is allowed: A-Z a-z 0-9 - / and nothing else. Please don't use \d for digits.

It's for a !preg_match() in PHP.

EDIT: A leading slash is required. Two or more consecutive slashes are not allowed. A trailing slash is optional.

Malama

1 Answer

It appears the URL should only be valid if it contains fewer than 5 slashes.

You may adjust your pattern as

^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$

See regex demo
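
Since the question mentions `preg_match()`, here is a minimal sketch of how the pattern could be used in PHP. The helper name `isValidPath` and the sample list are only for illustration. The `D` modifier is my addition and is not part of the pattern above: without it, `$` would also match just before a trailing newline.

```php
<?php
// Pattern from the answer, wrapped in / delimiters for preg_match().
// The D modifier is an addition (assumption): it keeps $ from matching
// before a trailing newline at the end of the subject.
$pattern = '/^(?!(?:[^\/]*\/){5})(?:(?:\/[A-Za-z0-9-]+){1,4}\/?|\/)$/D';

// Hypothetical helper, named only for this example.
function isValidPath(string $path, string $pattern): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $path) === 1;
}

// A few paths taken from the question's YES and NO lists.
$samples = [
    '/',
    '/trip-001',
    '/trip-001/summer-2019/ibiza-001/PICT-001',
    '//',
    'trip-001',
    '/trip-001//summer-2019',
    '/trip-001/summer-2019/ibiza-001/PICT-001/',
];

foreach ($samples as $path) {
    echo str_pad($path, 45), isValidPath($path, $pattern) ? 'match' : 'no match', PHP_EOL;
}
```

In the question's `!preg_match()` form, the check would read `if (!isValidPath($path, $pattern)) { /* reject the request */ }`.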

Details

  • ^ - start of string
  • (?!(?:[^\/]*\/){5}) - a negative lookahead that fails the match if there are 5 or more / chars in the string
  • (?: - start of the non-capturing group:
    • (?:\/[A-Za-z0-9-]+){1,4}\/? - 1 to 4 occurrences of a / and 1+ ASCII alphanumeric or - chars and then an optional / char
    • | - or
    • \/ - a single / char that makes up the whole string
  • ) - end of the non-capturing group
  • $ - end of string.
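
If the tree gets deeper, the two numbers in the pattern have to change together: the `{1,4}` quantifier and the `{5}` in the lookahead (always one more than the maximum number of segments). A rough sketch of that, assuming a hypothetical helper `buildPathPattern` and the optional `D` modifier (neither is part of the pattern above):

```php
<?php
// Hypothetical helper: rebuilds the pattern above for a configurable depth.
function buildPathPattern(int $maxSegments): string
{
    if ($maxSegments < 1) {
        throw new InvalidArgumentException('maxSegments must be at least 1');
    }

    // The lookahead must fail as soon as it can find one slash more than
    // the maximum number of segments.
    $tooManySlashes = $maxSegments + 1;

    // D (an assumption, not in the original pattern) keeps $ from matching
    // before a trailing newline.
    return '/^(?!(?:[^\/]*\/){' . $tooManySlashes . '})'
         . '(?:(?:\/[A-Za-z0-9-]+){1,' . $maxSegments . '}\/?|\/)$/D';
}

// With 4 segments this yields the same pattern as above, plus the D modifier.
$pattern = buildPathPattern(4);
var_dump(preg_match($pattern, '/trip-001/summer-2019') === 1);
```

As in the original pattern, a trailing slash is accepted only while the path is below the maximum depth, because a full-depth path with a trailing slash reaches the slash limit in the lookahead.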
Wiktor Stribiżew
  • I'm not sure yet. I have to expand my rules. A leading slash is required. Double slashes are not allowed. A trailing slash is optional. – Malama Apr 19 '20 at 17:20
  • @Malama Correct. That is what I suggest, too. See [updated samples demo](https://regex101.com/r/NgBfeW/3). – Wiktor Stribiżew Apr 19 '20 at 17:25
  • I don't totally understand it. Are the question marks at the beginning for empty strings? And the `*` confuses me a little. Can you replace the `*` with the charset `[A-Za-z0-9-]+`? I also don't really understand the use of `:`, especially more than one of them. – Malama Apr 19 '20 at 17:48
  • @Malama `(?:[^\/]*\/){5}` matches 5 occurrences of any 0+ chars other than `/` followed by `/`. It is a pattern inside a negative lookahead; it just fails any string that has 5 or more `/` chars in it. You do not need to replace it with any pattern, the *consuming* pattern will validate the string. `(?:...)` is a [non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions). – Wiktor Stribiżew Apr 19 '20 at 17:53
  • It's very complex. I need to test it for longer. I forgot one thing: matching a single leading slash `/` is absolutely necessary. Sorry about that. – Malama Apr 19 '20 at 18:07
  • @Malama Both my regexps require `/` as the first char. See [your updated sample demo](https://regex101.com/r/NgBfeW/4). If you need more clarification, please ask here in the comments. – Wiktor Stribiżew Apr 19 '20 at 18:14
  • OK, many thanks. The match for a single `/` is missing. Really sorry, I had forgotten it. Here is your example with the `/`: https://regex101.com/r/NgBfeW/5 – Malama Apr 19 '20 at 18:26
  • @Malama Is empty string allowed? – Wiktor Stribiżew Apr 19 '20 at 18:28
  • That's a good question. I've been wondering about that too. It depends on what PHP `$_SERVER['REQUEST_URI']` delivers when calling example.com. Some browsers hide the slash (the original is example.com/). In my tests a single `/` is delivered, but I don't know enough about that. I have coded my script so that it assumes a single `/`. Empty strings are NOT allowed. – Malama Apr 19 '20 at 18:35
  • @Malama I fixed the pattern in the answer. – Wiktor Stribiżew Apr 19 '20 at 18:40
  • Many thanks! A first test passed successfully. And when the tree structure gets deeper, I can adjust it with the numbers. – Malama Apr 19 '20 at 19:43