1

I am trying to write a preg_match call with a pattern that validates a string made of an unlimited number of occurrences. My code looks like this:

if(! preg_match_all("#^\([a-zA-Z0-9_-]+\)$#", $arg, $matches, PREG_OFFSET_CAPTURE)){
    var_dump($matches);
    throw new \Exception('The simple pattern "'.$arg.'" is not valid !');
}

Each occurrence must follow this format: any characters between two parentheses, e.g. (mystring123/). The whole string ($arg) is a sequence of these occurrences.
For example:
1- This string is valid: (AAA/)(BBB/)(cc)
2- This string is not valid: (AAA/)xxxx(BBB/)(cc)

The code runs, but the pattern I wrote does not accept more than one occurrence.

On my second try I changed the pattern, but it fails when the preg_match function is executed:

#[^\([a-zA-Z0-9_-]+\)$]+#

How can I resolve this, and how can I add the characters "\" and "/" to the pattern?

isom
  • Could you add an example of what is expected and what happens? It would be easier to understand. – Anthony Mar 12 '18 at 13:27
  • please print your $arg samples – Yassine CHABLI Mar 12 '18 at 13:28
  • This string is valid: (AAA/)(BBB/)(cc). But this string is not valid: (AAA/)(BBB/)xxx(cc). So I want every occurrence to sit between two parentheses. – isom Mar 12 '18 at 13:34
  • 1
    @isom I'm not sure how you've determined that either of those strings is valid because neither can be matched by your regex pattern. It might, instead, be easier for you to tell us what you're trying to accomplish and present us with sample strings. – ctwheels Mar 12 '18 at 13:40

3 Answers

2

I've toiled at this task for a while, trying to devise a method that combines your full-string validation with an indefinite number of captured groups. After trying many combinations of \G and lookarounds, I am afraid it cannot be done in one pass. If PHP allowed variable-width lookbehinds, I think I could, but alas they are not available.

What I can offer is a process with the unnecessary "stuff" removed.

Code: (Demo)

$strings = ["(AAA/)(BBB/)(cc)", "(AAA/)xxxx(BBB/)(cc)"];

foreach ($strings as $string) {
    // Four backslashes in the PHP literal reach the regex engine as \\, i.e. one literal backslash.
    if (!preg_match('~^(?:\([\w\\\\/-]+\))+$~', $string)) {
        echo "The simple pattern $string is not valid!";
        // throw new \Exception("The simple pattern $string is not valid!");
    } else {
        var_export(preg_split('~\)\K~', $string, 0, PREG_SPLIT_NO_EMPTY));
    }
    echo "\n";
}

Output:

array (
  0 => '(AAA/)',
  1 => '(BBB/)',
  2 => '(cc)',
)
The simple pattern (AAA/)xxxx(BBB/)(cc) is not valid!

Pattern #1 Breakdown:

~              #pattern delimiter
^              #start of string anchor
(?:            #start of non-capturing group
  \(           #match one opening parenthesis
  [\w\\/-]+    #greedily match one or more of the following characters: a-z, A-Z, 0-9, underscores, backslashes, slashes, and hyphens
  \)           #match one closing parenthesis
)              #end of non-capturing group
+              #allow one or more occurrences of the non-capturing group
$              #end of string anchor
~              #pattern delimiter
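
To illustrate what the non-capturing group changes, here is a minimal sketch that runs the pattern once with (?: ) and once with a plain capturing group, this time with a matches array declared. The sample string is the valid one from the question; the variable names are only for illustration. Note that the backslash inside the character class is written as four backslashes in the PHP literal so that the regex engine receives \\.

```php
<?php
// Sample input taken from the question.
$string = '(AAA/)(BBB/)(cc)';

// Non-capturing group: the matches array holds only the full-string match.
preg_match('~^(?:\([\w\\\\/-]+\))+$~', $string, $nonCapturing);

// Capturing group: the array gains a second element holding the LAST
// repetition of the group (earlier repetitions are overwritten).
preg_match('~^(\([\w\\\\/-]+\))+$~', $string, $capturing);

var_export($nonCapturing);
echo "\n";
var_export($capturing);
```

Both patterns accept and reject the same strings; the only difference is the extra element in the matches array, which is why a repeated capturing group cannot be used to collect every occurrence.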

Pattern #2 Breakdown:

~              #pattern delimiter
\)             #match one closing parenthesis
\K             #restart the fullstring match (forget/release previously matched character(s))
~              #pattern delimiter

Pattern #2's effect is to locate every closing parenthesis and "explode" the string on the zero width position that follows the closing parenthesis. \K ensures that no characters become casualties in the explosions.

The if condition does not need to call preg_match_all() since there can only ever be one match when you are validating from ^ to $. Declaring a variable to contain the "match" is pointless (as is PREG_OFFSET_CAPTURE): if there is a match, it will be the entire input string, so just use that value if you want it.

preg_split() is a suitable substitute for a preg_match_all() call because it produces exactly the output you are after in a lean one-dimensional array AND uses a very small, readable pattern. The 3rd and 4th parameters, 0 and PREG_SPLIT_NO_EMPTY, tell the function respectively that there is "no limit" to the number of explosions and that any empty elements should be discarded (so no empty element is created after the ) that trails cc).
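
As a small sketch of what the 4th parameter does, the snippet below splits the same sample string with and without PREG_SPLIT_NO_EMPTY. The variable names are hypothetical and exist only for this comparison.

```php
<?php
// Sample input taken from the question.
$string = '(AAA/)(BBB/)(cc)';

// Default flags: the split point after the final ")" leaves an empty
// string as the last element.
$withEmpty = preg_split('~\)\K~', $string);

// Limit 0 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty elements.
$noEmpty = preg_split('~\)\K~', $string, 0, PREG_SPLIT_NO_EMPTY);

var_export($withEmpty);
echo "\n";
var_export($noEmpty);
```

The limit has to be passed explicitly (0 or -1 both mean "no limit") because the flags are the 4th positional parameter.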

mickmackusa
  • Thank you very much for this work, it's great (pattern #1) :) But tell me, what exactly is the impact of (?:)? When I removed it from the pattern string, I still received the same result. – isom Mar 13 '18 at 08:13
  • The non-capturing group permits the repetition of the substring inside of it one or more times. If you remove it, you will change the meaning/accuracy of the pattern. If you need more clarity than this, please make a demo link for me and I'll explain differently. – mickmackusa Mar 13 '18 at 08:40
  • :-) That regex looks familiar +1 – The fourth bird Mar 13 '18 at 09:16
  • @isom Now I see what you mean. You are replacing the non-capturing group with a capturing group. That will not damage the accuracy of the pattern. I am using (out of habit) a non-capturing group because when you DO have a declared output variable, the capture group will create an array with twice the size (fullstring match and captured group). Because there is no matches array declared in my `preg_match()` call, there is no difference between our two patterns. If you want to remove the two characters (`?:`) that's absolutely okay. – mickmackusa Mar 13 '18 at 10:44
1

If I am not mistaken, your $arg is a string for which (AAA/)(BBB/)(cc) is valid and (AAA/)xxxx(BBB/)(cc) is invalid.

If that is the case and you want to match repeated occurrences of your accepted characters, you could wrap the character class and its surrounding parentheses in a non-capturing group and then repeat that group.

Your current character class [a-zA-Z0-9_-] does not contain a forward slash so you could add that to match an occurrence like (AAA/). You could also add the backslash. This page has a good explanation about escaping a backslash.

You could update your regex to:

^(?:\([/a-zA-Z0-9_\\-]+\))+$

Or use \w to match a word character which matches [a-zA-Z0-9_]. This would look like [/\w\\-]+

That would match

  • ^ Beginning of the string
  • (?: Non capturing group
    • \( Match (
    • [/a-zA-Z0-9_\\-]+ Your allowed characters in a character set repeated one or more times
    • \) Match )
  • )+ Close non capturing group and repeat one or more times
  • $ The end of the string
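
A minimal sketch of the \w variant mentioned above, wrapped in a helper so it can be tried on a few inputs. The function name isValidSimplePattern and the third sample string (which contains a backslash) are assumptions made for this illustration only.

```php
<?php
// Hypothetical helper name, used only for this illustration.
function isValidSimplePattern(string $arg): bool
{
    // Single-quoted literal: \\\\ reaches the regex engine as \\,
    // which matches one literal backslash inside the character class.
    return preg_match('#^(?:\([/\w\\\\-]+\))+$#', $arg) === 1;
}

var_dump(isValidSimplePattern('(AAA/)(BBB/)(cc)'));      // bool(true)
var_dump(isValidSimplePattern('(AAA/)xxxx(BBB/)(cc)'));  // bool(false)
var_dump(isValidSimplePattern('(dir\\sub)(x-y_z)'));     // bool(true), subject is (dir\sub)(x-y_z)
```

Returning a bool keeps the validation separate from the decision to throw an exception, which the caller can still do.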

Your code could look like:

if(! preg_match_all("#^(?:\([/a-zA-Z0-9_\\\\-]+\))+$#", $arg, $matches, PREG_OFFSET_CAPTURE)){
    var_dump($matches);
    throw new \Exception('The simple pattern "'.$arg.'" is not valid !');
}

Demo php

The fourth bird
-1

A \ before a character escapes it, so that character is matched literally. To match / write \/ in the pattern. To match \ write \\. So \\\/\.\/\\ will find \/./\. In PHP the pattern usually starts and ends with a / delimiter, like /[a-zA-Z]\./
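
One detail worth showing in code is that a PHP string literal adds its own layer of escaping on top of the regex. The sketch below uses single-quoted literals, where \\ collapses to one backslash before the regex engine sees the pattern, so a regex-level \\ has to be typed as four backslashes. The ~ delimiter and the sample subject are choices made for this illustration; with a / delimiter every literal slash in the pattern must be escaped.

```php
<?php
// Regex \/ matches a slash (the escape is optional with a ~ delimiter).
$matchesSlash = preg_match('~^\/$~', '/');

// Regex \\ matches one backslash; the PHP literal needs four.
$matchesBackslash = preg_match('~^\\\\$~', '\\');

// Regex \\\/\.\/\\ searches for the literal text \/./\
$pattern = '~\\\\\/\.\/\\\\~';
$subject = 'before \\/./\\ after';   // the actual string is: before \/./\ after
preg_match($pattern, $subject, $found);

var_dump($matchesSlash, $matchesBackslash);
var_export($found);
```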

To try out some new regexes try this site: https://regex101.com/

It explains every token you enter and shows which of your sample strings the pattern matches.

Lithilion