
I'm trying to add comments to make a regexp clearer

// Strip any URLs (such as embeds) taken from http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with-or-without-http-www
$pattern =
    '(                               # First capturing group
            (http|https)             # Second capturing group, matches either http or https
        \:\/\/)?                     # End of first capturing group, matches :// exactly
        [                            # Match any char in the following list. the + after the closing bracket means greedy
            a-z                      # Any char between a and z
            A-Z                      # Any char between A and Z
            0-9                      # Any char between 0 and 9
            \.\/\?\:@\-              # ./?:@- literally ( any one of them )
            _=#                      # _=# any of these three chars
        ]+                           # end of list
        \.                           # matches .
        (                            # third capturing group
            [                        # start of list
                a-z                  # Any char between a and z
                A-Z                  # Any char between A and Z
                0-9                  # Any char between 0 and 9
                \.\/\?\:@\-          # ./?:@- literally ( any one of them )
                _=#                  # _=# any of these three chars
            ]                        # end of list
        )*                           # end of capturing group with greedy modifier';
$excerpt = preg_replace("/$pattern/x", '', $excerpt );

But I get the warning:

Warning: preg_replace(): Unknown modifier '/' in on line 280

How should I comment it?

Nicola Peluchetti

3 Answers


This may not be the cleanest approach, but you could enclose each section in quotes and concatenate them.

Something like this should work:

$pattern =
    '('.                             // First capturing group
        '(http|https)'.              // Second capturing group, matches either http or https
    '\:\/\/)?'.                      // End of first capturing group, matches :// exactly
    ...   
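
As a rough, self-contained sketch of the concatenation idea: the pattern below is a hypothetical, simplified URL pattern (not the full one from the question), and the sample input is made up. Because the comments are ordinary PHP comments, they never become part of the regex string, so they can contain / or any other character.

```php
<?php
// Hypothetical, simplified pattern used only to illustrate the technique.
$pattern =
    '(https?:\/\/)?' .   // optional scheme such as http:// or https://
    '[a-z0-9.-]+' .      // host name characters
    '\.' .               // a literal dot
    '[a-z]{2,}';         // final label, at least two letters

// The x modifier is not needed: the comments live in PHP, not in the regex.
$result = preg_replace("/$pattern/i", '', 'see http://example.com now');

echo $result, "\n";
```

The trade-off is visual noise from the quotes and dots, but the delimiter can never collide with a comment.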

Alternatively, I found this in the PHP docs:

If the PCRE_EXTENDED option is set, an unescaped # character outside a character class introduces a comment that continues up to the next newline character in the pattern.

You are already using the x modifier, so comments like these should work in principle. However, the quote also indicates that all of your comments within a character class [...] are invalid: inside the brackets, # and the text after it are taken as literal members of the class.
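
A small sketch of that behaviour, assuming the ~ delimiter and a made-up three-letter class. The text after # inside the brackets is not skipped; its characters become members of the class, so the class matches far more than intended.

```php
<?php
// Illustrative pattern: the "comment" sits inside the brackets on purpose.
$class = '~
    [
        a-c   # looks like a comment, but is part of the class
    ]
~x';

$hashMatches = preg_match($class, '#'); // the hash itself is in the class
$kMatches    = preg_match($class, 'k'); // "k" comes from the word "looks"
$zMatches    = preg_match($class, 'z'); // "z" appears nowhere in the class

var_dump($hashMatches, $kMatches, $zMatches);
```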

Here is a working example for use with the PCRE_EXTENDED modifier. The comments sit outside the character classes and avoid the / delimiter:

$pattern = '
    (                              # First capturing group
        (http[s]?)                 # Second capturing group, matches either http or https
    \:\/\/)?                       # End of first capturing group, matches a colon and two slashes
    [a-zA-Z0-9\.\/\?\:@\-_=#]+     # [List Comment Here]
    \.                             # matches .
    (                              # third capturing group
        [a-zA-Z0-9\.\/\?\:@\-_=#]  # [List Comment Here]
    )*                             # end of capturing group with greedy modifier
';
segFault

This was brought up in a comment on the php.net modifiers page.

To quote:

When adding comments with the /x modifier, don't use the pattern delimiter in the comments. It may not be ignored in the comments area.

In your example, one of your comments has the string :// embedded within it. PHP seems to locate the closing delimiter without taking the flags into account, so it treats the first unescaped / in that comment as the end of the pattern and reads whatever follows as modifiers. The same can be seen with the code below:

echo preg_replace('/
a #Com/ment
/x', 'e', 'and');


You would need to either change your delimiter or escape the delimiter in comments.
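
Both options can be sketched against the small example above. The variable names are illustrative only, and the tilde is just one possible alternative delimiter.

```php
<?php
// Option 1: switch to a delimiter that appears nowhere in the pattern
// or in its comments.
$withTilde = preg_replace('~
    a   # Com/ment is fine here
~x', 'e', 'and');

// Option 2: keep / as the delimiter and backslash-escape it in the comment.
$withEscape = preg_replace('/
    a   # Com\/ment
/x', 'e', 'and');

echo $withTilde, "\n";
echo $withEscape, "\n";
```

Changing the delimiter is usually the less fragile choice, since nobody has to remember to escape slashes when editing a comment later.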

Anonymous

While it has already been said that the problem in your snippet comes from using the pattern delimiter in your pattern comments, completely refactoring the pattern to implement D.R.Y. practices will make your regex much simpler to read and maintain.

  1. Use a delimiting character that will not be found inside your pattern -- this eliminates avoidable escaping.
  2. ((http|https)\:\/\/)? can be simplified to (?:https?://)? and still maintain its optional status in the pattern.
  3. Your alphanumeric character class plus a short list of symbols can be reduced to [\w./?:@=#-]+.
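
One way to sanity-check point 3 is to compare the two character classes over the ASCII range. This is only a sketch: the variable names are made up, and it says nothing about bytes above 127, where \w can depend on the locale or on the u modifier.

```php
<?php
// Original, heavily escaped class versus the compact \w-based class.
$verbose = '~\A[a-zA-Z0-9\.\/\?\:@\-_=#]\z~';
$compact = '~\A[\w./?:@=#-]\z~';

$mismatches = [];
for ($code = 0; $code < 128; $code++) {
    $char = chr($code);
    // Record any character that the two classes treat differently.
    if (preg_match($verbose, $char) !== preg_match($compact, $char)) {
        $mismatches[] = $code;
    }
}

var_dump($mismatches);
```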

Code:

// strip urls
$pattern = <<<REGEX
~
(?:https?://)?  # optionally, case-insensitively match http or https followed by colon, forwardslash, forwardslash
[\w./?:@=#-]+   # greedily match one or more characters from this list: any letters, any number, underscore, dot, forwardslash, question mark, colon, at sign, equals, hash, hyphen
\.              # match a dot
[\w./?:@=#-]*   # greedily match zero or more characters from this list: any letters, any number, underscore, dot, forwardslash, question mark, colon, at sign, equals, hash, hyphen
~ix
REGEX;

$excerpt = preg_replace($pattern, '', $excerpt);

After cleaning up and removing all of the bloat from your pattern, it may actually become attractive to encapsulate all of the inline comments as a comment prior to declaring the pattern because this affords the ability to wrap long lines onto newlines without breaking your pattern.
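
A sketch of what that might look like, with the explanation moved into ordinary PHP comments above a one-line pattern. The sample input string is invented for the demonstration.

```php
<?php
// Strip URLs. The pattern, case-insensitively:
//   1. optionally matches http:// or https://
//   2. matches one or more letters, digits, underscores or any of . / ? : @ = # -
//   3. matches a literal dot
//   4. matches zero or more characters from the same list as step 2
$pattern = '~(?:https?://)?[\w./?:@=#-]+\.[\w./?:@=#-]*~i';

$excerpt = 'Read more at https://example.com/page?id=1 today';
$excerpt = preg_replace($pattern, '', $excerpt);

echo $excerpt, "\n";
```

Since the comments are outside the regex, the x modifier is no longer required and long explanations can wrap freely.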

mickmackusa