I am trying to understand how preg_match_all() works and when looking at the documentation on the php.net site, I see some examples but am baffled by the strings sent as the pattern parameter. Is there a really thorough, clear explanation out there? For example, I don't understand what the pattern in this example means:

preg_match_all("/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x",
            "Call 555-1212 or 1-800-555-1212", $phones);

or this:

$html = "<b>bold text</b><a href=howdy.html>click me</a>";
preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

I've taken an introductory class on PHP, but never saw anything like this. Some clarification would be appreciated.

Thanks!

hakre
Kevin_TA
  • See http://regular-expressions.info/ for a better tutorial, and check out http://stackoverflow.com/questions/89718/is-there-anything-like-regexbuddy-in-the-open-source-world and http://stackoverflow.com/questions/32282/regex-testing-tools for some tools to visualize those expressions. – mario Jan 08 '12 at 22:48

4 Answers

Those aren't "PHP patterns", those are Regular Expressions. Instead of trying to explain what has been explained before a thousand times in this answer, I'll point you to http://regular-expressions.info for information and tutorials.

deceze

You are looking for these:

  1. PHP PCRE Pattern Syntax
  2. PCRE Standard syntax

Note that the first is a subset of the second.

Shiplu Mokaddim

Also have a look at YAPE::Regex::Explain, a Perl module that, for example, gives this textual explanation of your first regex:

(?x-ims:\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4})

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \(?                      '(' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  )?                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
  \)?                      ')' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?(1)                    if back-reference \1 matched, then:
----------------------------------------------------------------------
    [\-\s]                   any character of: '\-', whitespace (\n,
                             \r, \t, \f, and " ")
----------------------------------------------------------------------
   |                        else:
----------------------------------------------------------------------
                             succeed
----------------------------------------------------------------------
  )                        end of conditional on \1
----------------------------------------------------------------------
  \d{3}                    digits (0-9) (3 times)
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
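
For PHP readers, here is a minimal sketch of how the pattern explained above behaves when it is run through preg_match_all(). It reuses the pattern and subject string from the question; the single-quoted string literal and the variable names $pattern, $subject and $count are my own choices for illustration, not part of the original example.

```php
<?php
// Pattern from the question. Single quotes keep PHP from interpreting
// the backslashes, so PCRE receives them unchanged. The /x modifier
// makes PCRE ignore the literal whitespace inside the pattern.
$pattern = '/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x';
$subject = 'Call 555-1212 or 1-800-555-1212';

// Default flag is PREG_PATTERN_ORDER: $phones[0] holds the full matches,
// $phones[1] holds what capture group 1 (the area code) matched.
$count = preg_match_all($pattern, $subject, $phones);

print_r($phones);
// $phones[0] is ['555-1212', '800-555-1212']
// $phones[1] is ['', '800']
```

The first number has no area code, so group 1 does not take part in that match; the conditional (?(1) ...) then requires nothing, and preg_match_all() fills the group's slot with an empty string.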
mario
  • Where exactly did you get that output from? I checked out the link you provided but it was just a Google search and I didn't really find something that may have produced this output. – Kevin_TA Jan 08 '12 at 23:00
  • Yes, it's that first link, the Perl module. I made myself a tiny shell script for that. It's just `perl -e " use YAPE::Regex::Explain; my \$re = qr{$1}$2; print YAPE::Regex::Explain->new(\$re)->explain(); "` -- But you can also just keep rewriting that small example script as seen on its [CPAN page](http://search.cpan.org/~gsullivan/YAPE-Regex-Explain-4.01/Explain.pm). – mario Jan 08 '12 at 23:03
  • That isn't right. The OP's regex uses the `/x` modifier, so the first node should be `(?x-ims:` and those pure whitespace nodes shouldn't be listed. But that list is incomplete anyway. According to [this bug report](https://rt.cpan.org/Public/Bug/Display.html?id=41497), the module hasn't been updated since Perl 5.6, and PCRE always supported a slightly different set of modifiers to begin with. – Alan Moore Jan 09 '12 at 04:09
  • @AlanMoore: True, updated with actually specifying `x`. It's only useful for illustrative purposes anyway. It seems to work with many PCRE patterns still, but obviously it's not the prettiest tool. – mario Jan 09 '12 at 04:40

The pattern you write about is written in a mini-language of its own, called regular expressions. It specializes in finding patterns in strings, performing replacements, and so on, for any text that follows some sort of pattern.
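
A small sketch of both uses, finding and replacing, might look like this. The sample text, the date pattern and the variable names are made up for illustration; only preg_match() and preg_replace() are real PHP functions.

```php
<?php
// Hypothetical input text, chosen only for this illustration.
$text = 'Order shipped on 2012-01-08';

// Three capture groups: year, month, day.
$datePattern = '/(\d{4})-(\d{2})-(\d{2})/';

// Finding: preg_match() returns 1 if the pattern matches, 0 otherwise,
// and stores the full match and the groups in $m.
$found = preg_match($datePattern, $text, $m);

// Replacing: $1, $2 and $3 in the replacement refer to the capture groups.
$swapped = preg_replace($datePattern, '$3/$2/$1', $text);

echo $swapped, "\n"; // Order shipped on 08/01/2012
```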

More specifically it's a Perl Compatible Regular Expression (PCRE).

The handbook for that language is not available on the PHP manual website; you can find it here: PCRE Manpage.

A well-made step-by-step introduction is available on the Regular Expressions Info website.
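
To make the second pattern from the question concrete, here is a rough sketch that runs it and prints each capture group. The HTML string and the pattern come from the question; the single-quoted literals, the $pattern variable and the loop are my additions for illustration.

```php
<?php
$html = '<b>bold text</b><a href=howdy.html>click me</a>';

// Group 1: the whole opening tag, group 2: the tag name,
// group 3: the text between the tags (lazy .*?),
// group 4: the closing tag, where \2 must repeat the name from group 2.
$pattern = '/(<([\w]+)[^>]*>)(.*?)(<\/\2>)/';

// PREG_SET_ORDER gives one array per match instead of one array per group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "matched: {$m[0]}\n";
    echo "  opening tag: {$m[1]}\n";
    echo "  tag name:    {$m[2]}\n";
    echo "  content:     {$m[3]}\n";
    echo "  closing tag: {$m[4]}\n";
}
```

With this input there are two matches, one for the b element and one for the a element; the back-reference \2 is what keeps the closing tag tied to the same name as the opening tag.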

hakre