3

I have this regular expression:

([http://some.url.com/index.php?showtopic=\"]*)([0-9]+(?:\.[0-9]*)?)

It's for extracting links to topics from a forum.

Now, when I use it in my script:

$url = "([http://some.url.com/index.php?showtopic=\"]*)([0-9]+(?:\.[0-9]*)?)";

preg_match_all spits: "Unknown modifier '('"

This is the call to preg_match_all:

preg_match_all($url, $str, $matches,PREG_OFFSET_CAPTURE,3);

Can anyone help me with this obviously stupid problem?

Alan Moore
Anonymous
    That's an odd regular expression. It'll match "nice.rum.is.nice.1234" just as easily as it'll match that URL. Are you sure you want to be using a character class? – Samir Talwar May 22 '10 at 15:07

2 Answers

6

You need to wrap your regular expression in delimiters. Any non-alphanumeric, non-backslash, non-whitespace character will do, ideally one that doesn't appear in the pattern itself, so I'll use #:

$url = "#([http://some.url.com/index.php?showtopic=\"]*)([0-9]+(?:\.[0-9]*)?)#";

You can learn more about delimiters in the PHP manual section for PCRE delimiters.
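As a minimal sketch of the delimited pattern in use: the sample string in $str is made up for illustration (the real forum markup is not shown in the question), and single quotes are used so the double quote needs no escaping for PHP itself.

```php
<?php
// Hypothetical sample input; the actual forum HTML is not shown in the question.
$str = '<a href="http://some.url.com/index.php?showtopic=1234">Topic</a>';

// The same pattern as above, wrapped in # delimiters.
$url = '#([http://some.url.com/index.php?showtopic="]*)([0-9]+(?:\.[0-9]*)?)#';

// The pattern now compiles; $matches[2] holds the captured topic numbers.
$count = preg_match_all($url, $str, $matches);

var_dump($count);
print_r($matches[2]);
```

With this sample input the second capture group should hold the topic number 1234.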

BoltClock
  • Thanks guys, but both of the examples show an error: unexpected '}', expecting '('. That is in my string to match, so what is wrong with this one? Why is it taking the content of the string into account? – Anonymous May 22 '10 at 14:52
  • Could you edit your question to include the code containing the preg_match_all() call and the delimited regex? – BoltClock May 22 '10 at 14:54
3

PHP's PCRE functions require delimiters that separate the actual regular expression from optional modifiers. With PHP you can use any non-alphanumeric, non-backslash, non-whitespace character and even delimiters that come in pairs (brackets).

In your case the leading ( is used as delimiter and the first corresponding closing ) marks the end of the regular expression; the rest is treated as modifiers:

([http://some.url.com/index.php?showtopic=\"]*)([0-9]+(?:\.[0-9]*)?)
^                                             ^

But the first character after the ending delimiter, the second (, is not a valid modifier. That's why the error message says Unknown modifier '('.
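To see this for yourself, here is a rough sketch that runs the undelimited pattern and captures the warning instead of letting PHP print it. The subject string is made up; since the pattern never compiles, its content does not matter.

```php
<?php
// The undelimited pattern from the question.
$url = "([http://some.url.com/index.php?showtopic=\"]*)([0-9]+(?:\.[0-9]*)?)";

// Collect the warning text so it can be inspected afterwards.
$warning = null;
set_error_handler(function (int $errno, string $errstr) use (&$warning): bool {
    $warning = $errstr;
    return true; // skip PHP's default warning output
});

// Hypothetical subject string; compilation fails before it is ever scanned.
$result = preg_match_all($url, 'showtopic=1234', $matches);

restore_error_handler();

var_dump($result);   // false, because the pattern could not be compiled
echo $warning, "\n"; // the message should contain: Unknown modifier '('
```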

In most cases / is used as the delimiter, like in Perl. But that would require escaping each occurrence of / in the regular expression. So it's a good idea to choose a delimiter that doesn't occur in the regular expression. In your case you could use # like BoltClock suggested.
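A small sketch of the difference, using a made-up subject string and a shortened pattern that only checks for the http:// prefix:

```php
<?php
// Hypothetical subject string for illustration.
$str = 'http://some.url.com/index.php?showtopic=1234';

// With / as the delimiter, each literal / in the pattern needs a backslash.
$withSlash = '/^http:\/\//';

// With # as the delimiter, the slashes can stay as they are.
$withHash = '#^http://#';

var_dump(preg_match($withSlash, $str)); // int(1)
var_dump(preg_match($withHash, $str));  // int(1)
```

Both patterns match the same thing; the second is simply easier to read.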

Oh, and by the way: A character class like [http://some.url.com/index.php?showtopic=\"] matches just a single character out of those listed, so either h, t, p, :, /, etc. If you mean to express http://some.url.com/index.php?showtopic=" literally, use just http://some\.url\.com/index\.php\?showtopic=" (don't forget to escape the metacharacters).
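To illustrate the difference, here is a sketch comparing the character-class pattern with a literal one. It assumes the double quote is not actually wanted directly before the topic number, so the literal version leaves it out; both test strings are made up (the first is the one from Samir Talwar's comment).

```php
<?php
// Character-class version from the question, with # delimiters added.
$classPattern = '#([http://some.url.com/index.php?showtopic="]*)([0-9]+(?:\.[0-9]*)?)#';

// Literal version with the metacharacters . and ? escaped.
// Assumption: the trailing double quote is dropped here.
$literalPattern = '#(http://some\.url\.com/index\.php\?showtopic=)([0-9]+(?:\.[0-9]*)?)#';

$junk = 'nice.rum.is.nice.1234';
$link = 'http://some.url.com/index.php?showtopic=1234';

// Every character of $junk before the digits is in the class, so it matches.
$classOnJunk = preg_match($classPattern, $junk);

// The literal pattern needs the exact URL prefix, so $junk is rejected.
$literalOnJunk = preg_match($literalPattern, $junk);

// The real link matches, and group 2 holds the topic number.
$literalOnLink = preg_match($literalPattern, $link, $m);

var_dump($classOnJunk, $literalOnJunk, $literalOnLink);
```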

Gumbo