
This code outputs the $captured array, but $captured[1] contains bar/this rather than the bar I expected. What's missing in my regex to stop it from returning more than bar?

<?php

    $pattern = '/foo/:any/';
    $subject = '/foo/bar/this/that';

    $pattern = str_replace(':any', '(.+)', $pattern);
    $pattern = str_replace(':num', '([0-9]+)', $pattern);
    $pattern = str_replace(':alpha', '([A-Za-z]+)', $pattern);

    echo '<pre>';

    $pattern = '#^' . $pattern . '#';
    preg_match($pattern, $subject, $captured);

    print_r($captured);
    echo '</pre>';
Matthew
  • I fail to see how your regex pattern could work at all. The string replacements will turn it into `/foo/(.+)/`, and now you've got a `/` pattern delimiter INSIDE your pattern without being escaped. At minimum it should look like `/foo\/(.+)/` to make it a valid regex. – Marc B Nov 11 '11 at 22:33
  • Ah sorry I also had `$pattern = '#^' . $pattern . '#';` in there, forgot to add it. – Matthew Nov 11 '11 at 22:37

3 Answers


Use a non-greedy modifier to make the + match as few characters as possible instead of as many as possible:

$pattern = str_replace(':any', '(.+?)', $pattern);
                                   ^

You probably also want to add delimiters round your regular expression and anchor it to the start of the string:

$pattern = '#^/foo/:any/#';
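Putting both suggestions together, the question's script might look like the sketch below (the combination is an illustration, not code taken from the answer). With the lazy `(.+?)`, `$captured[1]` should come out as `bar`:

```php
<?php
// Sketch: the question's script with a lazy quantifier for :any.
$pattern = '/foo/:any/';
$subject = '/foo/bar/this/that';

// (.+?) is lazy: it stops at the first point where the rest of the pattern can match.
$pattern = str_replace(':any', '(.+?)', $pattern);
$pattern = str_replace(':num', '([0-9]+)', $pattern);
$pattern = str_replace(':alpha', '([A-Za-z]+)', $pattern);

// '#' delimiters mean the slashes need no escaping; ^ anchors at the start.
$pattern = '#^' . $pattern . '#';   // '#^/foo/(.+?)/#'

preg_match($pattern, $subject, $captured);
print_r($captured);   // [0] => /foo/bar/, [1] => bar
```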
Mark Byers
  • Wouldn't that be the same as `$pattern = str_replace(':any', '(.*)', $pattern); ` – mowwwalker Nov 11 '11 at 22:30
  • @Walkerneo: No. The asterisk is still greedy (but it allows zero repetitions whereas `+` needs at least one). – Tim Pietzcker Nov 11 '11 at 22:33
  • @Walkerneo no. `+` is a greedy quantifier and will take all available characters up to the last `/` it can still match. The `?` makes it non-greedy, so it stops at the first occurrence of `/`. – Nahydrin Nov 11 '11 at 22:34
  • @Mark: Isn't it strange that he didn't get `bar/this/that` in `$captured[1]`? – Tim Pietzcker Nov 11 '11 at 22:34
  • @TimPietzcker no, his `$subject` doesn't end in a `/`, so it wouldn't match the last word. – Nahydrin Nov 11 '11 at 22:35
  • @BrianGraham: But the slashes in his pattern are taken as delimiters. Actually, now that I'm looking at it, he should be getting an "invalid mode modifier" error because `foo` would be his pattern and `(.+)/` would be interpreted as mode modifiers... Ah, now he's commented on his question explaining what he left out of the question... – Tim Pietzcker Nov 11 '11 at 22:37
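To make the point about `*` concrete, this sketch runs greedy and lazy versions of both quantifiers against the question's subject (the labels in `$patterns` are only for display). Only adding `?` changes the capture; swapping `+` for `*` does not:

```php
<?php
// Compare greedy and lazy quantifiers on the question's subject.
$subject  = '/foo/bar/this/that';
$patterns = [
    'greedy .+' => '#^/foo/(.+)/#',
    'greedy .*' => '#^/foo/(.*)/#',
    'lazy .+?'  => '#^/foo/(.+?)/#',
    'lazy .*?'  => '#^/foo/(.*?)/#',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $subject, $m);
    $results[$label] = $m[1];
    echo str_pad($label, 10), ' => ', $m[1], "\n";
}
// Both greedy forms capture "bar/this"; both lazy forms capture "bar".
```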

The `+` after the dot is greedy, so `.+` matches as many characters as possible. Either make it lazy:

$pattern = str_replace(':any', '(.+?)', $pattern);

or keep it from matching slashes:

$pattern = str_replace(':any', '([^\/]+)', $pattern);
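A small sketch comparing the two fixes. Because the delimiter here is `#`, the slash inside `[^/]` needs no escaping (the `[^\/]` form works too). The second subject, `/foo//bar/`, is a made-up input with an empty segment, chosen only to show where the two approaches differ:

```php
<?php
// Two ways to stop :any at the next slash.
$lazy    = '#^/foo/(.+?)/#';   // lazy: as few characters as possible
$noSlash = '#^/foo/([^/]+)/#'; // character class that can never match '/'

// On the question's subject both capture "bar".
preg_match($lazy, '/foo/bar/this/that', $a);
preg_match($noSlash, '/foo/bar/this/that', $b);
echo $a[1], ' ', $b[1], "\n";

// With an empty segment they differ: the lazy version still needs at least
// one character, so it swallows the extra slash and captures "/bar", while
// the negated class cannot match there at all.
$lazyHit    = preg_match($lazy, '/foo//bar/', $c);    // 1
$noSlashHit = preg_match($noSlash, '/foo//bar/', $d); // 0
var_dump($lazyHit, $c[1], $noSlashHit);
```

The negated class is the stricter choice if a segment must never contain a slash.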
Tim Pietzcker

Your code is rather confusing and misleading, and if I run it, it outputs a warning:

Warning: preg_match(): Unknown modifier '(' in php shell code on line 1
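If you want to detect this situation in code: when a pattern fails to compile, `preg_match()` returns `false`, whereas a valid pattern that simply doesn't match returns `0`. A minimal sketch (the `@` only silences the warning for the demo, and `#^/baz/#` is just an arbitrary non-matching pattern):

```php
<?php
// The question's pattern after str_replace, without the '#^...#' wrapper:
// '/' is read as the delimiter, so 'foo' is the pattern and '(.+)/' the modifiers.
$broken = '/foo/(.+)/';
$result = @preg_match($broken, '/foo/bar/this/that', $m);
var_dump($result);   // bool(false): the pattern failed to compile

// A valid pattern that does not match returns 0 instead.
$noMatch = preg_match('#^/baz/#', '/foo/bar/this/that');
var_dump($noMatch);  // int(0)
```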

What I think is wrong is:

$pattern = '/foo/:any/';
#should be
$pattern = '/foo\/:any/';

because a forward slash inside a regex that uses / as its delimiter has to be escaped.

After this is fixed, the script prints:

Array
(
    [0] => foo/bar/this/that
    [1] => bar/this/that
)

This is the expected result, because you match foo/ and then everything after it with (.+). If you want to match everything up to the next forward slash, you have several options:

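$pattern = '/foo\/(.*?)\//';   # non-greedy; the escaped trailing slash marks where it stops
$pattern = '/foo\/([^\/]*)/';  # a character class that never matches a forward slash
$pattern = '@foo/(.*?)/@';     # or use different start and end delimiters, e.g. @, so slashes need no escaping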
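Another way to avoid escaping slashes by hand is `preg_quote()`, which escapes regex metacharacters and, when passed the delimiter as its second argument, the delimiter as well. A sketch, assuming the literal prefix is `/foo/` as in the question:

```php
<?php
// preg_quote() escapes the '/' delimiter because it is passed as the second argument.
$prefix  = preg_quote('/foo/', '/');          // '\/foo\/'
$pattern = '/^' . $prefix . '([^\/]+)\//';    // '/^\/foo\/([^\/]+)\//'

preg_match($pattern, '/foo/bar/this/that', $captured);
echo $captured[1], "\n";                      // bar
```

Note that `preg_quote()` would also escape the `:` in placeholders like `:any`, so it suits the literal parts of a route rather than the whole template.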
topek
  • I had different delimiters that I forgot to include in the original post (now edited) like so: `$pattern = '#^' . $pattern . '#';`, that way I wouldn't have to escape forward slashes. – Matthew Nov 11 '11 at 23:16