
This may be a quick question for experienced regular expressionists, but I'm having trouble getting my match to execute correctly.

Suppose I had a string that looked like this:

http://aaa-bbbb-cc-ddddd-eee-.sub.dom

I would like to capture all of the "aaa", "bbbb", "cc", and "ddddd" substrings, but I don't know in advance how many there will be (e.g., there could be groups all the way up through "zzz").

This is the regular expression I'm trying to use right now:

/http:\/\/(\w*?\-)+\.sub\.dom/

I wrote it this way because:

  1. I want to match substrings, but I want each one to end when a `-` is reached
  2. I want to capture one or more of these substrings

But it seems to only save the last match it makes (in the above case, it only captures "eee-").

Is there a good way to capture all of the matched substrings?

More information: I'm using PHP's PCRE function preg_replace_callback. Thanks!

Ryan

2 Answers


No, it is not possible to capture an unknown number of values with a single capture group.

If you repeat a capture group (for example with `+`), it will only contain the last value it captured.
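A minimal PHP reproduction of that behaviour, using the pattern and URL from the question (the variable names are only illustrative), might look like this:

```php
<?php
// The question's pattern: the group (\w*?\-) is repeated by "+",
// but PCRE only keeps the value from the last repetition.
$url = 'http://aaa-bbbb-cc-ddddd-eee-.sub.dom';
preg_match('/http:\/\/(\w*?\-)+\.sub\.dom/', $url, $m);

// $m[0] is the whole match; $m[1] holds only the final repetition of the group.
echo $m[1], "\n"; // eee-
```

The earlier repetitions ("aaa-", "bbbb-", and so on) are matched but overwritten, so there is no index in `$m` where they survive.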

Could you explain a bit more broadly what you're trying to do? Perhaps there is another simple way to do it (possibly without regular expressions).
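One possible regex-only alternative, offered here as an assumption rather than something from this answer, is to drop the repeated group and let `preg_match_all()` return one match per dash-terminated label. The `\G(?!^)` construct (a PCRE anchor) makes each match start exactly where the previous one ended, so collection stops at `.sub.dom`:

```php
<?php
// Sketch: one match per label instead of one repeated group.
// The first match must start with "http://"; every later match must
// begin exactly where the previous match ended (\G), but not at offset 0.
$url = 'http://aaa-bbbb-cc-ddddd-eee-.sub.dom';
preg_match_all('~(?:^http://|\G(?!^))(\w+)-~', $url, $matches);

// $matches[1] collects group 1 from every match.
echo implode(',', $matches[1]), "\n"; // aaa,bbbb,cc,ddddd,eee
```

Note that this sketch does not check that the host really ends in `.sub.dom`; it only stops collecting labels at the first character that breaks the `\w+-` chain.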

Jeremy Stein
  • This is great information; I was driving myself insane. I'm trying to transform the whole domain into a string containing only the subdomain, with the dashes replaced by underscores. I can do it fairly simply with `str_replace`, but I was hoping for a one-pass solution with regex. – Ryan Aug 17 '11 at 17:19
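For the goal described in that comment, a sketch with `preg_replace_callback()` (the function mentioned in the question) can do it in a single regex pass: the pattern captures the whole subdomain in one group, and the callback replaces the dashes. Keeping the trailing underscore in the output is an assumption about the desired format:

```php
<?php
// Hypothetical one-pass transform: keep only the subdomain and
// turn its dashes into underscores.
$url = 'http://aaa-bbbb-cc-ddddd-eee-.sub.dom';

$result = preg_replace_callback(
    '~^http://([\w-]+)\.sub\.dom$~',
    // $m[1] is the entire subdomain; no repeated group is needed.
    function (array $m): string {
        return str_replace('-', '_', $m[1]);
    },
    $url
);

echo $result, "\n"; // aaa_bbbb_cc_ddddd_eee_
```

This sidesteps the repeated-group limitation by capturing the subdomain once and letting `str_replace()` handle the individual pieces inside the callback.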

If you want the subdomain split into the pieces between the dashes, this should work:

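```php
$string = "http://aaa-bbbb-cc-ddddd-eee-.sub.dom";

// Capture everything between "http://" and the first dot (the subdomain).
if (preg_match("/^http:\/\/([\w-]+?)\..*$/i", $string, $match)) {
    // Split on the dashes; rtrim() removes the trailing dash so that
    // explode() does not produce an empty last element.
    $parts = explode('-', rtrim($match[1], '-'));
    print_r($parts);
}
```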

If that doesn't do it for you, you will probably have to write a small script that parses the string yourself.
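A small parser along those lines could lean on PHP's built-in `parse_url()` instead of a regex. This sketch assumes the subdomain is everything before the first dot of the host:

```php
<?php
// Regex-free sketch: pull out the host, take its first dot-separated
// part, then split that on the dashes.
$url = 'http://aaa-bbbb-cc-ddddd-eee-.sub.dom';

$host = parse_url($url, PHP_URL_HOST);   // "aaa-bbbb-cc-ddddd-eee-.sub.dom"
$subdomain = explode('.', $host, 2)[0];  // "aaa-bbbb-cc-ddddd-eee-"

// array_filter() with 'strlen' drops the empty string left by the trailing
// dash (but would keep a label like "0"); array_values() reindexes from 0.
$parts = array_values(array_filter(explode('-', $subdomain), 'strlen'));

print_r($parts);
```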

sg3s
  • That'll end up doing what I want. Dang, I was hoping regex would work out for me, but I guess we're limited because what I'm asking for isn't finite-state in the worst case. – Ryan Aug 17 '11 at 17:20
  • Regexes can't capture a recurring pattern. It's the same problem as trying to parse HTML with a regex: it just doesn't work. – sg3s Aug 17 '11 at 17:22