I have the following kind of partial URL, for example:

/it/xyz/test/param+1/param-2/1234/gfd4

Basically, it starts with two letters, then a slash, then another unknown string, and then a series of repeatable strings separated by slashes. I need to capture every string (I know a split on the / delimiter would work, but I am interested in how to extract them with a regex). I first came up with this:

^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)

but it only captures

group1: it
group2: xyz
group3: /test

and of course it ignores the rest of the string.

If I add a * at the end, group 3 only captures the last segment:

^\/([a-zA-Z]{2})\/([a-zA-Z]{1,10})(\/[a-zA-Z1-9\+\-]+)*

group1: it
group2: xyz
group3: /gfd4

So I am obviously missing some fundamentals. In addition to the proper regex, I would like an explanation.

I tagged this as Java because the engine that parses the regex is the one in JDK 7. As far as I know, each engine may behave differently.

Leonardo

1 Answer

As mentioned here, this is expected:

With one group in the pattern, you can only get one exact result in that group.
If your capture group gets repeated by the pattern (you used the + quantifier on the surrounding non-capturing group), only the last value that matches it gets stored.
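To make the quoted behaviour concrete, here is a minimal sketch that runs a pattern equivalent to the question's second one (rewritten as a Java string literal) against the sample URL. The class and method names are made up for illustration. Each repetition of group 3 overwrites the previous value, so only the last one remains once the match is complete.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Equivalent to the question's second pattern, written as a Java string literal.
    private static final Pattern P =
        Pattern.compile("^/([a-zA-Z]{2})/([a-zA-Z]{1,10})(/[a-zA-Z1-9+\\-]+)*");

    // Returns what group 3 holds after the match, or null if nothing matched.
    public static String lastRepetition(String path) {
        Matcher m = P.matcher(path);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // The group matched /test, /param+1, /param-2, /1234 and /gfd4 in turn;
        // only the final repetition is kept.
        System.out.println(lastRepetition("/it/xyz/test/param+1/param-2/1234/gfd4")); // prints /gfd4
    }
}
```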

I would rather capture the rest of the string in group3 with (\/.*$) (as in this demo), then use a split around '/'. Or apply that pattern to the rest of the string:

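import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In a Java string literal, '/' and '+' need no escaping; '-' is written as "\\-".
// 0-9 is used instead of 1-9 so that segments containing a zero also match.
Pattern p = Pattern.compile("(/[a-zA-Z0-9+\\-]+)");
Matcher m = p.matcher(str); // str: the rest of the string (group 3 above)
while (m.find()) { // each find() moves on to the next /segment
    String place = m.group(1);
    ...
}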
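Putting the two steps together, a self-contained sketch could look like the following. The class name UrlSegments and the method segments are hypothetical, and the character class uses 0-9 rather than the question's 1-9 on the assumption that zeros should be allowed. It first matches the fixed prefix and captures the remainder in group 3, then iterates over that remainder with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlSegments {
    // Two-letter prefix, a second word, then everything else in group 3.
    private static final Pattern HEAD =
        Pattern.compile("^/([a-zA-Z]{2})/([a-zA-Z]{1,10})(/.*)$");
    // One repeatable segment; the slash is left outside the capture group.
    private static final Pattern SEGMENT =
        Pattern.compile("/([a-zA-Z0-9+\\-]+)");

    // Returns every segment in order, or an empty list if the prefix does not match.
    public static List<String> segments(String path) {
        List<String> out = new ArrayList<String>();
        Matcher head = HEAD.matcher(path);
        if (!head.matches()) {
            return out;
        }
        out.add(head.group(1));
        out.add(head.group(2));
        // find() restarts after the previous match, so each call yields one segment.
        Matcher seg = SEGMENT.matcher(head.group(3));
        while (seg.find()) {
            out.add(seg.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("/it/xyz/test/param+1/param-2/1234/gfd4"));
        // prints [it, xyz, test, param+1, param-2, 1234, gfd4]
    }
}
```

A single regex match has a fixed number of groups, so a loop over find() (or a split) is the usual way to collect a variable number of segments with java.util.regex.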
VonC