
I need to match part of a URL string against the following format:

/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/…

where each name[id] pair must appear together (never just name or just [id]). Only firstName[firstId] is mandatory, and there can be up to six optional entries.

I currently match such an entry with

/(?:([^\[]+)\[([^\]]+)\])/

which works well on its own, as you can see in this example, but not with the complete URL (example). In the second version, the intermediate matches are overwritten and only the last one remains.
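A minimal reproduction, assuming Node.js or a modern browser console. The linked examples are not reproduced here, so the full-URL wrapper `^\/foo\/(?:…)+` and the three-pair sample path are assumptions. On its own the entry pattern yields one name/id pair; repeated with `+`, the groups only report the last iteration.

```javascript
// The entry pattern from the question, once on its own and once repeated with +.
const entry = /(?:([^\[]+)\[([^\]]+)\])/;
const repeated = /^\/foo\/(?:([^\[]+)\[([^\]]+)\])+/; // assumed full-URL form

// Hypothetical sample path with three name[id] pairs.
const url = "/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]";

// Standalone: one entry, one pair of groups.
const single = "firstName[firstId]".match(entry);
console.log(single[1], single[2]); // firstName firstId

// Repeated: the group matches three times, but only the last iteration is kept.
const whole = url.match(repeated);
console.log(whole[1], whole[2]); // /optional2 oId2
```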

How can I solve that problem?

EDIT:

In case my question is unclear, here is a simplified version.

Given the Strings:

/foo/x/y/z
/foo/x/y
/foo/x

I now want to match all three of them and get back x, y and z.

I can use

^\/foo(?:\/(\w))+

to match the whole string, but I only get z back (or only y, or only x, respectively). How do I change that?
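A quick check of that behavior, assuming a JavaScript runtime such as Node.js: every string matches, but the single capturing group only holds the segment from the final repetition.

```javascript
// The simplified pattern: one group inside a repeated (?:...)+.
const re = /^\/foo(?:\/(\w))+/;
const inputs = ["/foo/x/y/z", "/foo/x/y", "/foo/x"];

// Every input matches, but m[1] only keeps the last repetition of the group.
const lastOnly = inputs.map((s) => s.match(re)[1]);
console.log(lastOnly.join(", ")); // z, y, x
```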


koehr
    Remove the last `+` (like you already did in the first example). Your second example is broken; the regex is different from the first, and according to 'Your regular expression explained', the 'g' flag is not set. – Ruud Helderman Jan 14 '14 at 12:49
  • Thanks Ruud, but if I use the regex from the first example, it doesn't work at all. I will update the first example, but my question stays the same. – koehr Jan 14 '14 at 12:54

3 Answers


In your example, you just need to specify the number of occurrences you want:

^\/foo\/(?:([^\[]+)\[([^\]]+)\]){1,7}$

{1,7} means at least 1 occurrence (the required first name[id] pair) and at most 7 (that pair plus up to six optional ones). See the regex101 test.
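A sketch of what the `{1,7}` quantifier does in JavaScript, assuming Node.js or a browser; `makeUrl` is a hypothetical helper for generating test paths. The quantifier enforces the 1-to-7 limit, but, as with `+`, the two groups still report only the last pair.

```javascript
// The quantified pattern: one required pair plus up to six optional ones.
const re = /^\/foo\/(?:([^\[]+)\[([^\]]+)\]){1,7}$/;

// Hypothetical helper: builds "/foo/n1[i1]/n2[i2]/..." with `count` pairs.
const makeUrl = (count) =>
  "/foo/" +
  Array.from({ length: count }, (_, i) => `n${i + 1}[i${i + 1}]`).join("/");

console.log(re.test(makeUrl(1))); // true
console.log(re.test(makeUrl(7))); // true
console.log(re.test(makeUrl(8))); // false: more than seven pairs

// Capturing is unchanged: only the last pair survives.
const m = makeUrl(3).match(re);
console.log(m[1], m[2]); // /n3 i3
```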


Going further, if you want to match a full URL, I think we can improve it a little by including this:

^(?:(?:(?:http\:\/\/)?(?:www\.)?)[^\/]*){0,1}\/foo\/(?:([^\[]+)\[([^\]]+)\])+

which applies your rule to a real URL, e.g. www.stackoverflow.com/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/
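A sketch that exercises this URL-aware pattern in JavaScript, assuming Node.js or a browser. The host comes from the example above; the `https://` input is an added check. As written, the optional prefix only recognizes `http://`, and the capturing groups again hold only the last pair.

```javascript
// The URL-aware variant: optional "http://", optional "www.", then a host.
const re = /^(?:(?:(?:http\:\/\/)?(?:www\.)?)[^\/]*){0,1}\/foo\/(?:([^\[]+)\[([^\]]+)\])+/;

const path = "/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/";

console.log(re.test("www.stackoverflow.com" + path));         // true
console.log(re.test("http://www.stackoverflow.com" + path));  // true
console.log(re.test(path));                                   // true: the prefix is optional
console.log(re.test("https://www.stackoverflow.com" + path)); // false: only http:// is handled

// Still one result per group: the last pair.
const m = ("www.stackoverflow.com" + path).match(re);
console.log(m[1], m[2]); // /optional2 oId2
```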

edit:

If you want to capture all the parameters as separate groups, you have to write them out manually:

^(?:(?:(?:http\:\/\/)?(?:www\.)?)[^\/]*){0,1}(?#parameter1)\/(foo)(?#parameter2)(?:\/([^\/]*\[[^\/]*\]))(?#optional1)(?:\/([^\/]*\[[^\/]*\]))?(?#optional2)(?:\/([^\/]*\[[^\/]*\]))?(?#optional3)(?:\/([^\/]*\[[^\/]*\]))?(?#optional4)(?:\/([^\/]*\[[^\/]*\]))?(?#optional5)(?:\/([^\/]*\[[^\/]*\]))?(?#optional6)(?:\/([^\/]*\[[^\/]*\])\/?)?$

See the regex101 test.
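JavaScript regexes reject `(?#...)` comments (they are a syntax error there), so one way to use this pattern in JavaScript is to build it from strings and move the comments into code. A sketch under that approach; `prefix`, `pair` and `MAX_OPTIONAL` are hypothetical names, and the optional trailing slash is moved to the end of the pattern instead of sitting inside the last group.

```javascript
// Building blocks, written as strings so each part can carry a comment.
const prefix = "^(?:(?:(?:http:\\/\\/)?(?:www\\.)?)[^\\/]*)?"; // optional scheme and host
const pair = "\\/([^\\/]*\\[[^\\/]*\\])"; // one "/name[id]" segment, captured
const MAX_OPTIONAL = 6; // up to six optional pairs after the first

const re = new RegExp(
  prefix +
    "\\/(foo)" + // group 1: the "foo" segment
    pair + // group 2: the mandatory first pair
    `(?:${pair})?`.repeat(MAX_OPTIONAL) + // groups 3-8: optional pairs
    "\\/?$" // optional trailing slash
);

const m = "/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/".match(re);

// Skip the full match (index 0) and the optional groups that did not take part.
const parts = m.slice(1).filter((g) => g !== undefined);
console.log(parts.join(" "));
// foo firstName[firstId] optional1[oId1] optional2[oId2]
```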

It is not possible to capture an undetermined number of groups (see "Capturing a pattern of unknown repetition in PCRE").

Caio Oliveira
  • I know how to use that, but it doesn't answer my question, because the intermediate match groups are still not returned. It only returns the last result of all matches. – koehr Jan 14 '14 at 13:01
  • Thanks Caio. That's not a pleasant answer, but it helps a lot! – koehr Jan 14 '14 at 13:27

The intermediate matches are overwritten and only the last one stays.

A capturing group might match multiple times, but in JavaScript it holds only one result: the last match, which is what you're accessing. Only some other regex engines give access to intermediate matches. See also http://www.regular-expressions.info/captureall.html.

How do I change that?

You cannot modify the regex to give multiple results other than by adding more explicit capturing groups: ^\/foo(?:\/(\w))?(?:\/(\w))?(?:\/(\w))?(?:\/(\w))?…. Ugly.
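Applied to the simplified strings, the explicit-groups approach might look like this sketch. It assumes three optional groups are enough and adds a `$` anchor so that longer paths are rejected rather than silently truncated; `segments` is a hypothetical helper name.

```javascript
// One optional group per possible segment; \w matches a single word character,
// as in the simplified example.
const re = /^\/foo(?:\/(\w))?(?:\/(\w))?(?:\/(\w))?$/;

// Hypothetical helper: returns the captured segments, or null if no match.
function segments(s) {
  const m = s.match(re);
  if (m === null) return null; // not a /foo path, or too many segments
  // Groups that did not participate are undefined, so drop them.
  return m.slice(1).filter((g) => g !== undefined);
}

console.log(segments("/foo/x/y/z").join(", ")); // x, y, z
console.log(segments("/foo/x/y").join(", "));   // x, y
console.log(segments("/foo/x").join(", "));     // x
console.log(segments("/foo/w/x/y/z"));          // null: would need a fourth group
```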

The easiest solution is probably to get all the parts as one long string, and then split them by the slash into an array:

> str.match(/^\/foo\/(?:[^\[]+\[[^\]]+\])+/gi)[0].split("/").slice(1);
["foo", "firstName[firstId]", "optional1[oId1]", "optional2[oId2]"]
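`str.match(...)` returns `null` when the string does not match, so indexing `[0]` would then throw a `TypeError`. A defensive wrapper, sketched under that observation; `splitParts` is a hypothetical name, and the `g` flag is dropped because a single anchored match is all that is needed.

```javascript
// Match the whole "/foo/name[id]/..." run, then split it on "/".
function splitParts(str) {
  const m = str.match(/^\/foo\/(?:[^\[]+\[[^\]]+\])+/i);
  if (m === null) return []; // no match: avoid indexing into null
  return m[0].split("/").slice(1); // drop the empty string before the leading "/"
}

const parts = splitParts("/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/");
console.log(parts.join(" | "));
// foo | firstName[firstId] | optional1[oId1] | optional2[oId2]
console.log(splitParts("/bar/x").length); // 0
```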

If the expression is more complicated (for example, with multiple repeated capturing groups), you can repeatedly exec a regex for a single part on the string and grab the groups on each iteration:

> var regex = /(?:([^\/\[]+)\[([^\]]+)\])/g, match;
> while (match = regex.exec(str)) console.log(match[1], match[2]);
firstName firstId
optional1 oId1
optional2 oId2
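In ES2020 and later, `String.prototype.matchAll` can replace the manual `exec` loop: it requires a regex with the `g` flag and yields one match object per entry, each with its own groups. A sketch, assuming a runtime that supports it and a hypothetical sample path:

```javascript
const str = "/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/";
// Same single-entry pattern as the exec() loop; matchAll needs the g flag.
const entry = /([^\/\[]+)\[([^\]]+)\]/g;

// Each match object carries its own groups, so nothing is overwritten.
const pairs = [...str.matchAll(entry)].map((m) => [m[1], m[2]]);
for (const [name, id] of pairs) console.log(name, id);
// firstName firstId
// optional1 oId1
// optional2 oId2
```

If the names are unique, `Object.fromEntries(pairs)` turns the result into a name-to-id lookup object.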
Bergi

Ruud's comment appears to be the ticket: http://regex101.com/r/zQ0uO3

It doesn't strip the initial "/foo" path segment, but you can make a small modification and/or parse it out in JavaScript. It looks like a pretty good solution.
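A sketch of that approach, assuming Ruud's suggestion means the question's single-entry pattern with the `g` flag and no trailing `+`. Every entry comes back, but the first name still carries the "/foo/" prefix and later names a leading slash, which a little JavaScript can strip.

```javascript
// The question's single-entry pattern, with the g flag and no trailing +.
const entry = /(?:([^\[]+)\[([^\]]+)\])/g;
const str = "/foo/firstName[firstId]/optional1[oId1]/optional2[oId2]/";

// One match per entry; m[2] (the id) is available here as well.
const raw = [...str.matchAll(entry)].map((m) => m[1]);
console.log(raw.join(" | ")); // /foo/firstName | /optional1 | /optional2

// Post-process in JavaScript: keep only the text after the last "/".
const names = raw.map((name) => name.slice(name.lastIndexOf("/") + 1));
console.log(names.join(" | ")); // firstName | optional1 | optional2
```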

NetsydeMiro