
I have a regular expression, found here. Try it with the strings below. The problem I'm facing is that there's an extra whitespace character at the beginning of each captured group after the 1st one. I need that whitespace to be matched, but I don't need it to be captured.

Regex expression:

^(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?([\w\s'()-]+)?((?:\s~[a-zA-Z]+){0,2})?$

Viewing it at the link above makes it much simpler to comprehend.

These are some strings you can paste into the test string area one by one:

/test ~example matches ~extra ~space
this too has an extra ~space ~matched
/like wise for this
/and ~this

Take a look at the match groups area and notice that, after the 1st group, the single whitespace character preceding each group is captured along with it.
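For anyone reproducing this outside the online tester, here is a minimal JavaScript sketch of the problem. The pattern is the one above, unchanged; the variable names are only illustrative.

```javascript
// Original pattern from the question, unchanged.
const original = /^(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?([\w\s'()-]+)?((?:\s~[a-zA-Z]+){0,2})?$/;

const sample = "/test ~example matches ~extra ~space";

// slice(1) drops the full match and keeps only the four capture groups.
const groups = sample.match(original).slice(1);

// JSON.stringify puts quotes around each group so the leading spaces are visible.
console.log(groups.map((g) => JSON.stringify(g)).join(", "));
// prints: "/test", " ~example", " matches", " ~extra ~space"
```

Every group after the 1st starts with the space I would like to get rid of.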

What I want to do is this:

For the 1st and 2nd capture group, I want them to detect a succeeding space and absorb it but not capture it, so that the 3rd capture group won't detect and capture the extra space. For the 4th capture group, I want it to detect a preceding space and absorb it but not capture it.

What I mean by "absorb" is that the space gets "removed", in the sense that the 3rd capture group won't realize it's there.

How can I do this?

Thanks.

3 Answers


This is the regex that I came up with:

^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w\'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$

Elaborating on the regex in 2 parts, as per the requirements:

For the 1st and 2nd capture group, I want them to detect a succeeding space and absorb it but not capture it, so that the 3rd capture group won't detect and capture the extra space.

Your regex for the 1st and 2nd groups -

(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?

So, after each of the first and second capturing groups, I've added a non-capturing (?:\s)? and moved the \s out of the 2nd group. This keeps the 3rd capturing group from absorbing the preceding space. This is my regex -

(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?

For the 4th capture group, I want it to detect a preceding space and absorb it but not capture it.

Your regex

((?:\s~[a-zA-Z]+){0,2})?

Here, an obvious solution would be to capture only the text part (~[a-zA-Z]+) and leave the \s part outside the capturing group. Something like this,

(?:(?:\s(~[a-zA-Z]+)){0,2})?
         ^^^^^^^^^^ Capturing only this.

But this is a repeated capturing group, where each iteration overwrites what the previous iteration captured. In other words, a repeated capturing group only keeps its last iteration. So if you wanted to match

" ~space ~matched", it would only capture the last "~matched".
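A minimal JavaScript sketch of that behaviour, using only the trailing part of the pattern (the surrounding ^ and $ anchors and the variable names are added here for illustration):

```javascript
// The inner capturing group sits inside a {0,2} repetition, so it is
// overwritten on every iteration and only the last value survives.
const repeated = /^(?:(?:\s(~[a-zA-Z]+)){0,2})?$/;

const m = " ~space ~matched".match(repeated);

console.log(m.length); // 2 -> the full match plus a single capture slot
console.log(m[1]);     // "~matched" -> "~space" was overwritten
```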

Since the quantifier is {0,2}, one solution is to write the optional group out explicitly 2 times, like so -

(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?

But if the {0,2} requirement changes later, a better solution would be to capture the whole run in one group, preceding spaces included, and then split that captured text on whitespace as a separate step.
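A rough sketch of that capture-and-split approach in JavaScript. This is an assumption about how it could look, not a tested replacement: the pattern below swaps {0,2} for * to show an arbitrary count, and parseLine and the property names are hypothetical.

```javascript
// Hypothetical variant: group 4 captures the whole run of trailing "~word"
// items, spaces included, with no upper limit on how many there are.
const flexible = /^(\/[a-zA-Z0-9]+)?\s?(~[a-zA-Z]+)?\s?([\w'()\-\s]+)?((?:\s~[a-zA-Z]+)*)$/;

// parseLine is an illustrative helper, not part of any library.
function parseLine(line) {
  const m = line.match(flexible);
  if (m === null) return null;
  // Group 4 is always a string (possibly empty); split it on whitespace
  // and drop the empty piece that an empty string would produce.
  const tags = m[4].trim().split(/\s+/).filter(Boolean);
  return { command: m[1], first: m[2], text: m[3], tags };
}

console.log(parseLine("this too has an extra ~space ~matched ~again").tags);
// [ '~space', '~matched', '~again' ]
```

The regex then only has to recognise the run, and the number of items is handled in ordinary code.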

Output when I run this regex against the given strings in JavaScript:
["/test ~example matches ~extra ~space", "/test", "~example", "matches", "~extra", "~space", index: 0, input: "/test ~example matches ~extra ~space"]
["this too has an extra ~space ~matched", undefined, undefined, "this too has an extra", "~space", "~matched", index: 0, input: "this too has an extra ~space ~matched"]
["/like wise for this", "/like", undefined, "wise for this", undefined, undefined, index: 0, input: "/like wise for this"]
["/and ~this", "/and", "~this", undefined, undefined, undefined, index: 0, input: "/and ~this"]
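The output above can be reproduced with a short script along these lines. The regex is the full one from the top of this answer; the variable names are only illustrative, and the exact console formatting will differ between browsers and Node.

```javascript
// Full regex from this answer, unchanged.
const fixed = /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w\'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$/;

// The four test strings from the question.
const inputs = [
  "/test ~example matches ~extra ~space",
  "this too has an extra ~space ~matched",
  "/like wise for this",
  "/and ~this",
];

// Each result is the match array: the full match followed by five groups.
const results = inputs.map((s) => s.match(fixed));

for (const r of results) {
  console.log(r);
}
```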

Hope this helped.

Kamehameha
  • FYI, the group in `(?:\s)?` isn't doing anything useful. `\s?` is all you need. – Alan Moore Feb 06 '14 at 08:50
  • This is brilliant and so far, does everything extremely well. I may change a few things around but by and large, this is exactly what I was looking for. Thanks so much for the explanation as well! Oh and in your regex for the 3rd capture, since you reordered some of the characters, you'd need to escape them with \, especially the dash. – user3201185 Feb 06 '14 at 08:54

I think this does what you want:

^(\/[a-zA-Z0-9]+)?(?:(\s~[a-zA-Z]+)\s)?([\w\s'()-]+)?(?:\s((?:~[a-zA-Z]+\s?){0,2}))?$
Thayne

Try this regex:

^(\/[a-zA-Z0-9]+)?\s?(~[a-zA-Z]+)?\s*([\w\s'()-]+)?\s?((?:~[a-zA-Z]+\s?){0,2})?$

Online Demo: http://regex101.com/r/rA5tR0

anubhava