
I am trying to write a regex that inserts a # into a path after the second segment, i.e. just before the third forward slash, as follows:

  • /leisure/venuename/news => /leisure/venuename#/news
  • /leisure/venuename/page/384 => /leisure/venuename#/page/384

The code below

gsub(/^(.*)(\/.*)$/, '\1#\2')

works as expected for the first string, but for the second it produces this:

  • /leisure/venuename/page#/384
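A minimal Ruby sketch that reproduces the behaviour above by applying the original pattern to both sample paths. The constant name `PATTERN` is illustrative and not from the question. The greedy `(.*)` always backs off only as far as the last slash, so the # lands before the last segment rather than after the second one.

```ruby
# The question's original pattern (constant name is hypothetical).
PATTERN = /^(.*)(\/.*)$/

inputs  = ['/leisure/venuename/news', '/leisure/venuename/page/384']
results = inputs.map { |path| path.gsub(PATTERN, '\1#\2') }

results.each { |r| puts r }
# /leisure/venuename#/news
# /leisure/venuename/page#/384
```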

Is there a way to handle both cases with one pattern?

sawa
user3868832
  • @Taemyr (\/.*?\/[a-zA-Z0-9]+)(.*) http://regex101.com/r/tP2sW7/1 – vks Aug 11 '14 at 12:25
  • @vks That pattern fails on /leisure/venuename_temp/page/384. Amal's answer is the correct way forward for this problem: if you are explicit about what you are looking for, greedy vs. lazy does not matter. – Taemyr Aug 11 '14 at 12:36
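To make Taemyr's counterexample concrete, here is a small Ruby sketch running vks's lazy pattern and the explicit pattern from Amal's answer on the path with an underscore. The constant names are hypothetical. Because `[a-zA-Z0-9]+` stops at `_`, the lazy pattern splits the venue name, while `[^\/]+` accepts any non-slash character.

```ruby
# vks's lazy pattern from the comment, and Amal Murali's explicit pattern.
LAZY_PATTERN     = /(\/.*?\/[a-zA-Z0-9]+)(.*)/
EXPLICIT_PATTERN = /^(\/[^\/]+\/[^\/]+)(\/.*)$/

path = '/leisure/venuename_temp/page/384'

lazy_result     = path.sub(LAZY_PATTERN, '\1#\2')
explicit_result = path.sub(EXPLICIT_PATTERN, '\1#\2')

puts lazy_result      # => /leisure/venuename#_temp/page/384
puts explicit_result  # => /leisure/venuename_temp#/page/384
```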

3 Answers


Instead of using .* to match the forward slashes, be a little more explicit:

^(\/[^\/]+\/[^\/]+)(\/.*)$

[Visualization of the pattern, from Debuggex.com]

Explanation:

^            # Assert position at the beginning of the line
(            # Begin first capturing group
    \/       #  Match literal '/'
    [^\/]+   #  Match any character that is not a '/', one or more times
    \/       #  Match literal '/'
    [^\/]+   #  Match any character that is not a '/', one or more times
)            # End of first capturing group
(            # Begin second capturing group
    \/       #  Match literal '/'
    .*       #  Match everything else
)            # End of second capturing group
$            # Assert position at the end of the line
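The annotated explanation above maps almost directly onto Ruby's free-spacing (`x`) mode. The sketch below is one way to write it; `VENUE_PATH` and `add_hash` are hypothetical names. It also swaps `^`/`$` for `\A`/`\z`, since in Ruby `^` and `$` match at line boundaries rather than only at the ends of the string. Using `%r{...}` means the slashes need no escaping. Paths without a third segment do not match, so `sub` returns them unchanged.

```ruby
# Amal Murali's pattern in free-spacing mode (names are hypothetical).
VENUE_PATH = %r{
  \A
  (            # group 1: the first two path segments
    / [^/]+    # a slash, then one or more non-slash characters
    / [^/]+    # a second slash and segment
  )
  (            # group 2: the rest of the path
    / .*       # the third slash and everything after it
  )
  \z
}x

def add_hash(path)
  path.sub(VENUE_PATH, '\1#\2')
end

puts add_hash('/leisure/venuename/news')      # => /leisure/venuename#/news
puts add_hash('/leisure/venuename/page/384')  # => /leisure/venuename#/page/384
```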


Amal Murali
  • Thanks Amal, that worked a treat. Just a quick question: what does greedy mean, in the simplest terms? – user3868832 Aug 11 '14 at 12:25
  • @user3868832: I'll just quote: "*The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex.*". See this question for more details: [What do lazy and greedy mean in the context of regular expressions?](http://stackoverflow.com/questions/2301285/) – Amal Murali Aug 11 '14 at 12:26
  • @user3868832 Greedy means that a quantifier, i.e. * or +, tries to match as much as possible before backtracking. The other option is lazy, specified by `*?` or `+?`, in which case it matches as little as possible. – Taemyr Aug 11 '14 at 12:27
  • @user3868832 For example, in your regex the quantifier is greedy. So the first `.*` initially matches the whole input "/leisure/venuename/page/384". This fails, because the pattern then needs the next character to be a "/". It then backtracks, trying "/leisure/venuename/page/38", which also fails. Eventually the first quantifier matches "/leisure/venuename/page", the rest of the regex succeeds, and it considers this a match. – Taemyr Aug 11 '14 at 12:30
  • @Taemyr Thanks, dude. I think this gives me something to go on. Regex is scary... I just need to try to understand it programmatically, the way you have explained it above. There should be more straightforward explanations like the ones you and Amal have provided. – user3868832 Aug 11 '14 at 12:43
  • @user3868832: You may want to check out this [interactive regex tutorial](http://regexone.com/). – Amal Murali Aug 11 '14 at 12:46
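The greedy/lazy distinction in the comments can be seen directly by comparing the captures of the question's pattern with a lazy variant (`.*?` in place of the first `.*`). The constant names are illustrative. Neither version splits after the second segment, which supports Taemyr's point that being explicit about the segments matters more than changing greediness.

```ruby
# The question's pattern (greedy first group) and a lazy variant of it.
GREEDY = /^(.*)(\/.*)$/
LAZY   = /^(.*?)(\/.*)$/

path = '/leisure/venuename/page/384'

greedy_split = path.match(GREEDY).captures
lazy_split   = path.match(LAZY).captures

p greedy_split  # => ["/leisure/venuename/page", "/384"]
p lazy_split    # => ["", "/leisure/venuename/page/384"]
```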

You are doing two things wrong.

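  1. You should not use gsub here: gsub replaces every match, whereas only the first one should change.
  2. You should not use \2 here: there is no need to capture the rest of the string, because sub leaves the unmatched part in place.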

Do it like this:

"/leisure/venuename/news"
.sub(%r"((?:/[^/]*){2})", '\1#')
#=> "/leisure/venuename#/news"

"/leisure/venuename/page/384"
.sub(%r"((?:/[^/]*){2})", '\1#')
#=> "/leisure/venuename#/page/384"
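To see why `sub` rather than `gsub` matters with this pattern, here is a short Ruby sketch applying both to the longer path (the constant name is hypothetical). With `gsub`, the search resumes after the first match, finds "/page/384" as a second pair of segments, and appends a stray #.

```ruby
# sawa's pattern: two repetitions of "a slash followed by non-slash characters".
FIRST_TWO_SEGMENTS = %r"((?:/[^/]*){2})"

path = '/leisure/venuename/page/384'

with_sub  = path.sub(FIRST_TWO_SEGMENTS, '\1#')
with_gsub = path.gsub(FIRST_TWO_SEGMENTS, '\1#')

puts with_sub   # => /leisure/venuename#/page/384
puts with_gsub  # => /leisure/venuename#/page/384#
```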
sawa

While it's possible to do this with a regular expression, I'm not sure it's the most readable or maintainable route. I'd do it like this:

foo = '/leisure/venuename/news'.split('/')
foo[2] << '#'
foo.join('/') # => "/leisure/venuename#/news"

foo = '/leisure/venuename/page/384'.split('/')
foo[2] << '#'
foo.join('/') # => "/leisure/venuename#/page/384"
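If the split/join approach is reused, it may be worth wrapping in a method. The sketch below is one possible version; the method name is hypothetical. Two small departures from the snippet above are assumptions on my part: `split('/', -1)` keeps trailing empty fields so a trailing slash survives the join, and paths with fewer than two segments are returned unchanged instead of raising `NoMethodError` on `nil`.

```ruby
# Hypothetical helper name; wraps the split/join approach from the answer.
def insert_hash_after_second_segment(path)
  # A limit of -1 keeps trailing empty fields, so a trailing "/" survives the join.
  parts = path.split('/', -1)

  # parts[0] is the empty string before the leading "/", so parts[2] is the
  # second segment. Leave paths with fewer than two segments unchanged.
  return path if parts[2].nil?

  # Build a new string rather than appending in place with <<.
  parts[2] = "#{parts[2]}#"
  parts.join('/')
end

puts insert_hash_after_second_segment('/leisure/venuename/news')      # => /leisure/venuename#/news
puts insert_hash_after_second_segment('/leisure/venuename/page/384')  # => /leisure/venuename#/page/384
puts insert_hash_after_second_segment('/leisure/venuename/news/')     # => /leisure/venuename#/news/
puts insert_hash_after_second_segment('/leisure')                     # => /leisure
```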

It's simple, clean and very readable, which is important when the code is going to be around a while.

the Tin Man