
I have a requirement to parse a set of URLs and extract specific elements from them under certain conditions. To explain further, consider this set of URLs:

http://www.example.com/appName1/some/extra/parts/keyword/rest/of/the/url
http://www.somewebsite.com/appName2/some/extra/parts/keyword/rest/of/the/url
http://www.someothersite.com/appname3/rest/of/the/url

As you can see, there are two kinds of URLs: those containing the word "keyword" and those that don't. In my code, I receive only the part of the URL after the domain name (e.g. /appName1/some/extra/parts/keyword/rest/of/the/url).

I have two tasks: first, check whether the word "keyword" is present in the URL; second, only if "keyword" is not present, parse the URL into two groups, the app name and the rest of the URL (e.g. group 1: appname3 and group 2: rest/of/the/url for URL 3, which doesn't contain "keyword"). The whole thing should be done in one regex.

My progress:

  • I was able to parse the app name and rest of the url into groups, but was not able to apply the condition.

  • I found a way to select strings that do not contain "keyword", though I'm not sure it's the right way to do it: ^((?!.*keyword).*)$

  • Next, to combine the two, I tried a conditional construct I found after a long search, with the syntax (?(?=regex)then|else). The result was:
    (?(?=^((?!.*keyword).*)$)\1)
    But it says invalid group structure.
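
A minimal sketch of the check from the second bullet, assuming Python's `re` module (the question does not name a regex flavor, so the language and the `NO_KEYWORD` name are illustrative only). It suggests the negative lookahead on its own already filters out the paths that contain "keyword":

```python
import re

# Pattern from the second bullet: matches the whole string only if
# "keyword" appears nowhere in it.
NO_KEYWORD = re.compile(r"^((?!.*keyword).*)$")

# The three example paths from the question (part after the domain name).
paths = [
    "/appName1/some/extra/parts/keyword/rest/of/the/url",
    "/appName2/some/extra/parts/keyword/rest/of/the/url",
    "/appname3/rest/of/the/url",
]

# Keep only the paths the pattern accepts.
kept = [p for p in paths if NO_KEYWORD.match(p)]
print(kept)  # ['/appname3/rest/of/the/url']
```

Note that this pattern looks for the bare text "keyword" anywhere in the string, so a path segment such as "mykeywords" would also be rejected.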

I have gone through many Stack Overflow entries and tutorials, but couldn't find anything that meets the actual requirement. Please help me solve this.

T90

1 Answer


Yes, this is in fact possible. As far as I understand, you have the following cases:

  • /appName/some/extra/parts/keyword/rest/of/the/url
  • /appName/rest/of/the/url

You want your regex to not match the first one at all, while in the second case you want "appName" in one group and "rest/of/the/url" in another. The following regex will do that:

^(?!.*\/keyword\/)\/(.*?)\/(.*)$

Explanation:

  • ^ asserts position at the start of the string
  • (?!.*\/keyword\/) is a negative lookahead, and looks ahead to make sure the string does not contain /keyword/. This is where the magic happens
  • \/ matches "/", i.e. the slash right after the domain name
  • (.*?)\/ captures the first group (the app name in your example) lazily, i.e. as few characters as possible up to the next slash
  • (.*)$ is the group that captures "rest/of/the/url"
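
The behaviour described above can be sketched as follows, assuming Python's `re` module (the thread does not specify a regex flavor; `PATTERN` and `split_path` are hypothetical names used only for illustration). In Python the forward slash needs no escaping, so `\/` is written as `/`:

```python
import re

# Same pattern as in the answer, with "\/" written as plain "/".
PATTERN = re.compile(r"^(?!.*/keyword/)/(.*?)/(.*)$")

def split_path(path):
    """Return (app_name, rest) or None when the path contains /keyword/."""
    match = PATTERN.match(path)
    return match.groups() if match else None

# Contains /keyword/: the negative lookahead fails, so nothing matches.
print(split_path("/appName1/some/extra/parts/keyword/rest/of/the/url"))  # None

# No /keyword/: the two groups are returned.
print(split_path("/appname3/rest/of/the/url"))  # ('appname3', 'rest/of/the/url')
```

One caveat of this sketch: the lookahead requires a slash on both sides of "keyword", so a path that ends in /keyword with no trailing slash would still match and return groups.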
Mathias-S
  • Hi @Mathias-S, I tried this, but it seems it returns groups even when there is "keyword" in it. I'm not sure if the requirement was clear. If "keyword" is present in the url, it shouldn't return any groups. – T90 Sep 24 '16 at 19:51
  • So if keyword is present, you want to get the whole URL, if it's not present, you want the groups? Or do you want nothing at all if keyword is present? – Mathias-S Sep 24 '16 at 19:53
  • if keyword is present, I don't want anything and if it's not there, the groups – T90 Sep 24 '16 at 19:58
  • Then you only need to use a negative lookahead. I've updated my answer with a regex that does that. Does this solve your issue? – Mathias-S Sep 24 '16 at 20:08