1

This is my pattern regex:

"subcategory.html?.*id=(.*?)&.*title=(.+)?"

For the input below:

http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&title=BabySale

I want to capture the following groups:

  • group one (id) : 3000080292
  • group two (title) : BabySale

This works fine. The problem is that I want to make the second group (the value of title) optional, so that even if title is not present, the regex still matches and gives me the value of group 1 (id). But for the input

http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&

the regex match fails even though group one is present. So my question is: how do I make the second group optional here?

Mahendra Chhimwal
  • I'm sorry, but if your use case is restricted to parsing URLs, maybe you should see http://stackoverflow.com/questions/13592236/parse-a-uri-string-into-name-value-collection or even use one of the many libraries that do exactly that. Regex tends to be vulnerable when doing that kind of thing. – Jeremy Grand Feb 23 '17 at 12:32
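To illustrate the suggestion in the comment above, here is a minimal sketch that reads the parameters without a regex, using only java.net.URI and java.net.URLDecoder from the JDK. The class name QueryParams and the method parse are hypothetical names chosen for this example. It assumes the URL is syntactically valid (URI.create throws IllegalArgumentException otherwise) and that a repeated parameter name may simply overwrite the earlier value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the thread.
class QueryParams {

    /** Splits the query string of a URL into decoded name/value pairs. */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params; // the URL has no "?..." part
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate empty segments such as "&&"
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value)); // a later duplicate overwrites
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String base = "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&";
        System.out.println(parse(base + "title=BabySale").get("title")); // BabySale
        System.out.println(parse(base).get("title"));                    // null
    }
}
```

With this approach a missing title shows up as a null map entry, and the order of the parameters no longer matters.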

3 Answers

2

One of the possible ways is to use something like:

subcategory\.html\?.*id=(.*?)&(.*title=(.+)?)?
(.*title=(.+)?)? is optional now.

Please see an example here.

As suggested by @Christian, it is better to make the group around .*title non-capturing, so that it is not part of the result:

subcategory\.html\?.*id=(.*?)&(?:.*title=(.+)?)?
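The thread does not say how the pattern is applied in Java, so the following is only a sketch under an assumption: it uses Matcher.find(), which searches for the pattern anywhere in the input, whereas Matcher.matches() requires the pattern to cover the whole URL. The class and method names (FindVersusMatches, id, title, matchesWhole) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the thread.
class FindVersusMatches {

    // The non-capturing variant from the answer, escaped for a Java string literal.
    static final Pattern PATTERN =
            Pattern.compile("subcategory\\.html\\?.*id=(.*?)&(?:.*title=(.+)?)?");

    /** Group 1 (id) found anywhere in the URL, or null if the pattern is not found. */
    static String id(String url) {
        Matcher m = PATTERN.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    /** Group 2 (title); null when the optional group did not participate. */
    static String title(String url) {
        Matcher m = PATTERN.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    /** matches() must consume the entire input, including the "http://..." prefix. */
    static boolean matchesWhole(String url) {
        return PATTERN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String with = "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&title=BabySale";
        String without = "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&";
        System.out.println(id(with) + " / " + title(with));       // 3000080292 / BabySale
        System.out.println(id(without) + " / " + title(without)); // 3000080292 / null
        System.out.println(matchesWhole(with));                   // false
    }
}
```

An optional group that did not take part in the match returns null from group(n), so callers need a null check before using the title.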
Anton Balaniuc
  • You may also want to make the group around the `title` bit non-capturing (`(?:)` instead of `()`), to avoid the regex engine saving the result in a variable (thus changing the index numbers of existing ones): `subcategory\.html\?.*id=(.*?)&(?:.*title=(.+)?)?` – Christian Feb 23 '17 at 11:43
  • But the above regex fails to match "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back& " for group one? – Mahendra Chhimwal Feb 23 '17 at 12:00
2

Maybe make the entire substring optional?

Try subcategory.html?.*id=(.*?)&.*(?:title=(.+)?)?

Also note that your (and my) regex might be matching too much. For example, the dot here should probably be escaped: subcategory\.html instead of subcategory.html, or you will match subcategory€html, too. The unescaped question mark says the l of html is optional rather than matching a literal ?; you are probably saved by the .* ("match anything") that follows.

Last but not least, the final .* means that even this will match (which you probably don't want to match):

http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&title=BabySale&Lorem Ipsum Sit Atem http://&%$

It's usually a bad idea to match .* as it will nearly always match too much. Consider using character classes instead of the dot, and anchoring the beginning (^) and end ($) of the string... :)
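As a rough sketch of that advice, the pattern below replaces every dot with a negated character class and anchors both ends. It is not the pattern from this answer. It assumes that id is the first parameter, that title (if present) is the last one, and that a URL never contains whitespace. The class name StrictSubcategoryPattern and the method extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the thread.
class StrictSubcategoryPattern {

    static final Pattern PATTERN = Pattern.compile(
            "^[^?\\s]*/subcategory\\.html\\?"  // anything up to the page name, no '?' or whitespace
            + "id=([^&\\s]+)"                  // group 1: id, up to the next '&'
            + "(?:&[^&\\s]*)*?"                // other parameters, as few as possible
            + "(?:&title=([^&\\s]*))?"         // group 2: optional title (assumed to come last)
            + "&?$");                          // optional trailing '&', then end of input

    /** Returns {id, title}, with title possibly null, or null if the URL does not match. */
    static String[] extract(String url) {
        Matcher m = PATTERN.matcher(url);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String base = "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&";
        System.out.println(extract(base + "title=BabySale")[1]); // BabySale
        System.out.println(extract(base)[1]);                    // null
        // Trailing garbage containing spaces is rejected instead of silently matched.
        System.out.println(extract(base + "title=BabySale&Lorem Ipsum Sit Atem http://&%$")); // null
    }
}
```

Because each repeated piece starts with & and cannot contain another &, there is only one way to split the query string, which also keeps backtracking cheap.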

Christian
  • But the above regex fails to match http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back& for group one. – Mahendra Chhimwal Feb 23 '17 at 12:01
  • When I try it, it fails to find anything for group two (title), most likely because the preceding `.*` is too greedy - making it non-greedy should fix the issue - although `backTitle` has a capital `T` and we are only looking for a lowercase one. This might do: `subcategory\.html\?.*id=(.*?)&(?:.*?title=(\w+)?)?` - provided title comes last, and you don't want `backTitle`... See https://regex101.com/r/H9OKlb/3 – Christian Feb 23 '17 at 12:09
  • NB: You are using Java, so make sure to properly escape the regex pattern! – Christian Feb 23 '17 at 12:15
1

If you know that parameter id comes before optional title then you can use this regex to capture id and optional title parameters:

subcategory\.html\?id=([^&]*)(?:.*&)?(?:title=([^&]*))?

RegEx Demo

In Java use this regex:

final String regex = "subcategory\\.html\\?id=([^&]*)(?:.*&)?(?:title=([^&]*))?";
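A short usage sketch for this pattern follows, wrapping the optional group in java.util.Optional so that a missing title does not surface as a bare null. The class name SubcategoryParams and its methods are hypothetical, and the use of Matcher.find() is an assumption, since the pattern does not cover the scheme and host part of the URL.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper, not taken from the thread.
class SubcategoryParams {

    private static final Pattern PATTERN = Pattern.compile(
            "subcategory\\.html\\?id=([^&]*)(?:.*&)?(?:title=([^&]*))?");

    /** The id parameter, or empty if the URL does not contain the pattern. */
    static Optional<String> id(String url) {
        Matcher m = PATTERN.matcher(url);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** The title parameter, or empty if it is absent or was not captured. */
    static Optional<String> title(String url) {
        Matcher m = PATTERN.matcher(url);
        // group(2) is null when the optional title group did not participate.
        return m.find() ? Optional.ofNullable(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        String base = "http://example.com/xyz/subcategory.html?id=3000080292&backTitle=Back&";
        System.out.println(id(base).orElse("(none)"));                       // 3000080292
        System.out.println(title(base + "title=BabySale").orElse("(none)")); // BabySale
        System.out.println(title(base).orElse("(none)"));                    // (none)
    }
}
```

One caveat when using the pattern this way: the greedy (?:.*&)? runs to the last & in the input, so title is only captured when it is the final parameter. For example, ...?id=1&title=A&foo=B still matches, but group 2 stays empty.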
anubhava