
I am trying to match the correct strings using a negative lookahead in a regular expression.

I want my regex to accept `Domain abcd[.]xyz`, but not `Bad URL h[xx]ps://abcd[.]xyz` or `Evil URL h[xx]p://stu[.]abc`. I have tried many ways to achieve this, but I'm getting nowhere.

if (str.matches("^(\\w+\\s+)+(?!h\\S+p(s)?://)(.*)$"))
{
   // ...
}

The above code accepts every string, which is incorrect. Can anyone with a sharper eye tell me what I am missing? Thanks.

Trevor
  • Why don't you just match the prefix "Domain" and any rest? – cyberbrain Aug 05 '23 at 11:46
  • Does this answer your question? [Regular expression to match a line that doesn't contain a word](https://stackoverflow.com/questions/406230/regular-expression-to-match-a-line-that-doesnt-contain-a-word) – InSync Aug 05 '23 at 12:02
  • Put the lookahead at the start of the regex: [`^(?!(\w+\s+)+h\S+ps?://)`](https://regex101.com/r/DH7Doq/2) – InSync Aug 05 '23 at 12:03
  • It's not clear why some of your examples should match and others should not. Please post a variety of inputs and explain why they should or shouldn't match. – Bohemian Aug 05 '23 at 12:40

2 Answers


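This happens because of backtracking: when the lookahead prevents the pattern from matching, the engine backtracks, your first group gives back part of what it matched, and the match is attempted again. For `Bad URL h[xx]ps://abcd[.]xyz`, the group gives back `URL `, so the lookahead is then tested at `URL h[xx]ps://abcd[.]xyz`, where it finds no leading `h`, and `(.*)` consumes the rest.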

Look closely at what the first group matches in this example: https://regex101.com/r/GU7WV4/1
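The same effect can be observed in Java. The sketch below assumes the three example strings from the question are the complete inputs. It runs the question's pattern against each one and prints what group 1 captured. The class name `BacktrackDemo` and the method `firstGroup` are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL =
            Pattern.compile("^(\\w+\\s+)+(?!h\\S+p(s)?://)(.*)$");

    // Returns the last value captured by group 1 if the whole input matched, or null otherwise.
    static String firstGroup(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Domain abcd[.]xyz",
            "Bad URL h[xx]ps://abcd[.]xyz",
            "Evil URL h[xx]p://stu[.]abc"
        };
        for (String s : inputs) {
            // All three inputs match. For the two URLs the group has backtracked
            // to just the first word, so the lookahead never sees the "h".
            System.out.println(s + "  ->  group 1 = [" + firstGroup(s) + "]");
        }
    }
}
```

Every input matches. For the URL lines, group 1 holds only `Bad ` or `Evil ` instead of `Bad URL ` or `Evil URL `.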

This can be worked around in a few ways:

  1. You can use a possessive quantifier, which prevents the engine from backtracking into the group:
     `^(\w+\s+)++(?!h\S+p(s)?://)(.*)$`
  2. You can match only non-whitespace characters for your (almost) URL, so a backtracked tail that still contains a space cannot satisfy `(\S*)$`:
     `^(\w+\s+)+(?!h\S+p(s)?://)(\S*)$`
  3. You can put the lookahead at the start of the pattern, so it rejects the whole string when it ends in a URL-like token:
     `^(?!.*h\S+ps?://\S+$)(\w+\s+)+(.*)$`
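To check the three workarounds side by side, the sketch below runs each one through `String.matches` against the question's three strings. It assumes those strings are the complete inputs. The class name `WorkaroundsDemo` and the helper `accepts` are hypothetical.

```java
public class WorkaroundsDemo {
    // The three workarounds above, escaped as Java string literals.
    static final String[] PATTERNS = {
        "^(\\w+\\s+)++(?!h\\S+p(s)?://)(.*)$",     // 1. possessive quantifier
        "^(\\w+\\s+)+(?!h\\S+p(s)?://)(\\S*)$",    // 2. non-whitespace tail only
        "^(?!.*h\\S+ps?://\\S+$)(\\w+\\s+)+(.*)$"  // 3. lookahead at the start
    };

    // String.matches requires the pattern to match the entire input.
    static boolean accepts(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Domain abcd[.]xyz",
            "Bad URL h[xx]ps://abcd[.]xyz",
            "Evil URL h[xx]p://stu[.]abc"
        };
        for (String regex : PATTERNS) {
            System.out.println(regex);
            for (String s : inputs) {
                // Expected: true for the Domain line, false for both URL lines.
                System.out.println("  " + accepts(regex, s) + "\t" + s);
            }
        }
    }
}
```

All three patterns accept only the `Domain` line. The possessive version is the smallest change to the original, while the third version states the intent (reject a trailing URL-like token) most directly.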
markalex
  • Yes, your three possible solutions work very well. They are exactly what I was looking for. Spot on. Thank you. – Trevor Aug 06 '23 at 01:04

You'll need a pattern to assert the starting text, i.e. "Domain", "Bad URL", and "Evil URL".

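The following asserts that the starting text is neither "Bad URL" nor "Evil URL". Because `String.matches` has to consume the whole input, the pattern ends with `.*` rather than a single `.`:

^(?!(?:Bad|Evil) URL).*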
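A minimal Java check of the "reject these prefixes" idea follows. It assumes the pattern `^(?!(?:Bad|Evil) URL).*`, with the space outside the alternation so that it matches a single space before `URL`, and the question's three strings as inputs. `PrefixBlacklist` and `accepts` are illustrative names.

```java
public class PrefixBlacklist {
    // Negative lookahead at position 0 rejects "Bad URL..." and "Evil URL...";
    // the trailing .* lets String.matches consume the rest of the line.
    static final String NOT_BAD_OR_EVIL = "^(?!(?:Bad|Evil) URL).*";

    static boolean accepts(String input) {
        return input.matches(NOT_BAD_OR_EVIL);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Domain abcd[.]xyz",
            "Bad URL h[xx]ps://abcd[.]xyz",
            "Evil URL h[xx]p://stu[.]abc"
        };
        for (String s : inputs) {
            System.out.println(accepts(s) + "\t" + s);
        }
    }
}
```

This approach only looks at the label, not at the URL itself, so it relies on the labels being reliable.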

Alternatively, you could do the opposite and assert only the valid prefix:

^(?=Domain ).*
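If you test with `Matcher.find()` instead of `String.matches()`, the pattern does not need to consume the whole string. In that case a single `.` after the lookahead, as in `^(?=Domain ).`, is enough. The sketch below assumes that form, with one space after `Domain`, and uses the question's strings as inputs. `PrefixWhitelist` and `accepts` are hypothetical names.

```java
import java.util.regex.Pattern;

public class PrefixWhitelist {
    // "^" pins the match to the start of the input (no MULTILINE flag), the lookahead
    // requires "Domain " there, and "." just needs one character to consume.
    static final Pattern STARTS_WITH_DOMAIN = Pattern.compile("^(?=Domain ).");

    // find() only needs the pattern to match somewhere, unlike String.matches().
    static boolean accepts(String input) {
        return STARTS_WITH_DOMAIN.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Domain abcd[.]xyz",
            "Bad URL h[xx]ps://abcd[.]xyz",
            "Evil URL h[xx]p://stu[.]abc"
        };
        for (String s : inputs) {
            System.out.println(accepts(s) + "\t" + s);
        }
    }
}
```

A whitelist like this is usually safer than the blacklist above, because any new label other than `Domain` is rejected by default.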
Reilas