
I'm new to regex. I'm using a regex to match URLs, but some of the results contain =. I want only the URLs that match my pattern and do not contain =.

My pattern is \S+google.com\/\S+-\S+

For example:

  1. MATCH: www.google.com/aa-bb

  2. NOT MATCH: google.com/

  3. NOT MATCH: www.google.com/aa-bb=cc

My current pattern matches 1 and 3, but I want it to match 1 only.

Following the answer to "Regular expression to match a line that doesn't contain a word?", I tried (?=((?!=).)*)(?=\S+google.com\/\S+-\S+) in an attempt to intersect the results of both matches, but it seems regex does not work this way.

Python Regex answers only, please. Thanks!

shaoooo

1 Answer


Change \S to [^\s=] in the part after google.com/ so that it matches neither whitespace nor =.

You should also anchor the pattern with ^ and $, so it has to match the entire URL. Otherwise it will match the www.google.com/aa-bb part of www.google.com/aa-bb=cc.

^\S+google\.com\/[^\s=]+-[^\s=]+$

You should also escape the literal . in the regexp, since an unescaped . matches any character.
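
As a quick check, here is a minimal Python sketch that runs the pattern above against the three examples from the question. The names URL_PATTERN, is_wanted, samples, results and partial are just illustrative; the \/ escape is dropped because / needs no escaping in Python.

```python
import re

# Anchored pattern from the answer: no whitespace or "=" allowed after "google.com/".
URL_PATTERN = re.compile(r"^\S+google\.com/[^\s=]+-[^\s=]+$")


def is_wanted(url):
    """Return True if the whole string matches the anchored pattern."""
    return URL_PATTERN.match(url) is not None


# The three examples from the question.
samples = ["www.google.com/aa-bb", "google.com/", "www.google.com/aa-bb=cc"]
results = {url: is_wanted(url) for url in samples}

for url, ok in results.items():
    print(url, "->", "MATCH" if ok else "NOT MATCH")

# Without ^ and $, the pattern still finds the prefix before "=" in example 3.
unanchored = re.compile(r"\S+google\.com/[^\s=]+-[^\s=]+")
partial = unanchored.search("www.google.com/aa-bb=cc").group()
print(partial)
```

Only the first example is reported as a match. The last two lines show why the anchors matter: the unanchored version still matches the part of the third URL that comes before the =.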

Barmar