
We have a regex in our project that matches any URL containing the string "/pdf/":

(.+)/pdf/.+

I need to modify it so that it won't match URLs that also contain "help".

Examples:

Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"
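The question doesn't name a language or regex engine, so Python is assumed here purely for illustration. This quick check shows that the current pattern accepts both example URLs, which is the problem being asked about. The helper name original_matches is hypothetical.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(.+)/pdf/.+")

def original_matches(url: str) -> bool:
    """True if the question's pattern finds a match anywhere in url."""
    return ORIGINAL.search(url) is not None

# Both example URLs match: the pattern only requires "/pdf/" with at least
# one character on each side and never checks for "help".
print(original_matches("/dealer/help/us/en/pdf/simple.pdf"))  # True
print(original_matches("/dealer/us/en/pdf/simple.pdf"))       # True
```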

Jacob Petersen

2 Answers


If lookarounds are supported, this is very easy to achieve:

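^(?=.*/pdf/)(?!.*help)(.+)

The leading ^ matters when the pattern is used with a search function (e.g. Python's re.search) rather than a full-string match: unanchored, the engine can start matching just after the "h" of "help", where (?!.*help) no longer sees it.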

See a demo on regex101.com.
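A small Python check of the lookahead approach, assuming Python's re module as the engine. It compares the ^-anchored pattern with the unanchored one under re.search: the unanchored version accepts the "help" URL by starting its match at "elp/...". The helper accepts is hypothetical.

```python
import re

# Lookahead pattern, anchored so both checks run once from the string start.
ANCHORED = re.compile(r"^(?=.*/pdf/)(?!.*help)(.+)")
# Same pattern without the anchor, kept only to show the pitfall.
UNANCHORED = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")

def accepts(pattern: re.Pattern, url: str) -> bool:
    """True if re.search finds a match anywhere in url."""
    return pattern.search(url) is not None

bad = "/dealer/help/us/en/pdf/simple.pdf"
good = "/dealer/us/en/pdf/simple.pdf"

print(accepts(ANCHORED, bad), accepts(ANCHORED, good))  # False True
# Unanchored, re.search retries at later offsets and succeeds after the "h":
print(UNANCHORED.search(bad).group(1))  # elp/us/en/pdf/simple.pdf
```

With a full-string API such as re.fullmatch, the anchor is implied and both forms behave the same.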

Jan
(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)

First, we match either whitespace (\s) or the start of the line/string (^):

(?:^|\s)

Then we match any character that is neither a space nor an h, OR an h that is not followed by elp (the negative lookahead (?!elp)), one or more times (+), until we reach /pdf/. After that we match any number (*) of non-whitespace characters (\S).

((?:[^h ]|h(?!elp))+\/pdf\/\S*)
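The explanation above can be checked in Python (an assumed engine) against the first full pattern of this answer. As described, this version only rejects help that appears before /pdf/. The third URL, with help after /pdf/, is a made-up example and is still accepted. The helper url_or_none is hypothetical.

```python
import re
from typing import Optional

# First full pattern from this answer: help is rejected only before "/pdf/".
PDF_NO_HELP_BEFORE = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)"
)

def url_or_none(text: str) -> Optional[str]:
    """Return capture group 1 (the URL) or None if nothing matches."""
    m = PDF_NO_HELP_BEFORE.search(text)
    return m.group(1) if m else None

print(url_or_none("/dealer/us/en/pdf/simple.pdf"))       # /dealer/us/en/pdf/simple.pdf
print(url_or_none("/dealer/help/us/en/pdf/simple.pdf"))  # None
print(url_or_none("/dealer/us/en/pdf/help.pdf"))         # /dealer/us/en/pdf/help.pdf
# Embedded in text, the leading \s lets the match start mid-line:
print(url_or_none("see /dealer/us/en/pdf/simple.pdf now"))  # /dealer/us/en/pdf/simple.pdf
```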

If we also want to reject help after the /pdf/, we can reuse the no-help subpattern from the start in place of \S*:

((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)
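A Python sketch (engine assumed, as before) of the stricter variant, which repeats the no-help run after /pdf/. The URL with help after /pdf/ is an illustrative example, and strict_url is a hypothetical helper.

```python
import re
from typing import Optional

# Variant from this answer: the same no-"help" run is used after "/pdf/" too.
STRICT = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)(?:$|\s)"
)

def strict_url(text: str) -> Optional[str]:
    """Return capture group 1 (the URL) or None if nothing matches."""
    m = STRICT.search(text)
    return m.group(1) if m else None

print(strict_url("/dealer/us/en/pdf/simple.pdf"))       # /dealer/us/en/pdf/simple.pdf
print(strict_url("/dealer/help/us/en/pdf/simple.pdf"))  # None
print(strict_url("/dealer/us/en/pdf/help.pdf"))         # None
```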

Finally, we match whitespace (\s) or the end of the line/string ($):

(?:$|\s)

The full match includes the leading/trailing whitespace, which should be stripped. If you use capture group 1 instead, you don't need to strip anything.
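To see the difference between the full match and capture group 1, here is a Python sketch (engine assumed) that runs the first pattern over a line of made-up text containing several URLs. It also shows one side effect of the trailing (?:$|\s): because it consumes the separator, two URLs separated by a single space are not both found.

```python
import re

# First pattern from this answer (help rejected only before "/pdf/").
PATTERN = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

text = ("a /dealer/us/en/pdf/simple.pdf b /dealer/help/us/en/pdf/simple.pdf "
        "c /dealer/ca/fr/pdf/guide.pdf")

matches = list(PATTERN.finditer(text))
full = [m.group(0) for m in matches]  # includes surrounding whitespace
urls = [m.group(1) for m in matches]  # capture group 1: just the URL

print(full)  # [' /dealer/us/en/pdf/simple.pdf ', ' /dealer/ca/fr/pdf/guide.pdf']
print(urls)  # ['/dealer/us/en/pdf/simple.pdf', '/dealer/ca/fr/pdf/guide.pdf']

# The first match consumes the single space, so the second URL has no
# whitespace (or ^) left in front of it and is skipped.
print(PATTERN.findall("/a/pdf/x /b/pdf/y"))  # ['/a/pdf/x']
```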

Example on regex101

TemporalWolf
  • This is very complicated and can be achieved **far** easier :) – Jan Sep 06 '16 at 19:03
  • @Jan This gives an immediately usable capture group, instead of matching a whole line. Also, because of this, it can match urls embedded in text, or just a list of urls not separated by line feeds. – TemporalWolf Sep 06 '16 at 19:52