Regex for string containing one string, but not another

Question

We have a regex in our project that matches any URL that contains the string "/pdf/":

(.+)/pdf/.+

We need to modify it so that it won't match URLs that also contain "help".

Example:

Shouldn't match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"

What language, what style of regex, what code is being used to match? — J Earls, Sep 06 '16 at 17:00
Can "help" occur after "/pdf/"? If so, should it match "/dealer/us/en/pdf/help.pdf"? — Andrew Morton, Sep 06 '16 at 17:22
What tool are you using to match the regular expression? Are you using grep for example? Is it a programming language? — ffledgling, Sep 06 '16 at 18:42

score 3 · Accepted Answer · answered Sep 06 '16 at 19:03


If lookarounds are supported, this is very easy to achieve:

(?=.*/pdf/)(?!.*help)(.+)

See a demo on regex101.com.
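
As a rough illustration of how the two lookaheads behave, here is a minimal Python sketch. The function name `is_pdf_url_without_help` is made up for this example, and Python is only an assumption, since the question never says which language or tool is in use. Note that the sketch uses `re.match`, which anchors the pattern at the start of the string; with an unanchored search, the engine could start partway through the URL (after the "h" of "help") and still find a match.

```python
import re

# Pattern from the answer: require "/pdf/" somewhere ahead,
# forbid "help" anywhere ahead, then capture the whole string.
PATTERN = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")


def is_pdf_url_without_help(url: str) -> bool:
    """Hypothetical helper: True if url contains "/pdf/" but not "help"."""
    # re.match only tries the pattern at position 0, so both
    # lookaheads inspect the entire URL.
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    for url in (
        "/dealer/us/en/pdf/simple.pdf",
        "/dealer/help/us/en/pdf/simple.pdf",
        "/dealer/us/en/pdf/help.pdf",
    ):
        print(url, is_pdf_url_without_help(url))
```

Because the negative lookahead scans the whole string, this version also rejects a URL where "help" appears after "/pdf/", which is the case raised in the comments on the question.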

answered Sep 06 '16 at 19:03

Jan


TemporalWolf · Answer 2 · 2016-09-06T18:37:58.933

2

(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)

First, match either a whitespace character or the start of a line:

(?:^|\s)

Then we match anything that is not a space or an h, OR any h that is not followed by elp, one or more times (+), until we find /pdf/; then we match non-space characters (\S) any number of times (*).

((?:[^h ]|h(?!elp))+\/pdf\/\S*)

If we also want to reject help after the /pdf/, we can repeat the same matching from the start after it.

((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)

Finally, we match a whitespace character or the end of the line/string ($):

(?:$|\s)

The full match will include leading/trailing spaces, and should be stripped. If you use capture group 1, you don't need to strip the ends.
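
To show how capture group 1 can be used on URLs embedded in text, here is a small Python sketch. Python, the helper name `find_pdf_urls`, and the constant names `LOOSE` and `STRICT` are assumptions made for illustration; the two patterns are the ones given above, with the second using the variant that also rejects help after /pdf/.

```python
import re

# Pattern from this answer; group 1 is the URL without surrounding whitespace.
LOOSE = re.compile(r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)")

# Variant that applies the same "no help" matching after "/pdf/" as well.
STRICT = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)(?:$|\s)"
)


def find_pdf_urls(text, pattern=LOOSE):
    """Hypothetical helper: return group 1 of every match in text."""
    return [m.group(1) for m in pattern.finditer(text)]


if __name__ == "__main__":
    text = (
        "ok /dealer/us/en/pdf/simple.pdf "
        "bad /dealer/help/us/en/pdf/simple.pdf end"
    )
    print(find_pdf_urls(text))
    print(find_pdf_urls("/dealer/us/en/pdf/help.pdf"))
    print(find_pdf_urls("/dealer/us/en/pdf/help.pdf", STRICT))
```

One thing to keep in mind with this approach: the trailing `(?:$|\s)` consumes the whitespace after a match, so two qualifying URLs separated by a single space may not both be found in one pass.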

Example on regex101

edited Sep 06 '16 at 18:37

answered Sep 06 '16 at 17:29

TemporalWolf


This is very complicated and can be achieved **far** easier :) – Jan Sep 06 '16 at 19:03
@Jan This gives an immediately usable capture group, instead of matching a whole line. Also, because of this, it can match urls embedded in text, or just a list of urls not separated by line feeds. – TemporalWolf Sep 06 '16 at 19:52
