2

I'm looking for a regular expression that excludes URLs whose file extension I don't want.

For example, resources ending in .css, .js, .font, .png, .jpg, etc. should be excluded.

Alternatively, I can put all resources in the same folder and exclude any URL that points into that folder, like:

.*\/(?!content\/media)\/.*

But that doesn't work! How can I improve this regex to match my criteria?

e.g.

Match:

http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other

No match:

http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843

moffeltje
Mark

2 Answers

1

The correct solution is:

^((?!\/content\/media\/).)*$

see: https://regex101.com/r/bD0iD9/4

Inspired by "Regular expression to match a line that doesn't contain a word?"
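
To see how this pattern behaves on the two URLs from the question, here is a minimal JavaScript sketch. The variable names are my own and not part of the answer. At every position the lookahead checks that "/content/media/" does not start there before the dot consumes one character, so the whole line only matches when that path never appears.

```javascript
// Pattern from the answer: every character must NOT be the start of "/content/media/".
const noMediaPath = /^((?!\/content\/media\/).)*$/;

// The two example URLs from the question.
const wanted =
  "http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other";
const unwanted =
  "http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843";

const wantedMatches = noMediaPath.test(wanted);     // no media path, so the pattern matches
const unwantedMatches = noMediaPath.test(unwanted); // contains the media path, so it does not

console.log(wantedMatches, unwantedMatches); // prints: true false
```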

Community
Mark
0

Two things:

First, the (?!...) negative lookahead is zero-width: it doesn't consume any characters from the input. Right now the pattern is trying to match two consecutive slashes. Add [^\/]+ before the trailing slash so that there is a path segment to match between them. For example:

.*\/(?!content\/media)[^\/]+\/.*

(edit) Second, the .*s at the beginning and end match too much. Try tightening those up, or adding more detail to content\/media. As it stands, content/media can be swallowed by one of the .*s and never be checked against the lookahead.
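
Both points can be checked with a small JavaScript sketch (the variable names are hypothetical, chosen only for illustration). It runs the original pattern and the one with [^\/]+ added against the media URL from the question, which is supposed to be rejected.

```javascript
// Original pattern from the question and the variant with [^\/]+ added.
const original = /.*\/(?!content\/media)\/.*/;
const withSegment = /.*\/(?!content\/media)[^\/]+\/.*/;

// This URL should NOT match according to the question.
const mediaUrl =
  "http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843";

// The original only needs two consecutive slashes, and "http://" supplies them.
const originalMatches = original.test(mediaUrl);

// With [^\/]+ added, the greedy leading .* runs past "content/media",
// so the lookahead ends up being tested in front of "css/" instead.
const withSegmentMatches = withSegment.test(mediaUrl);

console.log(originalMatches, withSegmentMatches); // prints: true true
```

Neither pattern rejects the URL, which is why the lookahead has to be applied at every position or the check moved out of the regex.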

Suggestions:

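  1. Use your original idea and test against the extensions: ^.*\.(?!(?:css|js|font|png|jpe?g)$)[a-z0-9]+$ (with the case-insensitive flag). The $ inside the lookahead keeps an extension such as .json from being rejected just because it starts with js. Note that this only matches URLs that end in an extension, so a query string or fragment has to be stripped first.
  2. Instead of doing everything in one regular expression, use a regex that pulls out any URL (e.g., https?:\/\/\S+, perhaps?) and then test each one you find with String.indexOf: if (candidateURL.indexOf('content/media') === -1) { /* do something with the OK URL */ }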
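
The second suggestion could look roughly like this in JavaScript. This is only a sketch: the findAllowedUrls helper and the sample text are made up for illustration, and https?:\/\/\S+ is a loose pattern that simply treats everything up to the next whitespace as part of the URL.

```javascript
// Hypothetical input containing the two example URLs from the question.
const text = [
  "see http://www.myapp.com/xyzOranotherContextRoot/rest/user/get/123?some=par#/other",
  "and http://www.myapp.com/xyzOranotherContextRoot/content/media/css/main.css?7892843",
].join("\n");

// Step 1: pull out every URL-like token. Step 2: drop the ones under content/media.
function findAllowedUrls(input) {
  const candidates = input.match(/https?:\/\/\S+/g) || []; // match() returns null when nothing is found
  return candidates.filter((url) => url.indexOf("content/media") === -1);
}

const allowedUrls = findAllowedUrls(text);
console.log(allowedUrls);
```

Keeping the folder check outside the regex avoids the lookahead entirely, at the cost of a second pass over the candidates.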
cxw
  • Can you explain your answer with a concrete regular expression? – Mark Jun 25 '15 at 09:53
  • Edited answer. See https://regex101.com/r/sV3uO3/2 - I added `()` around each piece so you can see the matches on the right side under "match information." The initial `.*` swallows the `content/media` so the lookahead never has the opportunity to check it. – cxw Jun 25 '15 at 10:46