I am trying to match only text that does not end with a specific string.

\/.*\/.*(?!\.htm$)

This should match:

/blabla/test

But not:

/bla/blabla.htm

I am using a negative lookahead but it does not work as expected. How can I make sure strings ending with .htm will not be matched?

merlin
  • If you're manipulating URLs, your host language probably has functions built in that will do it for you. No point in rewriting the wheel with regexes. – Andy Lester May 15 '20 at 15:14
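As a rough illustration of the comment above, here is a sketch in Python (the question does not name a host language, so this choice is an assumption). The helper name `ends_with_htm` and the `example.com` URL are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def ends_with_htm(url_or_path: str) -> bool:
    """Return True if the path component ends with the .htm extension."""
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(url_or_path).path
    # suffix is the final extension of the last path segment ("" if none).
    return PurePosixPath(path).suffix == ".htm"


print(ends_with_htm("/blabla/test"))
print(ends_with_htm("/bla/blabla.htm"))
print(ends_with_htm("http://example.com/bla/blabla.htm?x=1"))
```

This only covers the suffix test; checking for the two slashes would still need a separate condition.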

2 Answers

You can find the answer here: Regex for string not ending with given suffix

For your example, it would look like this:

\/.*\/.*(?<!\.htm)$

Here is a test of this regex: https://regex101.com/r/QFtwEA/2

This also works:

^(?!.*\.htm$)\/.*\/.*$

Here is a test of this regex: https://regex101.com/r/yN4tJ6/354
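Both variants can be checked quickly in Python (an assumption, since the question does not say which regex engine is used). In this sketch the dot in `.htm` is escaped in both patterns so that it only matches a literal dot, and the slashes are left unescaped because Python does not require it.

```python
import re

# Variant 1: negative lookbehind placed directly before the end anchor.
lookbehind = re.compile(r"/.*/.*(?<!\.htm)$")
# Variant 2: negative lookahead checked once at the start of the string.
lookahead = re.compile(r"^(?!.*\.htm$)/.*/.*$")

for path in ["/blabla/test", "/bla/blabla.htm"]:
    print(path, bool(lookbehind.search(path)), bool(lookahead.search(path)))
```

Note that Python's `re` module only accepts fixed-width lookbehinds, which `\.htm` satisfies.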

maeema

The pattern you tried matches in both cases because the .* first matches all the way to the end of the string. The negative lookahead is a non-consuming assertion. It is evaluated at the end of the string, where it only checks that .htm does not appear directly to the right of that position.

That assertion always succeeds, because nothing follows the end of the string.
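A small sketch of this behaviour, using Python's `re` module as an assumed engine, shows that the original pattern consumes the whole string in both cases:

```python
import re

# The pattern from the question, with the slashes left unescaped.
original = re.compile(r"/.*/.*(?!\.htm$)")

for path in ["/blabla/test", "/bla/blabla.htm"]:
    m = original.search(path)
    # The greedy .* runs to the end of the string, so the lookahead is
    # evaluated there and has nothing to its right to reject.
    print(path, "->", m.group(0) if m else None, "| match ends at", m.end() if m else None)
```

In both cases the match ends at the last index of the input, which is why the lookahead never sees `.htm`.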

If two consecutive slashes cannot occur, but there must be exactly two slashes, you could use:

^/[^/\r\n]+/(?!.*\.htm$)[^/\r\n]+$

Explanation

  • ^ Start of string
  • /[^/\r\n]+/ Match /, then 1+ times any char except a newline or /, then match / again
  • (?!.*\.htm$) Negative lookahead, assert that the string does not end with .htm
  • [^/\r\n]+ Match 1+ times any char except a newline or /
  • $ End of string

Regex demo
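As a quick check of the pattern above, here is a minimal Python sketch (Python is an assumption; the test strings beyond the two from the question are made up for illustration):

```python
import re

# Exactly two slashes, non-empty segments, and no .htm at the end.
two_segments = re.compile(r"^/[^/\r\n]+/(?!.*\.htm$)[^/\r\n]+$")

cases = ["/blabla/test", "/bla/blabla.htm", "//test", "/a/b/c"]
for path in cases:
    print(path, bool(two_segments.match(path)))
```

Only `/blabla/test` should match here: the second case ends with `.htm`, the third has an empty first segment, and the fourth has three slashes.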

If the forward slash can occur multiple times:

^/(?:[^/\r\n]+/)*(?!.*\.htm$)[^/\r\n]+$

Regex demo
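The multi-slash variant can be sketched the same way in Python (again an assumed engine, with illustrative test strings that are not from the original post):

```python
import re

# Any number of "segment/" groups, then a final segment not ending in .htm.
any_depth = re.compile(r"^/(?:[^/\r\n]+/)*(?!.*\.htm$)[^/\r\n]+$")

cases = ["/a/b/c", "/a/b/c.htm", "/a.htm/b", "/test"]
for path in cases:
    print(path, bool(any_depth.match(path)))
```

Because the group is optional, this pattern also accepts a single-segment path such as `/test`, and it allows `.htm` inside a directory name as long as the string does not end with it.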

The fourth bird