67

I want to match all strings ending in ".htm" unless they end in "foo.htm". I'm generally decent with regular expressions, but negative lookaheads have me stumped. Why doesn't this work?

/(?!foo)\.htm$/i.test("/foo.htm");  // returns true. I want false.

What should I be using instead? I think I need a "negative lookbehind" expression (if JavaScript supported such a thing, which I know it doesn't).

tchrist
gilly3
  • 3
    Unfortunately, JavaScript does not support "lookbehind" in regular expressions – Phil Jul 27 '11 at 22:19
  • 2
    It is often better to have a simpler regular expression with a loop or two, rather than a super monstrous (ok what you want isn't super monstrous, but code has a tendency to grow) need I say unmaintainable regular expression. – davin Jul 27 '11 at 22:31
  • Well this might not be timely, but to explain why this doesn't work: a lookahead is zero-width, so it checks the text at the current position without consuming it. In JavaScript your regexp therefore translates to "match '.htm', but not if the text at that same position starts with 'foo'". Since ".htm" never starts with "foo", the lookahead always succeeds and excludes nothing. – Eric Sep 18 '15 at 20:54

7 Answers

97

The problem is pretty simple really. This will do it:

/^(?!.*foo\.htm$).*\.htm$/i.test("/foo.htm"); // returns false
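To see why anchoring the lookahead at the start of the string works, here is a small sketch that runs the pattern above against a few paths. The helper name `isHtmButNotFooHtm` and the sample paths are assumptions made for illustration; they are not part of the original answer.

```javascript
// The lookahead sits right after ^, so the `.*` inside it can scan the
// whole string for a trailing "foo.htm" before anything is consumed.
const pattern = /^(?!.*foo\.htm$).*\.htm$/i;

// Hypothetical helper name, used only for this illustration.
function isHtmButNotFooHtm(path) {
  return pattern.test(path);
}

console.log(isHtmButNotFooHtm("/foo.htm"));    // false
console.log(isHtmButNotFooHtm("/bar.htm"));    // true
console.log(isHtmButNotFooHtm("/a.htm"));      // true
console.log(isHtmButNotFooHtm("/barfoo.htm")); // false
console.log(isHtmButNotFooHtm("/foo.html"));   // false
```

The last case fails for a different reason than the first: the lookahead passes, but the main pattern requires the string to end in ".htm".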
Flimm
ridgerunner
  • 1
    +1. Not only is lookbehind not required, it would not be the best tool for this if it were available. – Alan Moore Jul 28 '11 at 01:34
  • Such a useful technique! – Josh Cole Jun 19 '17 at 17:09
  • 7
    Can you explain what is going on? I see that you have one start of line token (^) but two end of line tokens ($). How does this get the negative lookahead to work? – ericbowden Jul 04 '18 at 21:19
  • 1
    @ericbowden In case you're still wondering: it matches a start of the string that doesn't then match `.*foo\.htm` to the end of the string. Because lookaheads are not consumed, the second $ outside of it is actually the one that's matched. – IceMetalPunk Jul 12 '19 at 16:24
20

What you are describing (your intention) is a negative look-behind, and JavaScript has no support for look-behinds.

Look-aheads look forward from the position at which they are placed, and you've placed it before the `.`. So what you've got actually says "anything ending in .htm, as long as the three characters starting at that position (.ht) are not foo", which is always true.

Usually, the substitute for negative look-behinds is to match more than you need and extract only the part you actually do need. This is hacky, and depending on your precise situation you can probably come up with something else, but something like this works:

// Checks that the last 3 characters before the dot are not foo:
/(?!foo).{3}\.htm$/i.test("/foo.htm"); // returns false 
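The pattern above needs three characters before ".htm", so it rejects short names as well. The sketch below compares it with the variant gilly3 posted in the comments on this answer; the variable names and sample strings are assumptions chosen for illustration.

```javascript
// Pattern from the answer: requires exactly three characters before ".htm".
const threeChars = /(?!foo).{3}\.htm$/i;

// Variant from gilly3's comment: also accepts 0-2 characters before ".htm"
// when they start at the beginning of the string.
const withShortNames = /(^.{0,2}|(?!foo).{3})\.htm$/i;

for (const name of ["foo.htm", "/foo.htm", "bar.htm", "a.htm", "ab.htm"]) {
  console.log(name, threeChars.test(name), withShortNames.test(name));
}
// foo.htm  false false
// /foo.htm false false
// bar.htm  true  true
// a.htm    false true
// ab.htm   false true
```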
Nicole
  • 1
    You gave me enough to get myself the rest of the way there. This works for all of my test cases: `/(^.{0,2}|(?!foo).{3})\.htm$/i` – gilly3 Jul 27 '11 at 22:32
  • 5
    +1 Excellent explanation. However, `/(?!foo).{3}\.htm$/i` will fail to match a name having less than three chars, i.e. `a.htm`. Here's one that will get em all: `/^(?!.*foo\.htm$).*\.htm$/i` – ridgerunner Jul 27 '11 at 23:01
2

This answer has probably arrived a little later than necessary, but I'll leave it here in case someone runs into the same issue now (7 years, 6 months after this question was asked).

Lookbehinds are now part of the ECMAScript 2018 standard and are supported in at least the latest version of Chrome. However, you can solve the puzzle with or without them.

A solution with negative lookahead:

let testString = `html.htm app.htm foo.tm foo.htm bar.js 1to3.htm _.js _.htm`;

testString.match(/\b(?!foo)[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

A solution with negative lookbehind:

testString.match(/\b[\w-.]+(?<!foo)\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

A solution with (technically) positive lookahead:

testString.match(/\b(?=[^f])[\w-.]+\.htm\b/gi);
> (4) ["html.htm", "app.htm", "1to3.htm", "_.htm"]

etc.

These RegExps return the same matches for this test string, but they do not all express the same rule. Roughly, each one asks the JS engine for all sequences of characters that:

  • Are separated from other text (like words);
  • Consist of one or more letters of the English alphabet, underscores, hyphens, dots or digits;
  • End with ".htm";
  • Pass the pattern's extra check: the negative lookahead rejects names that start with "foo", the negative lookbehind rejects names with "foo" directly before ".htm", and the positive lookahead rejects names that start with "f".
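The three patterns agree on the test string above, but they are not interchangeable in general. A small sketch with three extra names (`foobar.htm`, `barfoo.htm` and `fig.htm`, chosen here purely for illustration) makes the difference visible. The character class is written as `[\w.-]`, an equivalent spelling that places the hyphen last, and the lookbehind line assumes an engine with ES2018 lookbehind support.

```javascript
// Extra names, assumed for illustration only.
const extra = "foobar.htm barfoo.htm fig.htm";

const lookahead  = /\b(?!foo)[\w.-]+\.htm\b/gi;   // rejects names starting with "foo"
const lookbehind = /\b[\w.-]+(?<!foo)\.htm\b/gi;  // rejects names ending in "foo.htm"
const notF       = /\b(?=[^f])[\w.-]+\.htm\b/gi;  // rejects names starting with "f"

console.log(extra.match(lookahead));  // ["barfoo.htm", "fig.htm"]
console.log(extra.match(lookbehind)); // ["foobar.htm", "fig.htm"]
console.log(extra.match(notF));       // ["barfoo.htm"]
```

Only the lookbehind version implements the rule from the question, "ends in .htm but not in foo.htm".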
Igor Bykov
2

As mentioned, JavaScript does not support negative look-behind assertions.

But you could use a workaround:

/(foo)?\.htm$/i.test("/foo.htm") && RegExp.$1 != "foo";

This will match everything that ends with .htm but it will store "foo" into RegExp.$1 if it matches foo.htm, so you can handle it separately.
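The same idea can be sketched without `RegExp.$1`: `exec` returns the capture groups of the match directly, so the check does not depend on the static property. The function name `endsWithHtmExceptFoo` is hypothetical. Comparing the group with `undefined` instead of `"foo"` also keeps the check consistent with the `i` flag, since the group could hold "FOO".

```javascript
// Hypothetical helper: true for names ending in ".htm" but not in "foo.htm".
function endsWithHtmExceptFoo(path) {
  const match = /(foo)?\.htm$/i.exec(path);
  // match is null when the string does not end in ".htm";
  // match[1] is undefined when the optional "foo" group did not participate.
  return match !== null && match[1] === undefined;
}

console.log(endsWithHtmExceptFoo("/foo.htm")); // false
console.log(endsWithHtmExceptFoo("/FOO.HTM")); // false
console.log(endsWithHtmExceptFoo("/bar.htm")); // true
console.log(endsWithHtmExceptFoo("/a.html"));  // false
```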

Floern
  • MDN reports that [the `RegExp.$1` feature is non-standard](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/n). – alxndr Feb 23 '17 at 21:00
2

Like Renesis mentioned, "lookbehind" is not supported in JavaScript, so maybe just use two regexps in combination:

!/foo\.htm$/i.test(teststring) && /\.htm$/i.test(teststring)
Sled
petho
1

String.prototype.endsWith (ES6)

console.log( /* !(not)endsWith */

    !"foo.html".endsWith("foo.htm"), // true
  !"barfoo.htm".endsWith("foo.htm"), // false (here you go)
     !"foo.htm".endsWith("foo.htm"), // false (here you go)
   !"test.html".endsWith("foo.htm"), // true
    !"test.htm".endsWith("foo.htm")  // true

);
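The calls above only test the "not foo.htm" half of the question, which is why "foo.html" and "test.html" come out as true. A minimal sketch that combines both conditions might look like this; the function name `isHtmButNotFooHtm` is hypothetical, and lowercasing the input is an assumption made here to mirror the `i` flag used in the question.

```javascript
// Hypothetical helper: ends in ".htm" but not in "foo.htm", ignoring case.
function isHtmButNotFooHtm(name) {
  const lower = name.toLowerCase(); // mirrors the /i flag from the question
  return lower.endsWith(".htm") && !lower.endsWith("foo.htm");
}

console.log(isHtmButNotFooHtm("foo.html"));   // false (does not end in ".htm")
console.log(isHtmButNotFooHtm("barfoo.htm")); // false
console.log(isHtmButNotFooHtm("foo.htm"));    // false
console.log(isHtmButNotFooHtm("test.htm"));   // true
console.log(isHtmButNotFooHtm("TEST.HTM"));   // true
```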
Roko C. Buljan
0

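You could emulate the negative lookbehind with something like /^(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/ (the leading ^ matters: without it, the one- and two-character alternatives can match in the middle of "foo.htm"), but a programmatic approach would be better.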
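A quick sketch of how this alternation behaves, with and without a leading `^`. The anchored form and the sample names are assumptions added here for illustration. Without the anchor, the short alternatives `.` and `..` can start matching in the middle of the string, so "foo.htm" slips through.

```javascript
// Same alternation twice; only the leading ^ differs.
const unanchored = /(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/;
const anchored   = /^(.|..|.*[^f]..|.*f[^o].|.*fo[^o])\.htm$/;

console.log(unanchored.test("foo.htm")); // true  ("oo" matches the `..` alternative)
console.log(anchored.test("foo.htm"));   // false

const samples = ["/foo.htm", "a.htm", "ab.htm", "fob.htm", "fao.htm", "xoo.htm", "bar.html"];
for (const name of samples) {
  console.log(name, anchored.test(name));
}
// /foo.htm false
// a.htm    true
// ab.htm   true
// fob.htm  true
// fao.htm  true
// xoo.htm  true
// bar.html false
```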

ngn