
I'm using a regex with a negative lookahead in JavaScript to match (and then replace) the last occurrence of a substring within a string.

Here's a snippet of my code:

var str = 'abc abc abc';
var regex1 = /abc(?!.*?abc)/;
var regex2 = /abc(?!.*abc)/;

var ematch1 = regex1.exec(str);
var ematch2 = regex2.exec(str);

console.log(ematch1, ematch1.index);
console.log(ematch2, ematch2.index);

Both regexes, `regex1` and `regex2`, produce identical results. Which is preferred, and why? Or is a totally different approach better?

danday74
  • 1
    Depends on what you're going to do. Maybe you can use something like [`^(.*)abc`](https://regex101.com/r/KXeF5U/2) with a consuming `.*` before, which is usually most efficient in such cases. – bobble bubble Mar 19 '18 at 21:32
  • 1
    If you don't know where the "abc" occurrences are in your string, the two assertions are equivalent in speed. But if you know that "most of the time" one "abc" is near another "abc", choose the non-greedy version; otherwise choose the greedy version. Note that if you have a long string and your pattern starts with a literal character, you can instead find all the matches (with the `match` method and the g modifier, without the lookahead) and take the last. – Casimir et Hippolyte Mar 19 '18 at 21:34
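
The two suggestions in the comments above can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names `replaceLastWithPrefix` and `lastMatchIndex` are made up for this example, the input is a single line (since `.` does not match newlines by default), and `matchAll` is used in place of `match` so that the index of each match is available.

```javascript
// Hypothetical helpers, not part of the original question.
// Assumes single-line input: `.` does not match newline characters by default.
const str = 'abc abc abc';

// Suggestion 1: a consuming greedy prefix. `.*` runs to the end of the line,
// then backtracks until "abc" can match, which lands on the last occurrence.
function replaceLastWithPrefix(input, replacement) {
  // A function replacement avoids special handling of `$` in the replacement text.
  return input.replace(/^(.*)abc/, (_, prefix) => prefix + replacement);
}

// Suggestion 2: collect every match with the g flag and keep the last one.
function lastMatchIndex(input) {
  const matches = [...input.matchAll(/abc/g)];
  return matches.length ? matches[matches.length - 1].index : -1;
}

console.log(replaceLastWithPrefix(str, 'XYZ')); // "abc abc XYZ"
console.log(lastMatchIndex(str)); // 8
```

Which of the two is faster probably depends on the input, so it is worth measuring on representative data before picking one.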

1 Answer


You got two valid comments from two knowledgeable and active people, but I'm going to add a few points. A greedy token `.*` results in backtracking steps most of the time: it swallows everything up to the end of the input (or, more precisely, up to the first newline character) and then steps backward. A truer definition of `.*` is not "zero or more" but "everything, something, or nothing".

So if `abc` occurs near the end of the input string, `.*` satisfies the engine sooner than the non-greedy quantifier `.*?`. Otherwise, the engine backtracks until it gets a chance to match `abc`, or all the way back to nothing in the worst case.

Having said that, the number of backtracking steps is, in the worst case, about the length of the input string. Conversely, if `abc` is known to occur near the beginning of the input string, particularly on large data, `.*?` finds a match earlier than `.*`.
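
As a small sanity check, here is a sketch showing that the greedy and lazy lookaheads select the same match and differ only in how much work the engine does to get there. The sample strings and the helper name `lastIndexWith` are assumptions made for this example; JavaScript offers no built-in way to count backtracking steps, so only the resulting indices are compared.

```javascript
const greedy = /abc(?!.*abc)/;
const lazy = /abc(?!.*?abc)/;

// Illustrative inputs (not from the original question).
const samples = [
  'abc abc abc',                  // occurrences close together
  'abc' + ' '.repeat(50) + 'abc', // occurrences far apart
  'no match here',                // no occurrence at all
];

// Hypothetical helper: index of the match, or -1 when there is none.
function lastIndexWith(regex, input) {
  const m = regex.exec(input);
  return m ? m.index : -1;
}

const results = samples.map(s => [lastIndexWith(greedy, s), lastIndexWith(lazy, s)]);
console.log(results); // [[8, 8], [53, 53], [-1, -1]]
```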

Additionally, `.*?` doesn't backtrack over the same path, because it moves forward through the input instead of consuming everything first.

Sometimes you may find built-in language methods, such as `lastIndexOf()` in JS, faster and more helpful than sticking with regular expressions.
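
A minimal sketch of the `lastIndexOf()` route, assuming the search term is a plain literal string. The function name `replaceLast` is made up for this example.

```javascript
// Hypothetical helper: replace only the last occurrence of a literal substring.
function replaceLast(input, search, replacement) {
  const index = input.lastIndexOf(search);
  if (index === -1) return input; // nothing to replace
  return input.slice(0, index) + replacement + input.slice(index + search.length);
}

console.log(replaceLast('abc abc abc', 'abc', 'XYZ')); // "abc abc XYZ"
```

Since no regex is involved, there is no backtracking to reason about and no need to escape special characters in the search string.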

revo