I'm confused by a difference I found between the way JavaScript and PHP handle the following regex.

In JavaScript,

'foobar'.replace(/(?=(bar))/     , '$1');
'foobar'.replace(/(?=(bar))?/    , '$1');
'foobar'.replace(/(?:(?=(bar)))?/, '$1');

results in, respectively,

foobarbar
foobar
foobar

as shown in this jsFiddle.

However, in PHP,

echo preg_replace('/(?=(bar))/', '$1', "foobar<br/>");
echo preg_replace('/(?=(bar))?/', '$1', "foobar<br/>");
echo preg_replace('/(?:(?=(bar)))?/', '$1', "foobar<br/>");

results in,

foobarbar

Warning: preg_replace() [function.preg-replace]: Compilation failed: nothing to repeat at offset 9 in /homepages/26/d94605010/htdocs/lz/writecodeonline.com/php/index.php(201) : eval()'d code on line 2
foobarbar

I'm not so much worried about the warning. But it appears that in JavaScript, lookahead assertions are somehow "lazier" than in PHP. Why the difference? Is this a bug in one of the engines? Which is theoretically more "correct"?

Andrew Cheong

1 Answer

The real difference is actually very simple:

In JavaScript, replace with a regex replaces only the first match unless the regex has the g (global) flag.
In PHP, preg_replace replaces all matches by default.

The third pattern, (?:(?=(bar)))?, can match the empty string at every position, and captures "bar" in some of them. Without the g flag, JavaScript matches it only once, at the beginning of the string; the lookahead fails there, so $1 is empty and the output is still foobar.

You would have easily seen the difference had you used a more visible replacement string, like [$1].
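
As a minimal sketch of that suggestion, the snippet below uses [$1] as the replacement. The subject 'barbar' and the variable names are illustrative choices, not taken from the question; they are there only to make the first-match versus all-matches behaviour visible.

```javascript
// Pattern 1 on a subject that contains "bar" twice (illustrative subject).
const firstOnly = 'barbar'.replace(/(?=(bar))/, '[$1]');   // no g flag: first match only
const allMatches = 'barbar'.replace(/(?=(bar))/g, '[$1]'); // g flag: every match

// Pattern 3 without g: a single empty match at index 0, where the lookahead
// fails and group 1 is unset, so "[$1]" expands to "[]".
const optionalOnce = 'foobar'.replace(/(?:(?=(bar)))?/, '[$1]');

// Pattern 3 with g: one replacement per position (7 for a 6-character subject).
const optionalAll = 'foobar'.replace(/(?:(?=(bar)))?/g, '[$1]');

console.log(firstOnly);    // [bar]barbar
console.log(allMatches);   // [bar]bar[bar]bar
console.log(optionalOnce); // []foobar
console.log(optionalAll);  // output left unasserted, see the note below
```

The last output is deliberately not spelled out: it contains seven bracket pairs, but whether the pair just before "bar" is filled may depend on how the engine treats an optional group whose body matches the empty string.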

PHP Example: http://ideone.com/8Mjg6
JavaScript Example, no /g: http://jsfiddle.net/qKb4b/3/
JavaScript Example, with /g: http://jsfiddle.net/qKb4b/2/

I would also note that "laziness" is a different concept in regular expressions, not related to this question.
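
For comparison, here is a short sketch of what "lazy" usually means for a regex: a quantifier followed by ? that takes as few characters as possible. The subject 'aaa' and the variable names are made up for illustration.

```javascript
const greedy = 'aaa'.match(/a+/)[0];  // greedy: takes as many "a"s as possible
const lazy = 'aaa'.match(/a+?/)[0];   // lazy: takes as few "a"s as possible

console.log(greedy); // aaa
console.log(lazy);   // a
```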

Kobi