Why does the following code:

<?php echo preg_replace("/(.*)/", "$1.def", "abc");

Output abc.def.def instead of abc.def?

I'm interested in understanding why the repetition occurs.

Using /(.+)/ or /^(.*)$/ works as expected, but I'm not looking for a solution, just asking a question (although these patterns may be related to the answer).

Tinker with a live version here.

Sam Starling
matb33

3 Answers

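Because .* can also match the empty substring at the end of the string. The replacement is global, so after the first match consumes abc, the search resumes at the end of the string, where .* still matches zero characters. That gives two matches in the string abc:

  1. The whole string abc → abc.def
  2. The empty string → .def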

which gives abc.def.def.
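
The two matches can be made visible with a minimal sketch like the one below. It assumes `preg_match_all` with `PREG_OFFSET_CAPTURE` as a way to list what the pattern finds; the variable names are only illustrative and not part of the original question.

```php
<?php
// Collect every match of the greedy pattern together with its byte offset.
$subject = 'abc';
preg_match_all('/(.*)/', $subject, $m, PREG_OFFSET_CAPTURE);
$greedyMatches = $m[0]; // each entry is [matched text, offset]

// Mimic the global replace by appending ".def" to each match.
// (This simple rebuild works here only because the matches cover the whole subject.)
$rebuilt = '';
foreach ($greedyMatches as list($text, $offset)) {
    printf("offset %d: \"%s\"\n", $offset, $text);
    $rebuilt .= $text . '.def';
}
echo $rebuilt, "\n"; // abc.def.def
```

The first entry is abc at offset 0 and the second is the empty string at offset 3, which is where the extra .def comes from.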


Edit: The details of why this happens are explained in "String.replaceAll() anomaly with greedy quantifiers in regex".

Community
kennytm
  • Why does it not match the entire string the first time? If I run `"a".replace(/.*/, "b")` in JavaScript, I get the expected `b`. If I do `preg_replace("/.*/", "b", "a")` in PHP, I get `bb`. – Eric May 30 '12 at 16:09
  • @Eric: `"a".replace(/.*/g, "b")`. – kennytm May 30 '12 at 16:10
  • Strange expected behaviour. I would have thought that the `.*` match would consume as much as possible, including any empty string at the end. Is there not an empty string after that empty string? :) – El Ronnoco May 30 '12 at 16:11
  • Whoops! I'd forgotten about that. – Eric May 30 '12 at 16:11
  • What is an 'Empty Substring'? – Mike B May 30 '12 at 16:12
  • Is it specific to PCRE? The Ruby version `"a".sub(/.*/, 'b')`, which uses another regex engine, yields 'b'. – SirDarius May 30 '12 at 16:16
  • @SirDarius: `"a".gsub(/.*/, 'b')` – kennytm May 30 '12 at 16:17
  • Ah, my bad. So this behaviour comes from at least two regex engines. – SirDarius May 30 '12 at 16:19
  • Can someone confirm that there is only a single instance of an empty string at the end of the subject string? Because, as many people have pointed out, why isn't there an empty string at the beginning? My understanding so far is that it's there on purpose as a sort of hook to match against. – matb33 May 30 '12 at 18:34
  • @matb33: The empty string does not occur at the beginning because the match is greedy. If it were non-greedy, it would match the empty string at the beginning, as pointed out in [@dda's answer](http://stackoverflow.com/a/10820150/224671). – kennytm May 30 '12 at 19:09

It's the expected behaviour: https://bugs.php.net/bug.php?id=53855

This is expected behaviour and nothing peculiar to PHP. The * quantifier allows an "empty" match to occur at the end of your subject string.
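
As a rough illustration of how that empty match can be ruled out, the sketch below compares the original pattern with the two patterns mentioned in the question, plus a call that uses `preg_replace`'s optional fourth `$limit` argument. The variable names are illustrative only.

```php
<?php
$subject = 'abc';

// Global replace with a pattern that can match the empty string at the end.
$greedy = preg_replace('/(.*)/', '$1.def', $subject);     // abc.def.def

// Same pattern, but stop after the first replacement ($limit = 1).
$limited = preg_replace('/(.*)/', '$1.def', $subject, 1); // abc.def

// .+ needs at least one character, so it cannot match at the end of the string.
$plus = preg_replace('/(.+)/', '$1.def', $subject);       // abc.def

// ^ can only match at offset 0, so a second match at the end is impossible.
$anchored = preg_replace('/^(.*)$/', '$1.def', $subject); // abc.def

echo $greedy, "\n", $limited, "\n", $plus, "\n", $anchored, "\n";
```

Limiting the replacement to one match is the PHP counterpart of the non-global JavaScript and Ruby calls in the comments above.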

dAm2K

If you make your regex non-greedy, /(.*?)/, you can see the repetition at work on a much more noticeable scale:

.defa.defb.defc.def

You get four matches: a, b, c, empty. Whereas, as other people mentioned, with a greedy regex you get two matches: the full string and an empty string.
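
To inspect which matches the engine actually reports for the non-greedy pattern, a small sketch along these lines can list each one with its offset. It assumes `preg_match_all` with `PREG_OFFSET_CAPTURE`; the list is printed rather than stated here, because regex engines differ in how they handle empty matches and the result may not line up exactly with the description above.

```php
<?php
// List every match of the non-greedy pattern together with its byte offset.
$subject = 'abc';
preg_match_all('/(.*?)/', $subject, $m, PREG_OFFSET_CAPTURE);
$lazyMatches = $m[0]; // each entry is [matched text, offset]

foreach ($lazyMatches as list($text, $offset)) {
    printf("offset %d: \"%s\"\n", $offset, $text);
}
```

Whatever the full list looks like, the first entry is an empty match at offset 0, which is why the output starts with .def.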

dda
  • But why doesn't the greedy version grab the whole thing as a match - including the empty string? – trapper May 30 '12 at 16:20
  • Because when it starts matching something ('abc' in this case) it can't "add" an empty string to a non-empty string. Whereas once it's done with the non-empty string, it **can** match an empty string. – dda May 30 '12 at 16:45
  • Sorry to hear that, buddy... But then again, regexes can be mystifying. – dda May 30 '12 at 16:51
  • Why isn't there another empty string after the empty string? – trapper May 30 '12 at 17:05
  • Because it's like the Highlander, There Can Be Only One. – dda May 30 '12 at 17:06
  • @trapper I think it's the defined behavior of regex, i.e. to provide a single empty string at the end of the subject string (as noted in the quote by @dAm2K). I'm guessing it's there to be used as a hook for some fancy regex'ing – matb33 May 30 '12 at 18:31