I'm having a really weird problem with preg_replace here (and as far as I can remember, this isn't the first time I've seen this). I have an XML document containing an element with an invalid structure (the closing tag is missing its slash, which breaks the parser):

<info> 
<datetime>2013.04.12 12:04:02</datetime> 
<info> 

What I'm trying to do is this: $xml = preg_replace('/<info>.*<info>/iu', '', $xml) (because I don't actually need that element), but IT DOES NOT REPLACE.
How do I make it work?

jurchiks

4 Answers

4

It doesn't replace because there are no matches:

<?php

$xml = '<info>
    <datetime>2013.04.12 12:04:02</datetime>
<info>';
var_dump(preg_match('/<info>.*<info>/iu', $xml, $matches), $matches);
int(0)
array(0) {
}

Let's see what's wrong. What does . mean exactly?

match any character except newline (by default)

So there it is! How do you change the default? We have a look at the available internal options and find this:

s for PCRE_DOTALL

... where PCRE_DOTALL means:

s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded.

We can change it locally:

'/<info>(?s:.*)<info>/iu'
          ^

... or globally:

'/<info>.*<info>/ius'
                   ^
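
As a quick check of both variants, here is a minimal sketch using the sample from the question. The variable names are only illustrative, and the input is assumed to use plain \n line breaks.

```php
<?php

// Sample from the question: the "closing" tag is a second <info>.
$xml = "<info>\n    <datetime>2013.04.12 12:04:02</datetime>\n<info>";

// Without s, the dot stops at the first newline, so nothing matches.
$default = preg_match('/<info>.*<info>/iu', $xml);

// s applied only to the .* part through an inline modifier group.
$local = preg_match('/<info>(?s:.*)<info>/iu', $xml);

// s applied to the whole pattern.
$global = preg_match('/<info>.*<info>/ius', $xml);

var_dump($default, $local, $global); // int(0), int(1), int(1)
```

The inline form is handy when other dots in the same pattern should keep their default behaviour.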
Álvaro González
3

Try adding the s modifier to the regex. With it, the dot no longer stops matching at a newline.

Ozzy
3

Add the s modifier and use ? to make it non-greedy:

$string = '<info> 
<datetime>2013.04.12 12:04:02</datetime> 
<info>
<valid>2013.04.12 12:04:02</valid>
<info> 
<datetime>2013.04.12 12:04:02</datetime> 
<info>';
var_dump(preg_replace('/<info>.*?<info>/s', '', $string));
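
To make the greedy versus non-greedy difference concrete, here is a small sketch on the same kind of input. The variable names are illustrative and the input is assumed to use plain \n line breaks. With s set, a greedy .* runs on to the last <info> in the input and takes the valid element with it, while .*? stops at the first <info> that follows.

```php
<?php

// Two broken <info>...<info> pairs with a valid element between them.
$string = "<info>\n<datetime>2013.04.12 12:04:02</datetime>\n<info>\n"
        . "<valid>2013.04.12 12:04:02</valid>\n"
        . "<info>\n<datetime>2013.04.12 12:04:02</datetime>\n<info>";

// Greedy: a single match from the first <info> to the very last one.
$greedy = preg_replace('/<info>.*<info>/s', '', $string);

// Non-greedy: each broken pair is matched and removed separately.
$lazy = preg_replace('/<info>.*?<info>/s', '', $string);

var_dump($greedy); // string(0) "" because the whole input was one match
var_dump($lazy);   // only the <valid> line (and its newlines) remains
```

With a single broken pair, as in the question, both forms remove the same text. The difference only shows once a later <info> appears in the input.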
HamZa
  • 1
    Ok, adding the ? worked. Thanks! Might I ask why did it work? – jurchiks Apr 12 '13 at 11:11
  • @jurchiks so first you complain that people do not read your question entirely... and now you are asking why it worked **even though it is already explained in the answer?** `.*?` is a way to make the match non-greedy. – Tim S. Apr 12 '13 at 11:24
  • It isn't EXPLAINED in the answer, it just says "make it non-greedy". That doesn't mean anything to me, I'm not a regex guru. Check Álvaro G. Vicario's answer for an example of a very good explanatory answer. Sure, I got the fix I was looking for, but a good explanation on why it works would be good not just for me but for everyone else reading the replies later on. – jurchiks Apr 12 '13 at 11:25
  • 2
    @jurchiks You edited your comment waaaaaaaa ! Well Álvaro G. Vicario just explained what the s modifier does, which I think isn't even needed since it's all documented in the PHP docs, and honestly there are several articles on the net about **greedy vs non-greedy**. – HamZa Apr 12 '13 at 11:32
  • 1
    `.*` would match anything. `.*a` matches everything up to and including an `a`. Because `.*` is greedy, it first grabs the whole input and then backtracks to the *last* `a`, so the match is as long as possible. Making it non-greedy (`.*?`) stops the match at the *first* `a` instead. – Tim S. Apr 12 '13 at 11:33
  • In your case, a greedy `.*` would match the closing `<info>` too and carry on to the last `<info>` in the input, so everything between the first and the last `<info>` gets removed. The non-greedy `.*?` stops at the first `<info>` that follows. – Tim S. Apr 12 '13 at 11:35
  • @HamZa - I *added* to my comment, didn't edit the existing text. A thorough explanation right in the answer is much better than sending the person around to "google it" or "read this wall of text over here". @ Tim - thanks for the explanation. – jurchiks Apr 12 '13 at 11:38
  • @jurchiks I agree with you, thanks anyways for the reps :) – HamZa Apr 12 '13 at 11:47
1

See http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

You need to use the s modifier at the end of your regex.

$xml = preg_replace('/<info>.*<info>/ius', '', $xml);
bwoebi