I want a regex to use in preg_replace() that replaces text only inside a tag (between "<" and ">"), without affecting the text outside the angle brackets. For example:

$html = '<div class="REPLACE_ME" id="my_id">this REPLACE_ME cannot be replaced</div>';
$html = preg_replace('/\bREPLACE_ME\b/', 'REPLACED', $html);

The expected result in the $html variable is:

<div class="REPLACED" id="my_id">this REPLACE_ME cannot be replaced</div>

The regex cannot be anchored on the quotes, because I have other variants like:

<REPLACE_ME>this REPLACE_ME cannot be replaced</REPLACE_ME>
<div REPLACE_ME="my_attribute">this REPLACE_ME cannot be replaced</div>
rcsalvador
  • Already answered here: http://stackoverflow.com/a/1732454/597122 – rockerest Aug 05 '14 at 17:59
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – rockerest Aug 05 '14 at 18:00
  • Thanks for the tip, @rockerest, but that reference doesn't help me, because I don't want to match only tags. – rcsalvador Aug 05 '14 at 18:25
  • Hey Rodrigo, it was a tongue-in-cheek posting, but the underlying point is this: it doesn't SPECIFICALLY answer your question, but the overall answer is: do NOT parse HTML with REGEX. You simply cannot do it without creating black holes. It's a bad idea. – rockerest Aug 05 '14 at 18:30

2 Answers

Regex:

<[^>]*\KREPLACE_ME(?=[^>]*?>)

Replacement string:

REPLACED

The PHP code would be:

<?php
$mystring = <<<'EOT'
<div class="REPLACE_ME" id="my_id">this REPLACE_ME cannot be replaced</div>
<REPLACE_ME>this REPLACE_ME cannot be replaced</REPLACE_ME>
<div REPLACE_ME="my_attribute">this REPLACE_ME cannot be replaced</div>
EOT;
echo preg_replace('~<[^>]*\KREPLACE_ME(?=[^>]*?>)~', 'REPLACED', $mystring);
?>

Output:

<div class="REPLACED" id="my_id">this REPLACE_ME cannot be replaced</div>
<REPLACED>this REPLACE_ME cannot be replaced</REPLACED>
<div REPLACED="my_attribute">this REPLACE_ME cannot be replaced</div>

Explanation:

  • < Matches a literal < (less-than sign).
  • [^>]* Matches any character other than >, zero or more times.
  • \K Discards everything matched so far from the reported match, so the text from < up to REPLACE_ME is left untouched by the replacement.
  • REPLACE_ME Matches the string REPLACE_ME.
  • (?=[^>]*?>) Lookahead asserting that what follows is any run of characters other than >, followed by >. This ensures that the matched REPLACE_ME is inside a <...> block.
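
One caveat, as far as I can tell: because [^>]* is greedy and every match has to begin at a fresh <, the pattern above replaces only the last REPLACE_ME in any one tag, so <div REPLACE_ME="REPLACE_ME"> would come out as <div REPLACE_ME="REPLACED">. A minimal sketch of a two-step alternative is to match each <...> span with preg_replace_callback() and then replace inside that span only. The helper name replaceInsideTags is hypothetical, and the sketch assumes that no attribute value contains a literal > character.

```php
<?php
// Hypothetical helper (not from the original answer): replaces every
// whole-word occurrence of $needle inside each <...> span of $html.
function replaceInsideTags(string $html, string $needle, string $replacement): string
{
    // Inner pattern: the needle as a whole word, with regex metacharacters escaped.
    $word = '/\b' . preg_quote($needle, '/') . '\b/';

    // Outer pattern: one <...> span at a time (assumes no ">" inside the tag).
    return preg_replace_callback(
        '/<[^>]*>/',
        function (array $m) use ($word, $replacement): string {
            // $m[0] is the full tag text; text between tags never reaches here.
            return preg_replace($word, $replacement, $m[0]);
        },
        $html
    );
}

$html = '<div REPLACE_ME="REPLACE_ME">this REPLACE_ME stays</div>';
echo replaceInsideTags($html, 'REPLACE_ME', 'REPLACED'), "\n";
// <div REPLACED="REPLACED">this REPLACE_ME stays</div>
```

Note that the inner preg_replace() still interprets $1-style references in the replacement string, so a replacement containing $ or \ would need escaping.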
Avinash Raj

Perhaps something like this

I think what you're looking for is a positive lookahead or lookbehind.

So the regex I used is:

(?<=<).*?(REPLACE_ME).*?(?=>)

The (?<=<) asserts that there is a < immediately to the left of the match, and the (?=>) asserts that there is a > immediately to the right.
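
A quick way to see what this pattern actually matches is to run it against the sample from the question. The sketch below is only a check under stated assumptions: the variable names and the second input, <b>REPLACE_ME</b>, are illustrative and not part of the original post.

```php
<?php
$pattern = '~(?<=<).*?(REPLACE_ME).*?(?=>)~';

// Sample string from the question.
$html = '<div class="REPLACE_ME" id="my_id">this REPLACE_ME cannot be replaced</div>';
preg_match($pattern, $html, $m);
echo $m[0], "\n"; // div class="REPLACE_ME" id="my_id"  (the whole tag body)
echo $m[1], "\n"; // REPLACE_ME  (capture group 1 only)

// Passing the pattern straight to preg_replace() swaps out the whole match,
// not just group 1.
$replaced = preg_replace($pattern, 'REPLACED', $html);
echo $replaced, "\n"; // <REPLACED>this REPLACE_ME cannot be replaced</div>

// Illustrative input (an assumption): .*? is free to cross a ">",
// so text between tags can be matched as well.
$outside = preg_replace($pattern, 'REPLACED', '<b>REPLACE_ME</b>');
echo $outside, "\n"; // <REPLACED>
```

If that reading is right, the pattern highlights the correct word as group 1, but a plain preg_replace() call would also need the surrounding tag text written back, and [^>]* in place of .*? to stay within a single tag.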

Also, consider using an XML parser. Regex is very limited when it comes to tags like this.

skamazin
  • [Take a look here](http://regex101.com/r/tJ1yD2/3) I'm fairly certain it matches only the `REPLACE_ME`s that are within a tag. Please give me an example where the regex fails. – skamazin Aug 06 '14 at 11:39