Making preg_replace not match inside HTML tags

Question

I'm struggling with this. I use this simple code to search for words in a text and replace them with the relevant links:

$search  = array("/\bword1\b/", "/\bword2\b/", "/\bword3\b/");
$replace = array("<a href='link1'>word1</a>", /* etc. */);
$myText  = preg_replace($search, $replace, $myText);

The problem comes when one of the search patterns matches inside an HTML tag in $myText. Example:

$myText = 'blablablabla <strong class="word1">sad</strong>';

As you can see, word1 is also used as a CSS class in the markup. If I run preg_replace, it replaces that class name too and destroys the markup.
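A small reproduction of the problem, for reference. It assumes the placeholder URL `link1` from the question: the quote character before `word1` in `class="word1"` counts as a word boundary, so `\bword1\b` matches the attribute value as well.

```php
<?php
// Reproduces the problem: '"' is a non-word character, so \b matches
// right before and after word1 inside class="word1".
// 'link1' is the placeholder URL from the question.
$myText = 'blablablabla <strong class="word1">sad</strong>';
$broken = preg_replace('/\bword1\b/', '<a href="link1">word1</a>', $myText);

echo $broken, "\n";
// blablablabla <strong class="<a href="link1">word1</a>">sad</strong>
```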

How can I change my $search patterns so they don't match inside HTML tags, with something like [^<.?*>]?

Thanks

Here's a very popular and relevant answer to your question http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Damp, Feb 18 '11 at 19:16
@Damp: It's as popular as it is overgeneralized and wrong due to being cited in the most irrelevant contexts. — mario, Feb 18 '11 at 19:21

score 1 · Accepted Answer · answered Feb 18 '11 at 19:35


The simple-minded workaround is:

$txt = preg_replace('# [>][^<]*? \K \b(word1)\b #x', '<a href="link1">$1</a>', $txt);

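This ensures that there is a closing > angle bracket somewhere before the word, with no < in between, so the word sits in text content rather than inside a tag. (\K makes the engine forget the part matched so far, so only the word itself gets replaced.) However, it will only ever replace the very first occurrence of word1 per enclosing tag, paragraph, etc.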
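A minimal runnable sketch of that workaround, assuming the placeholder URL `link1` and that `$1` is the captured word. It also shows the first-occurrence limitation: the second word1 in the same paragraph is left alone, and the class attribute is untouched.

```php
<?php
// The \K workaround: a '>' must appear earlier in the same text run,
// and \K drops that prefix so only the word itself is replaced.
// 'link1' is a placeholder URL; $1 is the captured word.
$txt = '<p>word1 and word1 again</p> <strong class="word1">sad</strong>';
$out = preg_replace('# [>][^<]*? \K \b(word1)\b #x', '<a href="link1">$1</a>', $txt);

echo $out, "\n";
// <p><a href="link1">word1</a> and word1 again</p> <strong class="word1">sad</strong>
```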

So a much better solution would be to use preg_replace_callback() with a pattern like "/>([^<]+)/", which captures each run of text between tags, and to apply your existing word1|word2|word3 regexes (your existing code) to that text inside the callback function instead.
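One possible sketch of the callback approach. `linkWords()` is a hypothetical helper name, `link1`..`link3` are placeholder URLs, and the outer pattern is extended to `(^|>)` (as suggested in the comments) so that text at the very start of the string is also processed. It assumes no literal `>` appears inside attribute values.

```php
<?php
// The outer pattern captures each run of text after '>' (or at the very
// start of the string); the original word patterns run only on that
// text, never inside a tag. Assumes no literal '>' in attribute values.
function linkWords(string $html, array $search, array $replace): string
{
    return preg_replace_callback(
        '/(^|>)([^<]+)/',
        function (array $m) use ($search, $replace): string {
            // $m[1] is '>' (or '' at the start), $m[2] is the text run.
            return $m[1] . preg_replace($search, $replace, $m[2]);
        },
        $html
    );
}

$search  = array('/\bword1\b/', '/\bword2\b/', '/\bword3\b/');
$replace = array(
    '<a href="link1">word1</a>',
    '<a href="link2">word2</a>',
    '<a href="link3">word3</a>',
);

echo linkWords('word1 <strong class="word1">word2 and word2</strong> word3', $search, $replace), "\n";
```

Unlike the \K workaround, every occurrence inside a text run is replaced, because the inner preg_replace() runs without a limit.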


mario


Why are there spaces between *? \K \b? – dynamic Feb 18 '11 at 19:39

@yes123: The `#x` flag at the end makes the regex engine ignore those spaces, so they are only there for readability. It fails in your example because the text was not preceded by any tag. Change the start of the regex to `# (^|>)[^<]*`... – mario Feb 18 '11 at 19:44
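A sketch comparing the original pattern with the `(^|>)` variant suggested above, assuming the placeholder URL `link1`. Note that `(^|>)` becomes capture group 1, so the word is referenced as `$2` in the fixed version.

```php
<?php
// Compare the original \K pattern with the (^|>) variant.
// With #x, the literal spaces in the patterns are ignored.
$txt = 'blablablabla word1 <strong class="word1">sad</strong>';

// Original: the leading word1 has no '>' before it, so nothing changes.
$old = preg_replace('# [>][^<]*? \K \b(word1)\b #x', '<a href="link1">$1</a>', $txt);

// Fixed: (^|>) is group 1 now, so the captured word is $2.
$new = preg_replace('# (^|>)[^<]*? \K \b(word1)\b #x', '<a href="link1">$2</a>', $txt);

echo $old, "\n", $new, "\n";
```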
Btw, codepad fails for me too: http://codepad.org/zd9ZFcRU - try on a normal PHP setup. – mario Feb 18 '11 at 19:48
@mario: Right again, lol. This is excellent! It saves me from using a DOM parser! Thanks! I would give you more reputation if I could – dynamic Feb 18 '11 at 20:15
@yes123: But take care: As I said this simple example will find **only the very first occurrence** of each word per enclosing tag. Obviously this code is more readable and maintainable than a DOM solution, but you could look into phpQuery or QueryPath sometime - should there be a need (it's DOM manipulation with a super-simple jQuery-like API, can recommend that). – mario Feb 18 '11 at 20:19
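For comparison, a DOM-based sketch. This swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than phpQuery or QueryPath, whose APIs are not shown here. `linkWordsDom()` is a hypothetical helper, the word-to-URL map uses placeholder URLs, the meta tag assumes UTF-8 input, and `fn` requires PHP 7.4+. Only text nodes outside existing links are rewritten, so attributes can never match.

```php
<?php
// Rewrite only text nodes, so attributes such as class="word1" are
// never touched. linkWordsDom() and the URLs are illustrative.
function linkWordsDom(string $html, array $links): string
{
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence HTML parser warnings
    // Wrap the fragment in a known <div>; the meta tag assumes UTF-8 input.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $wrap = $doc->getElementsByTagName('div')->item(0);

    // One alternation of all words, e.g. /\b(word1|word2)\b/
    $words   = array_map(fn($w) => preg_quote($w, '/'), array_keys($links));
    $pattern = '/\b(' . implode('|', $words) . ')\b/';

    // Collect text nodes first (skipping existing links), because the
    // DOM is modified while they are processed.
    $xpath = new DOMXPath($doc);
    $nodes = array();
    foreach ($xpath->query('.//text()[not(ancestor::a)]', $wrap) as $node) {
        $nodes[] = $node;
    }

    foreach ($nodes as $node) {
        // With one capture group, odd indices hold the matched words.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // no match in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $links[$part]);
                $a->appendChild($doc->createTextNode($part));
                $frag->appendChild($a);
            } elseif ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the wrapper's children.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkWordsDom(
    'word1 <strong class="word1">word2 and word1</strong>',
    array('word1' => 'link1', 'word2' => 'link2')
), "\n";
```

This is longer than the regex version, but it does not depend on how tags and attributes happen to be written.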

See also the examples on http://stackoverflow.com/questions/3650125/how-to-parse-html-with-php (more realistic than the funny link "you can't parse html ...") – mario Feb 18 '11 at 20:27
@mario: I only need one link per word :D In fact, I use preg_replace with a limit of 1 :D, so if it's the very first occurrence, that's fine. And thanks for that link, it's awesome (phpQuery :D) – dynamic Feb 18 '11 at 20:28
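A sketch of combining the `$limit` argument with the `(^|>)` ... `\K` patterns from the comments. When `preg_replace()` receives an array of patterns, the limit applies to each pattern separately, so a limit of 1 links only the first match of every word. `link1`/`link2` are placeholder URLs.

```php
<?php
// $limit (4th argument) applies per pattern, so each word is linked once.
// Patterns use the (^|>) ... \K form; 'link1'/'link2' are placeholders.
$search = array(
    '# (^|>)[^<]*? \K \bword1\b #x',
    '# (^|>)[^<]*? \K \bword2\b #x',
);
$replace = array('<a href="link1">word1</a>', '<a href="link2">word2</a>');

$text = '<strong class="word1">word2 word1</strong> word1 word2';
$out  = preg_replace($search, $replace, $text, 1);

echo $out, "\n";
```

The patterns run in order on the result of the previous one, so a later pattern could in principle match text inserted by an earlier replacement; that cannot happen with these particular link texts.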
