php - preg_match string not within the href attribute

Question

I find regex kinda confusing, so I got stuck on this problem:

I need to insert <b> tags around certain keywords in a given text. The problem is that if the keyword appears within an href attribute, the result is a broken link.

The code goes like this:

$text = preg_replace('/(\b'.$keyword.'\b)/i','<b>\1</b>',$text);

So for cases like

this <a href="keyword.php">keyword</a> here

I end up with:

this <a href="<b>keyword</b>.php"><b>keyword</b></a> here

I tried all sorts of combinations, but I still couldn't get the right pattern.

Thanks!

score 4 · Answer 1 · answered Sep 17 '10 at 08:40

You can't do that with regex alone. Regular expressions are powerful, but they can't parse a recursive grammar like HTML.

Instead, you should properly parse the HTML using an existing HTML parser. You just echo the HTML as-is until you encounter a text node. In that case, you run your preg_replace on the text before echoing it.

If your HTML is valid XHTML, you can use the xml_parse function. If it's not, then use whatever HTML parser is available.
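
As a rough sketch of the parser-based idea, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath) is the "whatever HTML parser is available": the function name boldKeywordInText and the wrapper div with id "wrap" are made up for illustration, and encoding handling is left out, so this is only meant for plain ASCII input.

```php
<?php
// Hypothetical helper: wrap $keyword in <b> inside text nodes only,
// so attribute values such as href are never touched.
function boldKeywordInText(string $html, string $keyword): string
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    // Assumption: input is plain ASCII; other encodings need a charset hint.
    // The wrapper div lets us pull the fragment back out afterwards.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath   = new DOMXPath($doc);
    $pattern = '/(\b' . preg_quote($keyword, '/') . '\b)/i';

    // Copy the node list to an array because the loop edits the tree.
    $textNodes = iterator_to_array($xpath->query('//div[@id="wrap"]//text()'));
    foreach ($textNodes as $node) {
        // With DELIM_CAPTURE, odd indexes hold the matched keywords.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not present in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                $b = $doc->createElement('b');
                $b->appendChild($doc->createTextNode($part));
                $frag->appendChild($b);
            } elseif ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->insertBefore($frag, $node);
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo boldKeywordInText('this <a href="keyword.php">keyword</a> here', 'keyword'), "\n";
```

The replacement happens on DOM text nodes, so the href value is never seen by the regex at all. Note that the parser may normalize the markup it re-serializes.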

BatchyX


It is possible with regular expressions (even without using recursive patterns). But it would be a hell of a regular expression with absolutely horrible efficiency. – Gumbo Sep 17 '10 at 09:15
Well, prove it. Make a regex that replaces a keyword in an HTML file only when the keyword is text, and not inside a – BatchyX Sep 17 '10 at 09:41
@Gumbo - No, it is **NOT POSSIBLE**: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Christian May 18 '11 at 11:40
@Christian Sciberras: [Tom Christiansen, alias tchrist](http://stackoverflow.com/users/471272/tchrist), has several answers on this topic that prove that regular expressions can be used to parse HTML. – Gumbo May 18 '11 at 11:54
@Gumbo - Sure, I can build rockets into cars to make them fly - and they do fly - for a while. – Christian May 18 '11 at 12:00
@BatchyX - Why not? There are native XML and DOM parsers out there. Unlike regexes, which work on convoluted strings with some form of pattern, these parsers are specific to their job, nothing less, nothing more. – Christian May 18 '11 at 20:14

score 0 · Accepted Answer · answered Sep 17 '10 at 08:49

You can use preg_replace again after the first replacement to remove b tags from href:

$text=preg_replace('#(href="[^"]*)<b>([^"]*)</b>#i',"$1$2",$text);
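
A minimal sketch of how the two passes might fit together, assuming double-quoted href attributes only. The function name boldKeywordThenCleanHrefs is made up for illustration, and preg_quote is added as a precaution that the question's code does not have. Because the cleanup pattern removes only one <b> pair per href on each run, the sketch repeats it until the text stops changing.

```php
<?php
// Hypothetical helper: the question's replacement followed by the cleanup pass.
function boldKeywordThenCleanHrefs(string $text, string $keyword): string
{
    // First pass: bold every occurrence, including the ones inside href.
    $text = preg_replace(
        '/(\b' . preg_quote($keyword, '/') . '\b)/i',
        '<b>\1</b>',
        $text
    );

    // Second pass: strip <b>...</b> pairs that landed inside href="...".
    // One pair per href is removed per run, so loop until nothing changes.
    do {
        $before = $text;
        $text   = preg_replace('#(href="[^"]*)<b>([^"]*)</b>#i', '$1$2', $text);
    } while ($text !== $before);

    return $text;
}

echo boldKeywordThenCleanHrefs('this <a href="keyword.php">keyword</a> here', 'keyword'), "\n";
```

This still leaves other attributes (title, src, single-quoted href) unprotected, so it only covers the case shown in the question.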

mck89


score 0 · Answer 3 · answered Sep 17 '10 at 09:07

Yes, you can use regex like that, but the code might become a little convoluted. Here is a quick example:

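$string  = '<a href="keyword.php">link text with keyword and stuff</a>';
$keyword = 'keyword';
// Capture each piece of the link so it can be reassembled around <b>...</b>.
// preg_quote() and the escaped dot keep the keyword and ".php" literal.
$text    = preg_replace(
               '/(<a href=")(' . preg_quote($keyword, '/') . ')(\.php">)(.*?)(<\/a>)/',
               '$1$2$3<b>$4</b>$5',
               $string
           );

echo $string."\n";
echo $text."\n";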

The text matched by each () group is stored in the references $1, $2 ... $n, so I don't have to type stuff over again. The match can also be made more generic to match different kinds of URL syntax if needed.

Seeing this solution, you might want to rethink the way you plan to do matching of keywords in your code. :)

output:

<a href="keyword.php">link text with keyword and stuff</a>
<a href="keyword.php"><b>link text with keyword and stuff</b></a>
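
One way a more generic match could look, as a sketch only: a negative lookahead that skips any keyword whose next angle bracket is a closing ">", which is the case inside a tag. The function name boldKeywordOutsideTags is made up for illustration, and the pattern assumes well-formed tags with no stray "<" or ">" in text, attribute values, scripts or comments.

```php
<?php
// Hypothetical helper: bold $keyword only where it is not inside a tag.
function boldKeywordOutsideTags(string $text, string $keyword): string
{
    // (?![^<>]*>) rejects a match when the next angle bracket after it is ">",
    // i.e. when the keyword sits between "<" and ">" (tag name or attribute).
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b(?![^<>]*>)/i';

    // $0 is the whole match, so the original casing is preserved.
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo boldKeywordOutsideTags('this <a href="keyword.php">keyword</a> here', 'keyword'), "\n";
```

Unlike the example above, this wraps only the keyword itself and leaves the rest of the link text alone.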
