
So I want to replace "strings" with <a>"strings"</a>, using one array of strings and another one of URLs. Everything worked like a charm at first: it replaces the first occurrence that is not within tags (> <). But when I started populating the arrays with strings and URLs, I found that if it replaces "Fantasy games" -> <a href="ASDFASDF">Fantasy games</a> and then has to replace "Fantasy", it simply skips the >string< check of the regex and replaces it anyway, breaking the HTML and creating parse errors.

So I presume I'm doing something wrong or missing a parameter. If the original content has >string<, it isn't replaced, but text that preg_replace itself has just inserted is not detected as >string< when the next element of the array is replaced.

Here is the code:

// DB content
// $Keywords=array("Fantasy games", "Fantasy");
// $URL=array("http://www.whatever.com", "http://www.whatever2.com");

$i=0;
// Insert the links and returns the processed content.
foreach ($SQLResult as $row){
    $Keywords[$i]="/[^>](".$row->Keyword.")[^<]/i";
    $URLS[$i]=' <a href="'.$row->URL.'">$1</a> ';
    $i++;
}   
$Content=preg_replace($Keywords, $URLS, $Content, 1);
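A minimal sketch, using hypothetical sample strings rather than the real database content, of why the pattern cannot tell whether a keyword is already inside a tag: `[^>]` and `[^<]` each consume exactly one character and check nothing beyond it. The actual content that triggered the error may differ; this only reproduces the mechanism.

```php
<?php
// The question's pattern and replacement, built for the keyword "Fantasy".
$pattern = '/[^>](Fantasy)[^<]/i';
$replacement = ' <a href="http://www.whatever2.com">$1</a> ';

// 1) Inside an existing link, but not directly next to '>' or '<':
//    [^>] matches the space before "Fantasy" and [^<] the space after it,
//    so the new link ends up nested inside the old one.
$nested = preg_replace($pattern, $replacement,
    '<a href="http://www.whatever.com">Best Fantasy games</a>', 1);

// 2) The two checked characters are part of the match, so they are replaced
//    by the spaces in $replacement: the comma after "Fantasy" is lost.
$comma = preg_replace($pattern, $replacement, 'I like Fantasy, really', 1);

// 3) At the very start of the string there is no character for [^>] to match,
//    so nothing is replaced at all.
$start = preg_replace($pattern, $replacement, 'Fantasy rules', 1);

echo $nested, "\n", $comma, "\n", $start, "\n";
```

So the "not within > <" check is really "not immediately after `>` and not immediately before `<`", which says nothing about whether the keyword sits inside an `<a>` element or an attribute.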
Asked by Pablo; edited by HamZa.
  • Should the regex be `"/[^>]*>(Fantasy games)]" means match a character which is not '>'. – TroyCheng Jul 22 '13 at 10:22
  • @TroyCheng That's how it's meant to be: it looks for strings that are not already inside a link or any other HTML tag. That works, but not for the strings that preg_replace itself has just replaced, which is weird. – Pablo Jul 22 '13 at 10:52
  • @Jens I just checked the link and tried the approaches that don't use DOM (since DOM is not an option in this case) and got the same behavior: it works, but when it has to check the already-replaced text it fails the "not within > <" check and breaks the code. – Pablo Jul 22 '13 at 11:03
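Since DOM is ruled out here, one regex-only workaround (a sketch, not the method from the answer) is to handle all keywords in a single `preg_replace_callback()` pass, so links created by the replacement are never scanned again. It uses PCRE's `(*SKIP)(*FAIL)` verbs to jump over existing `<a>...</a>` elements and over any other tag. `linkKeywordsOnce()` and the keyword => URL array are hypothetical stand-ins for the `$SQLResult` rows, and the sketch assumes reasonably simple HTML (no `>` inside attribute values, no `<script>` content to protect).

```php
<?php
/**
 * Hypothetical helper: wraps keywords in links in ONE preg_replace_callback()
 * pass, so links it creates are never scanned again.
 * $links maps keyword => URL (stands in for the question's $SQLResult rows).
 */
function linkKeywordsOnce(string $html, array $links): string
{
    // Longest keywords first, so "Fantasy games" wins over "Fantasy" at the same position.
    $keywords = array_map('strval', array_keys($links));
    usort($keywords, function ($a, $b) {
        return mb_strlen($b) - mb_strlen($a);
    });

    // Case-insensitive lookup from the matched text back to its URL.
    $urlByLower = [];
    foreach ($links as $keyword => $url) {
        $urlByLower[mb_strtolower((string) $keyword)] = $url;
    }

    $alternation = implode('|', array_map(function ($k) {
        return preg_quote($k, '~');
    }, $keywords));

    // Alternatives, tried in order at each position:
    // 1) a whole existing <a>...</a> element -> skip past it,
    // 2) any other tag -> skip past it,
    // 3) otherwise one of the keywords as a whole word -> match it.
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)|<[^>]*>(*SKIP)(*FAIL)|\b(?:'
        . $alternation . ')\b~isu';

    $used = []; // link each keyword only once, like the limit of 1 in the question
    $result = preg_replace_callback($pattern, function ($m) use ($urlByLower, &$used) {
        $key = mb_strtolower($m[0]);
        if (isset($used[$key])) {
            return $m[0]; // this keyword was already linked: keep the text as-is
        }
        $used[$key] = true;
        $href = htmlspecialchars($urlByLower[$key], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $href . '">' . $m[0] . '</a>';
    }, $html);

    return $result ?? $html; // preg_replace_callback() returns null on a PCRE error
}

echo linkKeywordsOnce(
    '<p>Fantasy games and more Fantasy, see <a href="/x">Fantasy</a>.</p>',
    ['Fantasy' => 'http://www.whatever2.com', 'Fantasy games' => 'http://www.whatever.com']
), "\n";
```

For the sample call, "Fantasy games" and the first free "Fantasy" each get one link, and the existing `<a href="/x">Fantasy</a>` is left untouched. Sorting longest-first matters because PCRE takes the first alternative that matches, not the longest one.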

1 Answer


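I started from the code in this question, as pointed out by @Jens: https://stackoverflow.com/posts/4209925/edit. Working on DOM text nodes that have no <a> ancestor means text that is already inside a link is never matched again.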

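<?php

// $content: the HTML to process ($Content in the question).
// $SQLResult: rows with ->Keyword and ->URL, longest keywords first.
$dom = new DOMDocument();
// loadXML needs a well-formed document, so it's better to use loadHTML, but it needs a hack to properly handle UTF-8 encoding
$dom->loadHTML(mb_convert_encoding($content, 'HTML-ENTITIES', 'UTF-8'));

$xpath = new DOMXPath($dom);

// Insert the links, one keyword at a time.
foreach ($SQLResult as $row) {
    $pattern = '/(' . preg_quote($row->Keyword, '/') . ')/iu';
    // Re-query for every keyword, so text already wrapped in <a> by an earlier
    // keyword (e.g. "Fantasy games") is skipped when "Fantasy" is processed.
    foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
        // Odd indexes of $parts hold the matched keyword, even ones the text around it.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // keyword not found in this text node
        }
        // Build the replacement with DOM calls instead of str_ireplace() + appendXML():
        // no '$0' placeholder needed, and '&', '<' or quotes need no escaping.
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $k => $part) {
            if ($k % 2 === 1) {
                $link = $dom->createElement('a');
                $link->setAttribute('href', $row->URL);
                $link->appendChild($dom->createTextNode($part));
                $fragment->appendChild($link);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }
}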

// get only the body tag with its contents, then trim the body tag itself to get only the original content
echo mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
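The `mb_substr(..., 6, -7)` call assumes the serialized body is exactly `<body>` + content + `</body>`; if the body carries attributes, it trims the wrong characters. A sketch of an alternative, assuming PHP 5.3.6 or later (where `DOMDocument::saveHTML()` accepts a node), that serializes only the body's child nodes; the `$content` value is hypothetical sample data, plain ASCII, so the UTF-8 hack is left out.

```php
<?php
// Hypothetical sample input standing in for $content in the answer.
$content = '<p>I like <a href="http://www.whatever.com">Fantasy games</a>.</p>';

$dom = new DOMDocument();
$dom->loadHTML($content); // ASCII-only sample, so no mb_convert_encoding() needed

// loadHTML() wraps a fragment in <html><body>; serialize only the body's
// children, so nothing has to be trimmed off with mb_substr().
$body = $dom->getElementsByTagName('body')->item(0);
$inner = '';
foreach ($body->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $inner, "\n";
```

This also keeps HTML serialization rules (e.g. `<br>` rather than `<br/>`), which `saveXML()` does not.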
Answered by edi9999; edited by Community.