1

How can I replace all occurrences of $keywords within a string without replacing the keywords found within hyperlink URLs, image tag URLs, and image tag title and alt attributes?

Example:

$keywords = 'sports';

$string = '<a href="http://my_domain_name.com/sports/info.php"><img class="icon" src="http://my_domain_name.com/sports/images/football.gif" title="Get the latest football sports news" alt="Get the latest football sports news" />Football sports news</a>';

Notice that the keyword 'sports' appears within the hyperlink URL, the image tag URL, and the image tag title and alt attributes.

I want to replace $keywords (sports) with:

<span style="color: #000000; background-color: #FFFF00; font-weight: normal;">sports</span>

to yield the following result:

<a href="http://my_domain_name.com/sports/info.php"><img class="icon" src="http://my_domain_name.com/sports/images/football.gif" title="Get the latest football sports news" alt="Get the latest football sports news" />Football <span style="color: #000000; background-color: #FFFF00; font-weight: normal;">sports</span> news</a>

Thanks in advance.

EDIT - Additional Information

Currently I am using the following two-step method. It works for the URLs only, not for the title and alt attributes. I also need to avoid replacing the keywords in the title and alt attributes.

// Replaces both the website and general images path urls with character strings (used to prevent highlighting keywords found within the path urls)
// Note: strpos() can return 0 for a match at the start, so compare with !== false
if (strpos('http://my_domain_name.com/sports', $keywords) !== false) {
    $description = str_ireplace('http://my_domain_name.com/sports', '1q2w3e4r5t6y7u', $description);
}
if (strpos('http://my_domain_name.com/sports/images', $keywords) !== false) {
    $description = str_ireplace('http://my_domain_name.com/sports/images', '7u6y5t4r3e2w1q', $description);
}

// Highlights the Search Keywords
$description = str_ireplace($keywords, '<span style="color: #000000; background-color: #FFFF00; font-weight: normal;">'.$keywords.'</span>', $description);

// Replaces the character strings with the website and general images path urls
if (strpos('http://my_domain_name.com/sports', $keywords) !== false) {
    $description = str_ireplace('1q2w3e4r5t6y7u', 'http://my_domain_name.com/sports', $description);
}
if (strpos('http://my_domain_name.com/sports/images', $keywords) !== false) {
    $description = str_ireplace('7u6y5t4r3e2w1q', 'http://my_domain_name.com/sports/images', $description);
}
Sammy
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – CodeCaster Jun 12 '12 at 22:36
  • @CodeCaster Did I miss something? Where is the regex in the question? – PeeHaa Jun 12 '12 at 22:50
  • @PeeHaa then you don't see my point. A HTML/DOM parser is required here, there is no alternative solution. You cannot replace HTML with mere string replacement functions. – CodeCaster Jun 12 '12 at 22:56
  • @CodeCaster yeah..but.. why didn't you say that? Wasn't exactly constructive. Although that top answer is a work of genius. – John David Ravenscroft Jun 12 '12 at 22:59

3 Answers

2

This was the best I could do using PHP's DOMDocument.

$str = '<a href="http://my_domain_name.com/sports/info.php"><img class="icon" src="http://my_domain_name.com/sports/images/football.gif" title="Get the latest football sports news" alt="Get the latest football sports news" />Football sports news</a>';

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
$fragment->appendXML( $str);
$doc->appendChild( $fragment);

// Create the <span>
$node = $doc->createElement( 'span');
$node->setAttribute( 'style', 'color: #000000; background-color: #FFFF00; font-weight: normal;');
$node->nodeValue = 'sports';

foreach( $doc->getElementsByTagName( 'a') as $tag)
{
    $img_tag = $tag->firstChild->cloneNode();
    $text = $doc->createTextNode( $tag->textContent);
    $tag->nodeValue = ''; // Clear out the contents of the <a>

    // Get the text before and after the replacement
    $start = strpos( $text->wholeText, 'sports');
    $before = $text->substringData( 0, $start);
    $after = $text->substringData( $start + strlen( 'sports'), strlen( $text->wholeText));

    // Put the image tag back, along with the before text, the <span>, and the after text
    $tag->appendChild( $img_tag);
    $tag->appendChild( $doc->createTextNode( $before));
    $tag->appendChild( $node->cloneNode( true)); // Clone so every <a> gets its own <span>
    $tag->appendChild( $doc->createTextNode( $after));
}
echo htmlentities( $doc->saveHTML()) . "\n";

This outputs:

<a href="http://my_domain_name.com/sports/info.php">
    <img class="icon" src="http://my_domain_name.com/sports/images/football.gif" title="Get the latest football sports news" alt="Get the latest football sports news">Football <span style="color: #000000; background-color: #FFFF00; font-weight: normal;">sports</span> news
</a> 


(You need PHP > 5.3)
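The snippet above is tied to the exact shape of the sample markup (one `<a>` whose first child is the `<img>`). A more generic variant of the same DOM idea might look like the sketch below: it visits every text node and wraps each keyword match in a `<span>`, so attribute values are never touched. The function name `highlightKeyword`, the wrapper `<div>`, and the default style are assumptions made for illustration and are not part of the answer. It also assumes the input is an ASCII HTML fragment, since `loadHTML()` does not treat input as UTF-8 by default, and that `saveHTML()` accepts a node argument (PHP 5.3.6 or later).

```php
<?php
// Hypothetical helper (not from the answer above): highlight $keyword in text
// nodes only, leaving href/src/title/alt attribute values untouched.
function highlightKeyword(
    string $html,
    string $keyword,
    string $style = 'color: #000000; background-color: #FFFF00; font-weight: normal;'
): string {
    if ($keyword === '') {
        return $html; // nothing to highlight
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);           // silence warnings on sloppy HTML
    $doc->loadHTML('<div>' . $html . '</div>'); // wrapper gives the fragment one root
    libxml_clear_errors();

    $root  = $doc->getElementsByTagName('div')->item(0); // our wrapper comes first
    $xpath = new DOMXPath($doc);
    $query = './/text()[not(ancestor::script) and not(ancestor::style)]';
    // Copy the node list to an array because the tree is modified while looping.
    $textNodes = iterator_to_array($xpath->query($query, $root));

    $pattern = '/(' . preg_quote($keyword, '/') . ')/i';
    foreach ($textNodes as $textNode) {
        // Odd indexes of the result hold the captured keyword matches.
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) === 1) {
            continue; // keyword not present in this text node
        }
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $doc->createElement('span');
                $span->setAttribute('style', $style);
                $span->appendChild($doc->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightKeyword('<a href="http://my_domain_name.com/sports/info.php">Football sports news</a>', 'sports'), "\n";
```

Because the result is re-serialized by the HTML parser, details such as the self-closing slash on `<img />` may not survive byte for byte.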

nickb
  • fancy. but I like mine better. – John David Ravenscroft Jun 13 '12 at 00:02
  • Whelp, i'm just joshing. but the guy below is using regular expressions. Refocus your misgivings at him! **cough**(mine uses a lot less overhead and is really simple.. I mean.. there are two really easy hooks to grab in there: "<" and ">") – John David Ravenscroft Jun 13 '12 at 00:16
  • Not really... Besides being cryptic, yours is extremely inflexible. What if the HTML changed tomorrow? This is flexible and extensible, and can be enhanced to be generic, I'm just not putting the effort in to do so, as this is a MWE for the OP's problem. Besides, comments aren't really a place to promote your own answer, especially since your answer is pleading for rep... – nickb Jun 13 '12 at 00:26
0

xml_parse can be used to strip away the tags in HTML code. http://www.w3schools.com/php/func_xml_parse.asp is a good tutorial on how to use it.

I would strip all the html tags away from my string, and then use:
str_replace($keyword, $replace_string, $string);

to do the rest.

http://www.php.net/manual/en/function.str-replace.php


$replace_string = "<span fancy colours>{$keywords}</span>";
$string = '<a href="http://my_domain_name.com/sports/info.php"><img class="icon" src="http://my_domain_name.com/sports/images/football.gif" title="Get the latest football sports news" alt="Get the latest football sports news" />Football sports news</a>';

$exploded = explode("<", $string);

$tmp_array = array();
foreach ($exploded as $abit) {
    $pos = (strpos($abit, ">") + 1);            //get end of tag
    $tmp_string = substr($abit, $pos);
    if (strlen($tmp_string) > 1) {  // has text outside of tags
        $tmp_string = str_ireplace($keywords, $replace_string, $tmp_string);
        $tmp_array[] = substr($abit,0,$pos) . $tmp_string;
    } else {
        $tmp_array[] = $abit;
    }
}

$newstring = implode("<", $tmp_array);
echo $newstring;

Can has rep now?
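The same "only touch what sits between `>` and `<`" idea can also be sketched with `preg_split()` and `PREG_SPLIT_DELIM_CAPTURE`, which keeps the tags as separate array entries so they can be skipped outright. This is a hedged variation and not the code above: the function name `highlightOutsideTags` and its `$open`/`$close` parameters are made up for illustration, and it assumes no attribute value contains a literal `>`.

```php
<?php
// Hypothetical helper: highlight $keyword only in the text between tags.
function highlightOutsideTags(string $html, string $keyword, string $open, string $close): string
{
    if ($keyword === '') {
        return $html; // nothing to highlight
    }

    // Even indexes hold text between tags, odd indexes hold the tags themselves.
    $parts   = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/' . preg_quote($keyword, '/') . '/i';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0 && $part !== '') {
            // The callback reuses the matched text, so its original case is kept.
            $parts[$i] = preg_replace_callback(
                $pattern,
                function (array $m) use ($open, $close) {
                    return $open . $m[0] . $close;
                },
                $part
            );
        }
    }

    return implode('', $parts);
}

$open  = '<span style="color: #000000; background-color: #FFFF00; font-weight: normal;">';
$close = '</span>';
echo highlightOutsideTags('<a href="/sports/info.php">Football sports news</a>', 'sports', $open, $close), "\n";
```

Unlike the `strlen($tmp_string) > 1` check above, this version also handles one-character text runs, and the tags are passed through unchanged.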

  • I guess @Sammy wants the HTML to be usable after highlighting his search words (or whatever), so removing all present tags might not be an option. Furthermore I would not recommend an XML parser on the HTML document. – CodeCaster Jun 12 '12 at 22:56
  • True! Didn't read properly :\ sorry. XML parser will do the job I described just fine. – John David Ravenscroft Jun 12 '12 at 23:00
  • I edited my posting to reflect what I am doing currently, but need a better solution. – Sammy Jun 12 '12 at 23:18
0

Working with strings alone, I would do the following. Since the attribute values always come before the element's text, it is easy to get the right match; then use a callback to replace 'sports' with whatever you like.

This is probably closer to what you need:

function replacer($match)
{
    global $replace_match_with_this, $string_to_replace;
    return str_ireplace($string_to_replace, $replace_match_with_this, $match[0]);
}

// preg_quote() escapes any regex metacharacters in the keyword
$new_string = preg_replace_callback(sprintf('/>[^<>]*[\s-]+%s[\s-]+[^<>]*<\/a>/i', preg_quote($keyword, '/')), 'replacer', $string, 1);

Presumably $keyword and $string_to_replace hold the same value and can be combined into one variable.
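For reference, the same regex can be used without globals by passing a closure that captures its inputs with `use`. The sketch below is one way that might look. The wrapper name `highlightInLinkText` and the `$open`/`$close` parameters are assumptions for illustration, and `preg_quote()` is added so that a keyword containing regex metacharacters does not break the pattern. The pattern's limits are unchanged: the keyword must be surrounded by whitespace or hyphens and must sit in the text directly before `</a>`.

```php
<?php
// Hypothetical wrapper around the pattern from this answer, using a closure
// instead of global variables. Returns null if preg_replace_callback() fails.
function highlightInLinkText(string $html, string $keyword, string $open, string $close): ?string
{
    $pattern = sprintf(
        '/>[^<>]*[\s-]+%s[\s-]+[^<>]*<\/a>/i',
        preg_quote($keyword, '/') // escape regex metacharacters in the keyword
    );

    return preg_replace_callback(
        $pattern,
        function (array $match) use ($keyword, $open, $close) {
            // $match[0] is the link text, from the preceding ">" up to "</a>".
            return str_ireplace($keyword, $open . $keyword . $close, $match[0]);
        },
        $html,
        1 // replace only the first matching link text, as in the answer
    );
}

$result = highlightInLinkText(
    '<a href="/sports/info.php">Football sports news</a>',
    'sports',
    '<span style="background-color: #FFFF00;">',
    '</span>'
);
var_dump($result);
```

Checking the return value for `null` makes a failed regex visible immediately, which is easier than debugging an empty result after the fact.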

user1433150
  • Since sports is a variable ($keyword), would I replace the word sports in the preg_replace_callback () with $keyword as in: $new = preg_replace_callback('/>[\w]*\s+$keyword\s+[\w]*<\/a>/i', 'replacer', $string, 1); – Sammy Jun 13 '12 at 00:13
  • ah...I would use `$new = preg_replace_callback(sprintf('/>[\w]*\s+%s\s+[\w]*<\/a>/i', $keyword), 'replacer', $string, 1);` – user1433150 Jun 13 '12 at 00:30
  • also if you want to pass $replacement and $match to the replacer function from somewhere else you should make them global within the function. – user1433150 Jun 13 '12 at 00:37
  • for some reason the preg_replace_callback() function is not returning anything. The globals are set correctly and I placed an echo command in the function for testing but nothing is echoed. @user1433150 – Sammy Jun 13 '12 at 01:19
  • hmm...My best guess would be that you're not supplying the callback correctly. Use just the function name, no parens `'replacer'`. I just tested and it works: php 5.3 – user1433150 Jun 13 '12 at 02:09
  • This is what I did: function replacer($str){ global $match; global $replacement; global $keywords; $replacement = ''.$keywords.''; $match = $keywords; return str_ireplace($match, $replacement, $str[0]); } $description = preg_replace_callback(sprintf('/>[\w]*\s+%s\s+[\w]*<\/a>/i', $keywords), 'replacer', $description, 1); and it did not work – Sammy Jun 13 '12 at 03:13
  • Well I can't debug your script remotely. I can tell you that I just copied and pasted your code into a test.php file added `$keywords = 'sports'; $description = //example you gave earlier;` and `echo $description` and ran it in chrome. It works. do `echo gettype($description)`, confirm that the function is running, add some error handling to your script `ini_set('display_errors', 1); error_reporting(E_ALL);` – user1433150 Jun 13 '12 at 03:46
  • The code I gave will return the string you want, depending on what you want to do with it you may want to create an instance of DOMDocument to add it to an xml/html document. – user1433150 Jun 13 '12 at 03:51
  • also there's no need to make $match global if you're setting its value to $keywords. As per the example you gave there's also no need for a $match variable you can just substitute $keywords. Sorry if I confused you by using different variables. – user1433150 Jun 13 '12 at 03:56
  • `preg_replace_callback` returns NULL on error. If you get a NULL type and the `replacer` function is running do a `var_dump($str)` confirm a match is being produced, and argument passed. If it's not running call it yourself and see what happens, change the function name, check your version info phpinfo(). If it's not working it has a good reason ;) Best of luck. – user1433150 Jun 13 '12 at 04:09
  • I uploaded a test.txt file at http://nnexxtt.com/test.txt showing the code that did not work. I tried different variations, but could not get it to work properly. @user1433150 – Sammy Jun 13 '12 at 06:02
  • Ok, I see the problem. change the regex to `/>[\w\s]*\s+sports\s+[\w\s]* – user1433150 Jun 13 '12 at 06:27
  • @Sammy Actually this is much more to the point, and I edited the regex in my code to this: `/>[^<>]*[\s-]+%s[\s-]+[^<>]*<\/a>/i`. I do want to go on record to say that the answer @nickb gave is really the correct way to do this but if you don't know anything about classes or OOP then using DOMDocument can be a bit daunting. – user1433150 Jun 13 '12 at 07:18