We are looking for a script that can easily replace a value using PHP DOM. Here is the HTML code in which I need to make the replacement:

HTML Code

<html>
<head>
</head>
<body>
<div> Explore HISTORY shows, watch videos and full episodes, play games and access articles on historical topics at History.com <p>Miss an episode of your favorite History shows? Go to history.com to catch up on full episodes and video exclusives.</p></div>
<div>Discover what happened today in history. Read about major past events that happened today including special entries on crime, entertainment, and more.</div>
<p>Experience games from your favorite shows, take quizzes, solve puzzles and more!</p>
</body>
</html>

We have to replace every occurrence of the word 'history' (in upper or lower case) with the same word wrapped in <u> tags, e.g. <u>history</u>.

The final code would be:

<html>
<head>
</head>
<body>
<div> Explore <u>HISTORY</u> shows, watch videos and full episodes, play games and access articles on historical topics at <u>History</u>.com <p>Miss an episode of your favorite <u>History</u> shows? Go to <u>history</u>.com to catch up on full episodes and video exclusives.</p></div>
<div>Discover what happened today in <u>history</u>. Read about major past events that happened today including special entries on crime, entertainment, and more.</div>
<p>Experience games from your favorite shows, take quizzes, solve puzzles and more!</p>
</body>
</html>

This is what I have tried, but it does not work:

<?php
libxml_use_internal_errors(true);
@$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadHTMLFile('http://www.history.com');
@$body = $doc->getElementsByTagName('body');
$i = 0;
while (is_object($finance = $doc->getElementsByTagName("body")->item($i)))
{
    foreach ($finance->childNodes as $nodename)
    {
        $node = $doc->createElement("para", "<u>as fasd fasd fadsf</u>");
        if (stristr($nodename->nodeValue, 'search')) {
            $nodename->appendChild($node);
            echo $nodename->getAttribute."<br>";
            echo $nodename->nodeValue."<br>";
            @$us = true;
        }
        echo $nodename->nodeValue."<br>";
    }

    $i++;
}
libxml_clear_errors();
hakre
Sam
  • possible duplicate of [Ignore html tags in preg_replace](http://stackoverflow.com/questions/8193327/ignore-html-tags-in-preg-replace) – hakre Oct 01 '12 at 10:04
  • @Sam: Code suggestion is in linked question's answer. You can learn about case-insensitive xpath here: [case insensitive xpath searching in php](http://stackoverflow.com/questions/3238989/case-insensitive-xpath-searching-in-php). – hakre Oct 01 '12 at 10:09
  • One clarification (because someone decided to downvote first and ask questions later - or never at all, dunno): how is the shown data stored? If it's just a string, it's overkill to use DOM parsing here (because, well, you don't PARSE anything here). If it's in DOM structure, I still think you better go with `preg_replace` over its HTML representation, if that's the final task that should be done on it. – raina77ow Oct 01 '12 at 10:10
  • @raina77ow: In case that is, `preg_replace` has been outlined already here: [PHP Regular expression to match keyword outside HTML tag ](http://stackoverflow.com/questions/7798829/php-regular-expression-to-match-keyword-outside-html-tag-a) – hakre Oct 01 '12 at 10:11
  • @Sam: If you show what you've tried so far, it would be helpful, too. Requests for [just code](http://stuck.include-once.org/#help5) are usually off-topic. Primary site intent is coding approaches, not readymade solutions, nor [tutoring](http://stuck.include-once.org/#help6) per se. Even I break that "rule" sometimes :) – hakre Oct 01 '12 at 10:12
  • @hakre With all respect, I don't understand how this is related. The OP doesn't mention any exceptions here; he didn't say, for example, that the word 'history' shouldn't be changed when it's in a link or an attribute or anything else. Why should I invent these conditions for him? ) – raina77ow Oct 01 '12 at 10:13
  • Well, the tricky part is that you match the text "history" (or whatever word) only if it is within text nodes. If you use a regular expression (or just a string search and replace) you can run into the problem of changing parts of the document you don't want to, e.g. other nodes, destroying the structure of the document instead of enhancing it. – hakre Oct 01 '12 at 10:17
  • Please look at the edited code. Thanks. – Sam Oct 01 '12 at 10:18
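
The concern hakre raises in the comments can be seen in a minimal sketch. The one-line HTML string below is made up for illustration (it is not taken from the question); it only assumes that the word also appears inside an attribute value:

```php
<?php
// Hypothetical input: the word appears both in an attribute and in the text.
$html = '<a href="http://www.history.com/">History</a>';

// A plain case-insensitive replace on the markup cannot tell
// text content apart from attribute values.
$naive = preg_replace('/history/i', '<u>$0</u>', $html);

echo $naive, "\n";
// <a href="http://www.<u>history</u>.com/"><u>History</u></a>
```

The href is now broken, which is why restricting the replacement to text nodes is the safer route whenever the word may occur in attributes.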

1 Answer

Use DOMXPath to find the text nodes that contain the word "history" (case-insensitively), then split each of them into new text nodes.

I went ahead and wrote an implementation of this, out of curiosity. It took me longer than I planned, but it definitely works. I hope it is helpful to you.

<?php

$doc = new DOMDocument();
$doc->preserveWhiteSpace = FALSE;
$doc->resolveExternals = FALSE;
$doc->loadHTML(<<<END
<html>
<head>
</head>
<body>
<div> Explore HISTORY shows, watch videos and full episodes, play games and access articles on historical topics at History.com <p>Miss an episode of your favorite History shows? Go to history.com to catch up on full episodes and video exclusives.</p></div>
<div>Discover what happened today in history. Read about major past events that happened today including special entries on crime, entertainment, and more.</div>
<p>Experience games from your favorite shows, take quizzes, solve puzzles and more!</p>
</body>
</html>
END
);

echo '<p>Input:</p>'."\n";
echo $doc->saveHTML()."\n";

$word    = 'history';
$lcWord  = strtolower($word);
$wordLen = strlen($word);
$xpath   = new DOMXPath($doc);
$nodes   = $xpath->query('/html/body//text()['.
                           'contains('.
                              'translate(.,"'.strtoupper($word).'","'.$lcWord.'"),'.
                              '"'.$lcWord.'")'.
                         ']');
foreach ($nodes as $node)
{
    // Split all occurrences of "word" into new text nodes.
    $text = $node->data;
    while (($wordPos = stripos($text,$word)) !== FALSE)
    {
        $beforeText = substr($text,0,$wordPos);
        $wordText   = substr($text,$wordPos,$wordLen);

        // Add the before text to the DOM (nothing to add if the word starts the text).
        if ($beforeText !== '')
            $node->parentNode->insertBefore($doc->createTextNode($beforeText),$node);

        // Add the word text to the DOM, wrapped in <u> to underline it.
        $uNode = $doc->createElement('u');
        $uNode->appendChild($doc->createTextNode($wordText));
        $node->parentNode->insertBefore($uNode,$node);

        // Repeat for the text after the word.
        // The cast covers older PHP versions where substr() may return FALSE.
        $text = (string) substr($text,$wordPos + $wordLen);
    }

    // Create a text node for any text following the last occurrence.
    // Compare against '' so that a remainder of "0" is not dropped.
    if ($text !== '')
        $node->parentNode->insertBefore($doc->createTextNode($text),$node);

    // Remove the original text node.
    $node->parentNode->removeChild($node);
}

echo '<p>Output:</p>'."\n";
echo $doc->saveHTML()."\n";

?>

OUTPUT

<p>Input:</p>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body>
<div> Explore HISTORY shows, watch videos and full episodes, play games and access articles on historical topics at History.com <p>Miss an episode of your favorite History shows? Go to history.com to catch up on full episodes and video exclusives.</p>
</div>
<div>Discover what happened today in history. Read about major past events that happened today including special entries on crime, entertainment, and more.</div>
<p>Experience games from your favorite shows, take quizzes, solve puzzles and more!</p>
</body>
</html>

<p>Output:</p>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body>
<div> Explore <u>HISTORY</u> shows, watch videos and full episodes, play games and access articles on historical topics at <u>History</u>.com <p>Miss an episode of your favorite <u>History</u> shows? Go to <u>history</u>.com to catch up on full episodes and video exclusives.</p>
</div>
<div>Discover what happened today in <u>history</u>. Read about major past events that happened today including special entries on crime, entertainment, and more.</div>
<p>Experience games from your favorite shows, take quizzes, solve puzzles and more!</p>
</body>
</html>
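
The case-insensitive part of the query can be tried in isolation. The following is only a sketch: the two-paragraph document is hypothetical, and the query is the answer's expression written out by hand for the word "history":

```php
<?php
// Hypothetical two-paragraph document, only for demonstrating the query.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>HISTORY shows</p><p>puzzles and more</p></body></html>');

$xpath = new DOMXPath($doc);

// XPath 1.0 has no lower-case() function, so translate() maps the
// upper-case letters of the word onto their lower-case counterparts
// before contains() performs the comparison.
$query = '/html/body//text()[contains(translate(.,"HISTORY","history"),"history")]';

$matches = array();
foreach ($xpath->query($query) as $textNode) {
    $matches[] = $textNode->data;
}

print_r($matches);
```

Only the text node "HISTORY shows" is selected. Note that translate() here lowers just the letters that occur in the search word, which is enough for contains() to match. If the word can contain double quotes, the concatenated query string would need extra escaping.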
jimp