6

I'm using simple_html_dom to extract elements from HTML pages. I have some div elements like the one below. All I want is to get the "Fine Thanks" sentence in each div (the text that is not inside any sub-element). How can I do it?

<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
AshKan

4 Answers

2

It should be simply $html->find('div.right > text'), but that won't work because Simple HTML DOM Parser doesn't seem to support direct descendant queries.

So you'd have to find all <div> elements first and search the child nodes for a text node. Unfortunately, the ->childNodes() method is mapped to ->children() and thus only returns elements.

A working solution is to call ->find('text') on each <div> element, after which you filter the results based on the parent node.

foreach ($doc->find('div.right') as $parent) {
    foreach ($parent->find('text') as $node) {
        if ($node->parent() === $parent && strlen($t = trim($node->plaintext))) {
            echo $t, PHP_EOL;
        }
    }
}

Using DOMDocument, this XPath expression will do the same work without the pain:

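$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

// text() selects only the text nodes that are direct children of the div
foreach ($xp->query('//div[@class="right"]/text()') as $node) {
    // skip whitespace-only nodes between the child elements
    if (strlen($t = trim($node->textContent))) {
        echo $t, PHP_EOL;
    }
}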
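As a self-contained sketch of the XPath approach, the version below runs against the markup from the question. The helper name `directTextOfDivs`, the extra `div.left` element, and the token-based class test are illustrative assumptions rather than part of the original answer; the class name is also assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: returns the trimmed, non-empty text nodes that are
// direct children of <div> elements whose class list contains $class.
function directTextOfDivs(string $content, string $class): array
{
    $doc = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // Match $class as a whole class token, so class="left right" also matches.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]/text()',
        $class
    );

    $texts = [];
    foreach ($xp->query($query) as $node) {
        $t = trim($node->textContent);
        if ($t !== '') {
            $texts[] = $t;
        }
    }
    return $texts;
}

$content = <<<HTML
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
<div class="left">Ignored</div>
HTML;

foreach (directTextOfDivs($content, 'right') as $t) {
    echo $t, PHP_EOL;
}
```

Matching the class as a token is slightly more forgiving than an exact `@class="right"` comparison, at the cost of a longer expression.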
Ja͢ck
  • Probably, but the OP says he needs to use `simple_html_dom`. Of course, XPath offers a better solution than any of the ones we provided. –  Apr 11 '13 at 08:38
  • @silentboy Well, that's why my answer has both; I should start an anti-simple_html_dom campaign :) – Ja͢ck Apr 11 '13 at 08:40
  • Don't blame simple, there really is no way to get at that text node (and probably shouldn't be) in css. – pguardiario Apr 11 '13 at 20:32
  • @pguard did you see the xpath expression I've used? That one works just fine. – Ja͢ck Apr 11 '13 at 23:36
1

There is no built-in method in simple_html_dom.php for reading only an element's own text, but this should work:

include 'parser.php';

$html = str_get_html('<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>');

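// Note: this blanks the child elements in the parsed tree itself
// ($local refers to the same object as $element, not a copy).
function readTextNode($element){
    $local = $element;
    $childs = count($element->childNodes());
    // Empty each child element so that only the div's own text remains
    for($i = 0; $i < $childs; $i++) {
        $local->childNodes($i)->outertext = '';
    }
    return $local->innertext;
}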

echo readTextNode($html->find('div.right',0));
  • That's just awful, no offence; modifying the tree just to extract something is backward and should not be necessary in a proper library. Sigh. – Ja͢ck Apr 11 '13 at 08:33
1

I would switch to phpQuery for this one. You still need to use the DOM, but it's not too painful:

require('phpQuery.php');

$html =<<<EOF
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
EOF;

$dom = phpQuery::newDocumentHTML($html);

foreach($dom->find("div.right > *:last") as $last_element){
  echo $last_element->nextSibling->nodeValue;
}
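The `nextSibling` and `nodeValue` properties used above are plain DOM properties, so the same "text after the last child element" idea can be sketched without phpQuery. This is a minimal illustration using only PHP's bundled DOM extension; the helper name `textAfterLastElement` and the exact-match class test are assumptions made for the example.

```php
<?php
// Hypothetical helper: for each div.right, returns the text node that
// immediately follows the div's last child element (if there is one).
function textAfterLastElement(string $html): array
{
    $doc = new DOMDocument;
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    $texts = [];
    // *[last()] picks the last child element of each matching div.
    foreach ($xp->query('//div[@class="right"]/*[last()]') as $last) {
        $next = $last->nextSibling;
        if ($next !== null && $next->nodeType === XML_TEXT_NODE) {
            $t = trim($next->nodeValue);
            if ($t !== '') {
                $texts[] = $t;
            }
        }
    }
    return $texts;
}

foreach (textAfterLastElement('<div class="right"><span>How Are You?</span> Fine Thanks </div>') as $t) {
    echo $t, PHP_EOL;
}
```

Like the phpQuery version, this only finds text that trails the last child element; text placed before or between the child elements is not returned.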

Update: These days I recommend this simple replacement, which lets you avoid the DOM ugliness:

$doc = str_get_html($html);
foreach($doc->find('div.right > text:last') as $el){
  echo $el->text;
}
pguardiario
0
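// Assumes this is a method on a class that extends simple_html_dom
public function removeNode($selector)
{
  foreach ($this->find($selector) as $node)
  {
    // Blank the matched node so it disappears from the saved markup
    $node->outertext = '';
  }

  // Re-parse the modified markup so the removed nodes are gone from the tree
  $this->load($this->save());
}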

Use this function to remove the h2 and span elements from the div, then read the text that remains in the div.

Reference: Simple HTML Dom: How to remove elements?

Sibiraj PR