Getting element content with simple_html_dom

Question

I'm using simple_html_dom to extract elements from HTML pages. I have some div elements like the one below. All I want is the "Fine Thanks" sentence in each div, i.e. the text that sits directly inside the div and not inside any sub-element. How can I do it?

<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>

Can you post two more div elements? Or do they all repeat the structure of the one you posted? – Jenson M John, Apr 11 '13 at 06:45
@Jenson M John: They have the same structure, but different contents. – AshKan, Apr 11 '13 at 06:52

Ja͢ck · Answer 1 · score 2 · 2013-04-11T08:25:58.420

It should be simply $html->find('div.right > text'), but that won't work because Simple HTML DOM Parser doesn't seem to support direct descendant queries.

So you'd have to find all <div> elements first and search the child nodes for a text node. Unfortunately, the ->childNodes() method is mapped to ->children() and thus only returns elements.

A working solution is to call ->find('text') on each <div> element, after which you filter the results based on the parent node.

foreach ($html->find('div.right') as $parent) {
    foreach ($parent->find('text') as $node) {
        if ($node->parent() === $parent && strlen($t = trim($node->plaintext))) {
            echo $t, PHP_EOL;
        }
    }
}

Using DOMDocument, this XPath expression will do the same work without the pain:

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//div[@class="right"]/text()') as $node) {
    if (strlen($t = trim($node->textContent))) {
        echo $t, PHP_EOL;
    }
}
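
To try the DOMDocument approach end to end, here is a minimal sketch that runs against the markup from the question. The helper name `directText` and the whole-word class match in the XPath are assumptions added for illustration; they are not part of the answer above. It relies only on PHP's bundled DOM extension.

```php
<?php
// Hypothetical helper: returns the trimmed, non-empty text nodes that are
// direct children of every <div> carrying the given class.
function directText(string $html, string $class): array
{
    $doc = new DOMDocument;
    // Scraped HTML is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]/text()',
        $class
    );

    $result = [];
    foreach ($xp->query($query) as $node) {
        $t = trim($node->textContent);
        if ($t !== '') {
            $result[] = $t;
        }
    }
    return $result;
}

$content = <<<HTML
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
HTML;

foreach (directText($content, 'right') as $text) {
    echo $text, PHP_EOL;
}
```

Because `text()` is applied with the child axis, text inside the `<h2>` and `<span>` elements is never selected, so no parent check is needed afterwards.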

edited Apr 11 '13 at 08:25

answered Apr 11 '13 at 07:34


Probably, but the OP says he needs to use `simple_html_dom`. Of course, XPath offers a better solution than anything the rest of us provided. – Apr 11 '13 at 08:38
@silentboy Well, that's why my answer has both; I should start an anti-simple_html_dom campaign :) – Ja͢ck Apr 11 '13 at 08:40
Don't blame simple, there really is no way to get at that text node (and probably shouldn't be) in css. – pguardiario Apr 11 '13 at 20:32
@pguard did you see the xpath expression I've used? That one works just fine. – Ja͢ck Apr 11 '13 at 23:36

score 1 · Accepted Answer · answered Apr 11 '13 at 07:21

There is no built-in method in simple_html_dom.php for reading only an element's own text, but this should work:

include 'parser.php';

$html = str_get_html('<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>');

// Blank out every child element, then read what is left of the parent.
// Note: this modifies the parsed tree, because objects are passed by handle.
function readTextNode($element){
    foreach ($element->childNodes() as $child) {
        $child->outertext = '';
    }
    return trim($element->innertext);
}

echo readTextNode($html->find('div.right',0));

That's just awful, no offence; modifying the tree just to extract something is backward and should not be necessary in a proper library. Sigh. — Ja͢ck, Apr 11 '13 at 08:33

pguardiario · Answer 3 · 2014-02-24T01:52:24.850

I would switch to phpQuery for this one. You still need to drop down to the DOM, but it's not too painful:

require('phpQuery.php');

$html =<<<EOF
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
EOF;

$dom = phpQuery::newDocumentHTML($html);

foreach($dom->find("div.right > *:last") as $last_element){
  echo $last_element->nextSibling->nodeValue;
}
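
The same "text after the last child element" idea can also be sketched without phpQuery, using only PHP's bundled DOM extension. This is an illustration under assumptions: it matches the class attribute exactly (`@class="right"`), and it expects the wanted text to follow the last child element directly, as in the question's markup. The variable names are arbitrary.

```php
<?php
// Sketch: read the text node that follows the last child element of each
// div with class "right".
$html = <<<EOF
<div class="right">
<h2>
<a href="">Hello</a>
</h2>
<br/>
<span>How Are You?</span>
<span>How Are You?</span>
<span>How Are You?</span>
Fine Thanks
</div>
EOF;

$doc = new DOMDocument;
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($doc);
$found = [];
// *[last()] picks the last child element of each matching div.
foreach ($xp->query('//div[@class="right"]/*[last()]') as $last) {
    $next = $last->nextSibling;
    // Accept only a text node; the element may have no following sibling.
    if ($next !== null && $next->nodeType === XML_TEXT_NODE) {
        $found[] = trim($next->nodeValue);
    }
}

echo implode(PHP_EOL, $found), PHP_EOL;
```

Unlike selecting all direct text nodes, this only picks up trailing text, so it would miss text that appears between two child elements.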

Update: These days I recommend this simple replacement, which lets you avoid the DOM ugliness:

$doc = str_get_html($html);
foreach($doc->find('div.right > text:last') as $el){
  echo $el->text;
}

score 0 · Answer 4 · edited May 23 '17 at 12:04

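// Intended as a method on the simple_html_dom document class.
public function removeNode($selector)
{
  foreach ($this->find($selector) as $node)
  {
    $node->outertext = '';
  }

  // Re-parse the modified markup so the removed nodes are really gone.
  $this->load($this->save());
}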

Use this function to remove the h2 and span elements from the div, then read the div's remaining content.

Reference: Simple HTML Dom: How to remove elements?

answered Apr 11 '13 at 06:47

Sibiraj PR

