Retrieving values of nested classes using DomDocument and DomXPath

Question

I am trying to fetch content from an open source yellow pages site that lists company entries in this format:

Foo Bar LLC ZIP : 40922
TEL : (281) 087 765 09 130
FAX : (281) 087 765 09 130
foo_bar@yahoo.com

Below, you can see the HTML structure of the above content.

<div class="entry">
    <div class="company">Foo Bar LLC</div>
    <div class="contents">
        <div class="adress">
            ZIP : 40922 <br>
            TEL : (281) 087 765 09 130<br>
            FAX : (281) 087 765 09 130<br>
            <a href="mailto:foo_bar@yahoo.com"> foo_bar@yahoo.com</a><br>
        </div>
    </div>
</div>

Basically, there are dozens of entries like that on one page, and my aim is to iterate through the page and output the results, which I can do. However, I can only get what is inside the entry class as a whole, so my output looks like:

Foo Bar LLC ZIP :40922 TEL :(281) 087 765 09 130FAX :(281) 087 765 09 130 foo_bar@yahoo.com

As you can see, my code (see below) not only returns everything on one line but also leaves no spaces between the contents, so it is hard to make use of the returned data. I am looking for a solution that returns the company, zip, tel, fax and mail separately.

Here is the code I used:

<?php
ini_set('max_execution_time', 300);
$dom = new DomDocument;
$html = $dom->loadHTMLFile('http://foo.bar');
$finder = new DomXPath($dom);
$classname = "entry";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $key => $value) {
    echo $value->nodeValue . "<br/>";
}

You're fetching `nodeValue`, which is roughly equivalent to `innerText`. You'd have to fetch the innerHTML to preserve the `<br>` tags in there. See: http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument - Marc B, Feb 06 '15 at 18:55
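
Building on that comment, here is a minimal sketch of one way to pull the fields out separately. Instead of reading `nodeValue` on the whole entry, it runs relative XPath queries inside each entry and treats every direct text node of the address div as one "LABEL : value" line. It assumes every entry follows the structure shown in the question. The helper name `parseEntries` and the inline sample markup are made up for illustration; only the class names come from the question.

```php
<?php
// Sample markup mirroring the structure in the question (an assumption;
// the real page would be loaded from its URL instead).
$html = <<<'HTML'
<div class="entry">
    <div class="company">Foo Bar LLC</div>
    <div class="contents">
        <div class="adress">
            ZIP : 40922 <br>
            TEL : (281) 087 765 09 130<br>
            FAX : (281) 087 765 09 130<br>
            <a href="mailto:foo_bar@yahoo.com"> foo_bar@yahoo.com</a><br>
        </div>
    </div>
</div>
HTML;

// Hypothetical helper: returns one associative array per entry.
function parseEntries(string $html): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Match "entry" as a whole class token rather than as a substring.
    $entryQuery = "//div[contains(concat(' ', normalize-space(@class), ' '), ' entry ')]";

    $entries = [];
    foreach ($xpath->query($entryQuery) as $entry) {
        // Relative queries (leading ".") are evaluated against the current entry.
        $row = [
            'company' => trim($xpath->evaluate("string(.//div[@class='company'])", $entry)),
            'mail'    => trim($xpath->evaluate("string(.//div[@class='adress']/a)", $entry)),
        ];
        // The <br> tags split the address into separate text nodes,
        // so each text node holds a single "LABEL : value" pair.
        foreach ($xpath->query(".//div[@class='adress']/text()", $entry) as $text) {
            $parts = explode(':', $text->nodeValue, 2);
            if (count($parts) === 2) {
                $row[strtolower(trim($parts[0]))] = trim($parts[1]);
            }
        }
        $entries[] = $row;
    }
    return $entries;
}

print_r(parseEntries($html));
```

With the sample markup, each row should carry the keys `company`, `mail`, `zip`, `tel` and `fax`. Whitespace-only text nodes contain no colon and are skipped. If the real page labels its lines differently, the `explode(':', ...)` step would need adjusting.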
