
I am trying to use PHP Simple HTML DOM Parser to grab the HTML of an external file. The file contains a table, and the goal is to find a table cell with specific contents and then get the data from the next sibling cell. This data needs to be placed into a PHP variable.

Based on the research and info found in articles like "How to parse and process HTML/XML with PHP?", "Grabbing the href attribute of an A element", "Scraping Data: PHP Simple HTML DOM Parser" and, of course, the PHP Simple HTML DOM Parser Manual, I've been able to produce some results, but I'm afraid I may be on the wrong track.

The table row looks like this:

<tr>
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href="one">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr>

What I'm trying to accomplish is to find the table cell that contains "Hello world" and then get the number from within the next td cell. The following code finds that table cell and echoes its contents, but my attempts to use it as a landmark to get the next cell's data have failed...

$html = file_get_html("http://site.com/stuff.htm");
$e = $html->find('td',0)->innertext = 'Hello world';
echo $e;

So ultimately, in the example above the value of 123.456 needs to somehow get into a PHP variable.

Thanks for your help!

stotrami
  • find your element, then use next_sibling() to get its "neighbor" – Marc B Apr 02 '13 at 18:14
  • @Marc: I'm able to find the inner text of the element: `$e = $html->find('td',0)->innertext = 'Hello world';` but I'm not sure how to reference the element itself once it's found. – stotrami Apr 02 '13 at 19:58
  • the `find(td,0)` returns that element, which you then immediately extract the innertext from. if it was `find(td,0)->next_sibling()`, you'd get the td after the one you found. – Marc B Apr 02 '13 at 21:40
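The line in the question is a chained assignment, not a comparison, which would explain why it seems to "find" the cell: PHP evaluates `=` right to left, so `'Hello world'` is written into the first cell's `innertext`, and that same string is what lands in `$e` and gets echoed. The sketch below reproduces the effect with a plain `stdClass` object standing in for the first `<td>`; the object and the names `$firstCell` and `$isMatch` are illustrative assumptions, not part of Simple HTML DOM.

```php
<?php
// Hypothetical stand-in for the first <td>: a plain object, not the library's node class.
$firstCell = new stdClass();
$firstCell->innertext = 'fluff';

// A comparison reads the cell's text and leaves it unchanged.
$isMatch = ($firstCell->innertext === 'Hello world');

// A chained assignment, as in the question, evaluates right to left:
// the cell's text is overwritten, and $e receives the assigned string.
$e = $firstCell->innertext = 'Hello world';

var_dump($isMatch);                   // bool(false)
echo $e, PHP_EOL;                     // Hello world
echo $firstCell->innertext, PHP_EOL;  // Hello world ("fluff" has been overwritten)
```

So the echoed "Hello world" comes from the assignment itself, not from the page. A real search would compare each cell's text with `==` or `===` inside a loop and then call `next_sibling()` on the matching element, as Marc B suggests.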

2 Answers


It can be done using the DOMXPath class from PHP's built-in DOM extension, so you won't need an external library for this.

Here is an example:

<?php

$html = <<<EOF
<tr>
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href="one">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr>
EOF;


// create empty document 
$document = new DOMDocument();

// load html
$document->loadHTML($html);

// create xpath selector
$selector = new DOMXPath($document);

// select the parent <td> of every <a> element
// whose text content is 'Hello world'
$results = $selector->query('//td/a[text()="Hello world"]/..');

// output the results 
foreach($results as $node) {
    echo $node->nodeValue . PHP_EOL;
}
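Building on this, the XPath expression can also step straight to the neighbouring cell, so the number ends up in a PHP variable without a separate `nextSibling` call. This is a minimal sketch, assuming the row sits inside a `<table>` (added here so the markup is well-formed) and that matching on the cell's whitespace-normalized text is acceptable; `$value` is an illustrative name.

```php
<?php
// The same row as above, wrapped in a <table> so the markup is well-formed.
$html = <<<EOF
<table><tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr></table>
EOF;

$document = new DOMDocument();
// Collect libxml parse warnings instead of printing them (real pages are often malformed).
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$selector = new DOMXPath($document);

// The first <td> element after the cell whose text is "Hello world".
// following-sibling::td[1] counts only <td> elements, so whitespace text nodes are skipped.
$nodes = $selector->query('//td[normalize-space(.)="Hello world"]/following-sibling::td[1]');

$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $value, PHP_EOL; // 123.456
```

`$value` is a string here; cast it with `(float) $value` if a number is needed.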
hek2mgl
  • This example does indeed allow me to find the inner text of the element. How would I then get the next sibling's inner text? – stotrami Apr 02 '13 at 20:00
  • Use `$node->nextSibling->nodeValue;` for that. Note that DOMXPath is **much** faster than the PHP Simple HTML DOM solution – hek2mgl Apr 02 '13 at 20:05
  • Awesome. That works perfectly as well, and without needing to include the PHP Simple DOM Parser. Thank you hek2mgl! – stotrami Apr 02 '13 at 20:12
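One caveat with `$node->nextSibling`: in the DOM, whitespace between `</td>` and `<td>` may be kept as a text node, in which case `nextSibling` returns that whitespace rather than the next cell. A small helper that skips non-element nodes avoids depending on how the markup is laid out. The function name `nextElementSiblingOf` is hypothetical, and the one-row table is sample data.

```php
<?php
/**
 * Returns the next sibling that is an element, skipping text and comment nodes
 * (for example the whitespace between </td> and <td>). Hypothetical helper.
 */
function nextElementSiblingOf(DOMNode $node): ?DOMElement
{
    for ($sibling = $node->nextSibling; $sibling !== null; $sibling = $sibling->nextSibling) {
        if ($sibling instanceof DOMElement) {
            return $sibling;
        }
    }
    return null;
}

// Note the newline between the two cells.
$document = new DOMDocument();
$document->loadHTML('<table><tr><td><a href="one">Hello world</a></td>
<td>123.456</td></tr></table>');

$selector = new DOMXPath($document);
$cell = $selector->query('//td/a[text()="Hello world"]/..')->item(0);

$next = nextElementSiblingOf($cell);
echo $next !== null ? $next->nodeValue : 'not found', PHP_EOL; // 123.456
```

On PHP 8.0 and later, `DOMElement` also has a `nextElementSibling` property that should do the same job without a helper.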

Using PHP Simple HTML DOM Parser:

require_once 'simple_html_dom.php'; // the parser's library file; adjust the path as needed

$str = "<table><tr>
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href=\"one\">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr></table>";

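$html = str_get_html($str);

// Search only the first <table>; find('td') returns every cell in it.
$tds = $html->find('table', 0)->find('td');
$num = null;
foreach ($tds as $td) {
    // trim() guards against whitespace around the cell text
    if (trim($td->plaintext) == 'Hello world') {
        $next_td = $td->next_sibling();
        if ($next_td !== null) {
            $num = trim($next_td->plaintext);
        }
        break;
    }
}

echo $num;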
Adidi
  • Thank you Adidi. This works and returns the results that are required. If I understand, it iterates through each table, and each td within each table, and when it finds a td with the desired value it then grabs the next sibling's plaintext value. Is this the case, meaning will this search within all tables or just the first table within the DOM? – stotrami Apr 02 '13 at 20:04
  • You can decide that: this code goes through the first table only. In `$html->find('table',0)`, the zero index means the first table tag it finds. – Adidi Apr 02 '13 at 20:24
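The same first-table-or-all-tables choice can be expressed with DOMXPath. `(//table)[1]` picks the first `<table>` in the whole document, which mirrors `find('table', 0)`, while dropping that prefix searches every table. The parentheses matter: `//table[1]` would instead select every table that is the first table child of its own parent. The two-table markup below is made-up sample data.

```php
<?php
// Made-up sample: two tables that each contain a "Hello world" row.
$html = '<table><tr><td>Hello world</td><td>123.456</td></tr></table>'
      . '<table><tr><td>Hello world</td><td>789.0</td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);

// Path fragment: the cell right after a cell whose text is "Hello world".
$match = '//td[normalize-space(.)="Hello world"]/following-sibling::td[1]';

// First table only, like $html->find('table', 0) in Simple HTML DOM.
$firstOnly = $selector->query('(//table)[1]' . $match);
echo trim($firstOnly->item(0)->textContent), PHP_EOL; // 123.456

// Every table in the document.
$values = [];
foreach ($selector->query($match) as $td) {
    $values[] = trim($td->textContent);
}
echo implode(', ', $values), PHP_EOL; // 123.456, 789.0
```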