Get Content of Remote HTML page

Question

I'm using the example from the post "How to get content from another page", but I need to get just "SUPERMAN" from a website with this format:

```html
<td headers="superHero">SUPERMAN</td>
<td headers="country">USA</td>
```

The code:

```php
$url = "http://www.otherweb.com";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$output = curl_exec($curl);
curl_close($curl);

$DOM = new DOMDocument;
$DOM->loadHTML($output);

//get all td
//$items = $DOM->getElementsByTagName('td');
$items = $DOM->getElementsByID('superHero');

//display all text
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "<br/>";
}
```

Thanks!!!

Start from [here](http://www.php.net/manual/en/class.domdocument.php) — hindmost, Jun 23 '14 at 14:52
In addition to the comment above, `getElementById()` matches DOM elements' `id` values, not these `headers` attributes. — esqew, Jun 23 '14 at 14:53
An element's ID has to be **UNIQUE** in the entire document. A `getElementsById()` method would therefore be redundant: there can never be more than one element with a particular ID, so there is no plural version of the function. It is just `getElementById()`, singular. — Marc B, Jun 23 '14 at 14:54
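To illustrate the two comments above, here is a minimal sketch showing that `getElementById()` matches the `id` attribute and ignores `headers`. The markup, including the `id="hero"` attribute, is a hypothetical stand-in for the remote page, and the variable names are made up for the example.

```php
<?php
// Hypothetical markup: one <td> carries both an id and a headers attribute.
$html = '<table><tr>'
      . '<td id="hero" headers="superHero">SUPERMAN</td>'
      . '<td headers="country">USA</td>'
      . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Matches the id attribute, so the single <td> element is returned.
$byId = $doc->getElementById('hero');

// "superHero" only appears as a headers value, so nothing is found (null).
$byHeaders = $doc->getElementById('superHero');

echo $byId->nodeValue, PHP_EOL; // SUPERMAN
var_dump($byHeaders);           // NULL
```

Because `headers` is just an ordinary attribute here, selecting by it needs something like XPath rather than an ID lookup.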

hek2mgl · Accepted Answer · 2014-06-23T14:58:04.243

First, you can skip the curl part. DOMDocument has a loadHTMLFile() method that can load remote HTML files too (for URLs, PHP's `allow_url_fopen` setting must be enabled). Just use:

```php
$DOM = new DOMDocument();
$DOM->loadHTMLFile($url);
```

If the remote page might not be valid HTML, you might want to suppress the parser warnings with the "silence operator" `@`:

```php
@$DOM->loadHTMLFile($url);
```
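As an alternative to `@` (a different technique from the one in this answer), libxml can buffer parser errors so you can inspect or discard them. The sketch below uses `loadHTML()` with an inline, deliberately sloppy snippet (an assumption, so it runs without network access); with a real URL you would call `loadHTMLFile($url)` in the same place.

```php
<?php
// Hypothetical sloppy markup: the <td> elements are never closed.
$html = '<table><tr><td headers="superHero">SUPERMAN<td headers="country">USA</table>';

// Buffer libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Grab whatever the parser reported (may be empty), then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Restore the caller's previous error-handling mode.
libxml_use_internal_errors($previous);

printf("loaded: %s, parser messages: %d\n", $loaded ? 'yes' : 'no', count($errors));
```

Unlike `@`, this keeps the messages available for logging, and restoring the previous mode avoids changing error handling for the rest of the script.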

If you want to select an element by its attribute value, use XPath:

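```php
$selector = new DOMXPath($DOM);
$element = $selector->query('//td[@headers="superHero"]')->item(0);
// item(0) is null if no <td> matched
if ($element !== null) echo $element->nodeValue; // SUPERMAN
```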
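Putting the pieces together, here is a self-contained sketch that reads both cells from the question's markup. The inline HTML is an assumption standing in for the remote page; with a real URL you would call `loadHTMLFile($url)` instead of `loadHTML()`. It also shows `DOMXPath::evaluate()` with `string()`, which returns the text directly instead of a node list.

```php
<?php
// Inline copy of the question's markup, wrapped in a table.
$html = '<table><tr>'
      . '<td headers="superHero">SUPERMAN</td>'
      . '<td headers="country">USA</td>'
      . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; item(0) is null when nothing matched.
$hero = $xpath->query('//td[@headers="superHero"]')->item(0);
$heroName = $hero !== null ? trim($hero->textContent) : null;

// string() yields the text of the first match, or '' if there is none.
$country = $xpath->evaluate('string(//td[@headers="country"])');

echo $heroName, ' / ', $country, PHP_EOL; // SUPERMAN / USA
```

The `query()` form is handy when you need the element itself; `evaluate('string(...)')` is shorter when only the text matters.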
