2

I've been piecing together bits of code, so you can see roughly what I'm trying to do. Obviously this doesn't work:

<?php

$dom= new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container");
$html = $data->saveHTML();
echo $html;

?>

Using a cURL call, I am able to retrieve the page source for the URL:

function curl_get_file_contents($URL)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);

    // FALSE on failure or on an empty response
    if ($contents) return $contents;
    else return FALSE;
}

$f = curl_get_file_contents('http://example.com/'); 
echo $f;

So how can I now use this to instantiate a DOMDocument object in PHP and extract a node using getElementById?

Chris Ballard
Dan Kanze

5 Answers

6

This is the code you will need to avoid any malformed HTML errors:

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("banner");
echo $data->nodeValue."\n";

To dump the whole HTML source, you can call:

echo $dom->saveHTML();
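
For readers who want one element with its tags intact rather than the whole document: `nodeValue` returns text only, while `saveHTML()` accepts an optional node argument (as of PHP 5.3.6). The following is a minimal sketch, not the answerer's code. The helper name `element_html_by_id()` and the inline markup are assumptions for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Sketch: return the markup of one element, tags included, or null if absent.
// element_html_by_id() is a hypothetical helper name, not part of DOMDocument.
function element_html_by_id(string $source, string $id): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $dom = new DOMDocument();
    $dom->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $node = $dom->getElementById($id);
    if ($node === null) {
        return null;
    }
    // Passing the node to saveHTML() serializes only that element.
    return $dom->saveHTML($node);
}

// Stand-in markup; the real page is not available here.
$source = '<html><body><div id="banner"><b>Hello</b> world</div></body></html>';
echo element_html_by_id($source, 'banner'), "\n";
```

If the page was already fetched with `curl_get_file_contents()`, that string can be passed as `$source`.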
anubhava
  • This solution retrieves the content, but I am losing all the HTML tags in the source. – Dan Kanze Jun 06 '12 at 21:13
  • I don't follow: where are you losing HTML tags? This code just gets the DOM element with the given ID and prints its **text value**. You can also dump the whole HTML using `$dom->saveHTML();`. – anubhava Jun 06 '12 at 21:21
  • Please modify your source to reflect this. – Dan Kanze Jun 06 '12 at 21:50
2
<?php

$f = curl_get_file_contents('http://example.com/');

$dom = new DOMDocument();
@$dom->loadHTML($f);
$data = $dom->getElementById("profile_section_container");
$html = $dom->saveHTML($data);
echo $html;

?>

It would help if you provided the example html.

Motomotes
  • Using this code, I receive the error: DOMDocument::loadHTMLFile(): Unexpected end tag : script in http://example.com. However, I cannot disclose the URL. – Dan Kanze Jun 06 '12 at 20:33
  • Per your updates: DOMDocument::loadHTML(): Unexpected end tag : script in Entity, line: 19 – Dan Kanze Jun 06 '12 at 20:38
  • loadHTMLFile() is no longer in my code. I was just pointing out that saveHTML() accepts an optional parameter to limit it to a specific DOMNode. saveHTML() should be called on the DOMDocument, with the DOMNode you want to save as the optional parameter. – Motomotes Jun 06 '12 at 20:40
  • If you can't give the URL, can you post the HTML on Pastebin? – Motomotes Jun 06 '12 at 20:41
1

I'm not sure, but I remember that when I once wanted to use this, I was unable to load an external URL as a file because the php.ini directive allow_url_fopen was set to Off.

So check your php.ini, or try to read the URL with file_get_contents() to see whether you can read it as a file:

<?php
$f = file_get_contents('http://example.com/'); // fails if allow_url_fopen is Off
var_dump($f); // just to see the content
?>
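
The setting can also be read at runtime instead of opening php.ini. This is a minimal sketch under the assumption that `ini_get()` reports the effective value; the helper name `url_fopen_enabled()` is hypothetical.

```php
<?php
// Sketch: report whether URL wrappers are enabled for file functions.
// url_fopen_enabled() is a hypothetical helper name.
function url_fopen_enabled(): bool
{
    // ini_get() returns a string such as "1", "On" or "" (or false if unknown);
    // FILTER_VALIDATE_BOOLEAN maps "1", "on", "true", "yes" to true, the rest to false.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (url_fopen_enabled()) {
    echo "allow_url_fopen is on: loadHTMLFile() and file_get_contents() can read URLs\n";
} else {
    echo "allow_url_fopen is off: fetch the page with cURL and use loadHTML() instead\n";
}
```

Note that allow_url_fopen can only be changed in the system configuration, not with `ini_set()` at runtime.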

Regards;

mimiz

mimiz
0

Try this:

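$dom = new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
// getElementById() returns a single element (or null), not a list, so no ->item(0)
$data = $dom->getElementById("profile_section_container");
// saveHTML() belongs to DOMDocument; pass the node to serialize only that element
$html = $dom->saveHTML($data);
echo $html;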
0

I think you can now use DOMDocument::loadHTML. Maybe you should check whether a doctype exists (with a regexp) and add one if necessary, to be sure it is declared ... Regards
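
The doctype check suggested above could be sketched as follows. The helper name `ensure_doctype()` and the choice of the HTML5 doctype are assumptions for illustration, and whether a doctype actually resolves the parse errors on the page in question is not confirmed by this thread.

```php
<?php
// Sketch: prepend a doctype when the markup does not start with one.
// ensure_doctype() is a hypothetical helper name; "<!DOCTYPE html>" is an assumed choice.
function ensure_doctype(string $html): string
{
    // Match an optional run of whitespace followed by "<!DOCTYPE", case-insensitively.
    if (preg_match('/^\s*<!DOCTYPE\b/i', $html) === 1) {
        return $html;
    }
    return "<!DOCTYPE html>\n" . $html;
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parse warnings out of the output
$dom->loadHTML(ensure_doctype('<html><body><div id="x">hi</div></body></html>'));
echo $dom->getElementById('x')->nodeValue, "\n";
```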

Mimiz

mimiz