PHP retrieve inner HTML as string from URL using DOMDocument

Question

I've been piecing together bits of code. You can see roughly what I'm trying to do below; obviously it doesn't work:

<?php

$dom= new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container");
$html = $data->saveHTML();
echo $html;

?>

Using a cURL call, I am able to retrieve the source of the document at the URL:

function curl_get_file_contents($URL)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);

    if ($contents) return $contents;
    else return FALSE;
}

$f = curl_get_file_contents('http://example.com/'); 
echo $f;

So how can I now use this to instantiate a DOMDocument object in PHP and extract a node using getElementById?

Please note that if your HTML does not contain a doctype declaration, then getElementById will always return null. — Motomotes, Jun 06 '12 at 20:17

anubhava · Accepted Answer · 2013-10-14T05:29:24.977

6

This is the code you will need to avoid any malformed HTML errors:

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("banner");
echo $data->nodeValue."\n"; // prints the element's text content only

To dump the whole HTML source, you can call:

echo $dom->saveHTML();

edited Oct 14 '13 at 05:29

answered Jun 06 '12 at 21:04


1

this solution is retrieving content, however I am losing all html tags in the source. – Dan Kanze Jun 06 '12 at 21:13
I don't get it; where are you losing HTML tags? This code just gets the DOM element with the given ID and prints its **text value**. Moreover, you can dump the whole HTML using `$dom->saveHTML();`. – anubhava Jun 06 '12 at 21:21
please modify your source to represent this – Dan Kanze Jun 06 '12 at 21:50
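
The difference raised in these comments (text value versus markup) can be sketched as follows. This is a minimal illustration, not part of the original answer: the `$source` string is made-up sample markup standing in for the fetched page, and the child-node loop is one common way to approximate "inner HTML", since DOMDocument has no built-in property for it.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$source = '<!DOCTYPE html><html><body>'
        . '<div id="banner"><p>Hello <b>world</b></p></div>'
        . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

$node = $dom->getElementById('banner');

$text  = '';
$outer = '';
$inner = '';
if ($node !== null) {
    // Text content only: every tag is dropped.
    $text = $node->nodeValue;

    // Outer HTML: the element itself plus its descendants.
    $outer = $dom->saveHTML($node);

    // Inner HTML: serialize each child node and concatenate the results.
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}

echo $text, "\n", $outer, "\n", $inner, "\n";
```

Passing a node to `saveHTML()` requires PHP 5.3.6 or later.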

Motomotes · Answer 2 · 2012-06-06T20:46:16.173

2

<?php

$f = curl_get_file_contents('http://example.com/');

$dom = new DOMDocument();
@$dom->loadHTML($f); // @ suppresses warnings about malformed HTML
$data = $dom->getElementById("profile_section_container");
$html = $dom->saveHTML($data); // serialize only this node
echo $html;

?>

It would help if you provided the example HTML.

edited Jun 06 '12 at 20:46

answered Jun 06 '12 at 20:31


Using this code, I receive the error: DOMDocument::loadHTMLFile(): Unexpected end tag : script in http://example.com. However, I cannot disclose the URL. – Dan Kanze Jun 06 '12 at 20:33
Per your updates: DOMDocument::loadHTML(): Unexpected end tag : script in Entity, line: 19 – Dan Kanze Jun 06 '12 at 20:38
loadHTMLFile is no longer in my code. I was just pointing out that saveHTML() accepts an optional parameter to limit it to a specific DOMNode. saveHTML() should be called on the DOMDocument, with the DOMNode you want to save as the optional parameter. – Motomotes Jun 06 '12 at 20:40
If you can't give the URL, can you post the HTML on pastebin? – Motomotes Jun 06 '12 at 20:41
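
The "Unexpected end tag" messages above are parser warnings about malformed markup; the document is usually still loaded. A rough sketch of one way to handle them is to buffer libxml errors while parsing and then serialize the node through the document. The helper name `extract_outer_html` and the sample markup are assumptions made for illustration; in the asker's case the string would come from `curl_get_file_contents()`.

```php
<?php
/**
 * Hypothetical helper: returns the outer HTML of the element with the
 * given id, or null when the markup is empty or the id is not found.
 */
function extract_outer_html(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // buffer parser errors
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();                        // discard buffered warnings
    libxml_use_internal_errors($previous);        // restore the prior setting

    $node = $dom->getElementById($id);
    if ($node === null) {
        return null;
    }

    $out = $dom->saveHTML($node);
    return $out === false ? null : $out;
}

// Sample markup with a stray </script>, standing in for the cURL result.
$f = '<!DOCTYPE html><html><body></script>'
   . '<div id="profile_section_container"><span>Dan</span></div>'
   . '</body></html>';

$html = extract_outer_html($f, 'profile_section_container');
echo $html === null ? "element not found\n" : $html . "\n";
```

To see what the parser complained about, call `libxml_get_errors()` before `libxml_clear_errors()` and log the result.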

score 1 · Answer 3 · answered Jun 06 '12 at 20:20

1

I'm not sure, but I remember that once, when I wanted to use this, I was unable to load an external URL as a file because the php.ini directive allow_url_fopen was set to Off ...

So check your php.ini, or try to read the URL as a file to see whether it works:

<?php
$f = file_get_contents('http://example.com/');
var_dump($f); // just to see the content
?>
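
The setting can also be checked at runtime instead of reading php.ini by hand. A small sketch, assuming only that URL-based loading goes through PHP's stream wrappers and therefore depends on this directive:

```php
<?php
// ini_get() returns the value as a string ("1", "On", "", "0", ...),
// so normalise it to a boolean before testing it.
$allowed = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);

if ($allowed) {
    echo "allow_url_fopen is on: URL wrappers are available\n";
} else {
    echo "allow_url_fopen is off: fetch the page with cURL and use loadHTML()\n";
}
```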

Regards;

mimiz


See my edits: I'm using a cURL call to retrieve the string successfully. – Dan Kanze Jun 06 '12 at 20:30

score 0 · Answer 4 · answered Jun 06 '12 at 20:19

0

Try this:

$dom= new DOMDocument();
$dom->loadHTMLFile('http://example.com/');
$data = $dom->getElementById("profile_section_container"); // returns a single DOMElement, not a list
$html = $dom->saveHTML($data); // saveHTML() is a DOMDocument method
echo $html;


See my edits: I'm using a cURL call to retrieve the string successfully. – Dan Kanze Jun 06 '12 at 20:30

score 0 · Answer 5 · answered Jun 06 '12 at 20:38

0

I think you can now use DOMDocument::loadHTML. Maybe you should test for the existence of a doctype (with a regexp) and add one if necessary, to be sure it is declared ... Regards

Mimiz


How would I go about doing this? – Dan Kanze Jun 06 '12 at 20:45
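
One possible reading of the doctype suggestion, as a rough sketch: test the start of the string with a regular expression and prepend an HTML5 doctype when none is found. The helper name `ensure_doctype` and the sample markup are made up for illustration, and the pattern only recognises a doctype at the very start of the string (after optional whitespace), so a leading comment or byte-order mark would defeat it.

```php
<?php
/**
 * Hypothetical helper: prepends an HTML5 doctype when the markup
 * does not already start with one (ignoring leading whitespace).
 */
function ensure_doctype(string $html): string
{
    if (preg_match('/^\s*<!DOCTYPE\b/i', $html) === 1) {
        return $html; // already declared, leave untouched
    }
    return "<!DOCTYPE html>\n" . $html;
}

// Sample markup without a doctype, standing in for the fetched page.
$withDoctype = ensure_doctype('<html><body><div id="x">hi</div></body></html>');

libxml_use_internal_errors(true); // buffer parser warnings
$dom = new DOMDocument();
$dom->loadHTML($withDoctype);
libxml_clear_errors();

$node = $dom->getElementById('x');
echo $node === null ? "not found\n" : $dom->saveHTML($node) . "\n";
```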
