Grabbing title of a website using DOM

Question

Possible Duplicates:
Get title of website via link
How do I extract title of a website?

How can a website's title be grabbed using PHP DOM? (Which is the best way to grab it using PHP?)

score 18 · Answer 1 · answered May 03 '11 at 13:17


You can use getElementsByTagName(). A valid HTML document has only a single title element, so you can just grab the first one you come across in the DOM.

$title = '';
$dom = new DOMDocument();

if($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}

answered May 03 '11 at 13:17

John Cartwright


Beat me by 10 seconds and gave a slightly better example. Removed my answer :) – Erik May 03 '11 at 13:19

@Erik Cheers. Unfortunately, SO feels like a race sometimes. – John Cartwright May 03 '11 at 13:23

score 7 · Answer 2 · answered May 03 '11 at 13:20


This version suppresses any parsing errors caused by malformed HTML or missing elements:

<?php

$doc = new DOMDocument();
// The @ operator silences warnings from both the HTTP fetch and the HTML parser.
@$doc->loadHTML(@file_get_contents("http://www.washingtonpost.com"));

// find the title
$titlelist = $doc->getElementsByTagName("title");
if ($titlelist->length > 0) {
    echo $titlelist->item(0)->nodeValue;
}

answered May 03 '11 at 13:20

Femi


`loadHTMLFile` already incorporates file_get_contents and does not give errors on malformed HTML so any errors it does produce would be valuable. `loadHTML` also does not give errors on malformed HTML. – Erik May 03 '11 at 13:23
Well, when I use `$doc->loadHTMLFile("http://www.washingtonpost.com");` right now I get a bunch of errors that say *Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: expecting ';' in http://www.washingtonpost.com, line: 52 in /var/www/test/test2.php on line 5*. Maybe it's my PHP version, but... – Femi May 03 '11 at 13:36

You're right - my apologies. I was remembering that it will parse it, but it does indeed show warnings. However, the @ suppression method is still a bad choice. You'd be better off setting `libxml_use_internal_errors(true);` so you could access the error data if you wanted/needed to. – Erik May 03 '11 at 13:42
Point: this was quick and dirty. It IS a little lazy using the error suppression, and I wasn't aware it was using libxml under the hood. – Femi May 03 '11 at 13:44
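
For anyone landing here later, the `libxml_use_internal_errors(true)` suggestion from the comments might look roughly like the sketch below. The helper name `extractTitle`, its signature, and the sample markup are assumptions made up for illustration and are not from either answer. It parses a string rather than a URL so it can be tried offline, and it assumes a non-empty input string.

```php
<?php

// Hypothetical helper: the name and signature are assumptions for this sketch.
function extractTitle(string $html, array &$errors = []): ?string
{
    // Collect libxml parse errors internally instead of emitting PHP warnings,
    // remembering the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Hand the collected errors to the caller, then clear libxml's buffer.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $list = $doc->getElementsByTagName('title');
    return $list->length > 0 ? trim($list->item(0)->textContent) : null;
}

// Made-up sample markup with a bare "&" and an unclosed <p> in the body.
$html = '<html><head><title>Example page</title></head>'
      . '<body><p>Fish & Chips</body></html>';

$title = extractTitle($html, $errors);
echo $title, PHP_EOL;

// Each entry is a LibXMLError object; which errors get reported (if any)
// depends on the installed libxml2 version.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
}
```

Compared with the `@` operator, this keeps the parser diagnostics available for logging instead of discarding them, and it leaves warnings from unrelated code untouched.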
