So far I've been trying to get a simple way to stract a title from an HTML page.

This simple:

$url = "http://localhost";

Use any function to extract the title tag, using only built-in PHP functions or regular expressions. I do not want to use any external classes such as simple_html_dom or Zend_Dom; I want to do it the simple way, with PHP only. Can anyone post sample code that extracts the title tag from localhost?

I've tried the DOMDocument class and simple_xml_parse(), but neither worked.

This is what I tried:

<?php $dom = new DOMdocument(); 
$dom->loadhtml('pag.html'); 
$items = $dom->getElementsByTagName('title');
foreach ($items as $title) { echo "title"; }
Adrian Cid Almaguer
  • What do you mean by "stract"? – kojow7 Apr 26 '15 at 00:45
  • There is no way to automatically extract the title from an HTML page. Show us what you tried with DOMdocument and why you didn't have success. – Mike Apr 26 '15 at 00:46
  • @kojow7 I'm assuming OP meant "extract" – Mike Apr 26 '15 at 00:47
  • I tried like this: <?php $dom = new DOMdocument(); $dom->loadhtml('pag.html'); $items = $dom->getElementsByTagName('title'); foreach ($items as $title) { echo "title"; } And when I said "stract" I meant parse. –  Apr 26 '15 at 00:49
  • Whoa there, cowboy. Edit your original question. Don't put blocks of code in the comments. – Mike Apr 26 '15 at 00:56
  • Did you try and see if you are actually getting a document back? As in try to echo InnerHTML for example? – Radek Apr 26 '15 at 00:56

1 Answer

With DOM:

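<?php
// loadHTML() expects markup, not a file name, so read the file first.
$doc = new DOMDocument();
$doc->loadHTML(file_get_contents("1.html"));
// getElementsByTagName() returns a DOMNodeList; use the first <title> if there is one.
$items = $doc->getElementsByTagName("title");
if ($items->length > 0) {
    echo $items->item(0)->nodeValue;
}
?>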

With Regular Expressions:

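<?php

$html = file_get_contents('1.html');
// Capture the text between <title> and </title>, case-insensitively.
// preg_match() returns 1 on a match, so $matches[1] is only read when it exists.
if (preg_match("/<title>([^<]*)<\/title>/i", $html, $matches)) {
    echo $matches[1];
}

?>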

1.html

<html>
<head>
    <title>This is the title</title>
</head>
<body>
<h1>Hello</h1>
</body>
</html>

Output:

This is the title
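
Real pages are often not well-formed, and DOMDocument then emits warnings while parsing. The following is only a sketch of how the DOM approach could be wrapped to cope with that, assuming PHP 7.1 or later (for the nullable return type). The function name extractTitle is hypothetical and is not part of PHP or of the code above; it silences libxml warnings while parsing and returns null when no title is found.

```php
<?php
// Hypothetical helper (assumption, not a PHP built-in):
// returns the trimmed text of the first <title>, or null if there is none.
function extractTitle(string $html): ?string
{
    // An empty string cannot be parsed; bail out before calling loadHTML().
    if ($html === '') {
        return null;
    }

    // Collect libxml parse errors internally instead of emitting warnings,
    // and remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $items = $doc->getElementsByTagName('title');
    if ($items->length === 0) {
        return null;
    }

    return trim($items->item(0)->textContent);
}

// Same markup as 1.html, passed as a string.
$html = '<html><head><title>This is the title</title></head>'
      . '<body><h1>Hello</h1></body></html>';

echo extractTitle($html), "\n"; // prints "This is the title"
```

To read from a URL such as the http://localhost in the question, the string could come from file_get_contents($url), provided allow_url_fopen is enabled in the PHP configuration.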
Adrian Cid Almaguer
  • I'm obligated to post this when I read your regex part: http://stackoverflow.com/a/1732454/811240 – Mike Apr 26 '15 at 01:41
  • @Mike there's a DOM alternative answer. – Pedro Lobito Apr 26 '15 at 02:35
  • @Mike OP asked for a regular expression solution too. I included the regex as an alternative to the DOM approach, which is the first code in my answer. – Adrian Cid Almaguer Apr 26 '15 at 03:05
  • Thank you very much Adrian Cid, this was exactly the kind of answer I was looking for, plain and simple, no extra classes outside "PHP core" required... thank you very much –  Apr 26 '15 at 06:44