Fastest way to retrieve a <title> in PHP

Asked Dec 30 '08 at 02:01 · Active Mar 06 '19 at 00:32

31

I'm doing a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.

It would be nice to have something like $title = page_title($url)

Tags: php, html, parsing

edited Sep 10 '12 at 07:37 by Adi
asked Dec 30 '08 at 02:01

7 Answers

55 (accepted)

<?php
function page_title($url) {
    $fp = file_get_contents($url);
    if (!$fp) return null;

    $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
    if (!$res) return null;

    // Clean up title: remove EOL's and excessive whitespace.
    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
    $title = trim($title);
    return $title;
}
?>

Gave 'er a whirl on the following input:

print page_title("http://www.google.com/");

Outputted: Google

Hopefully general enough for your usage. If you need something more powerful, it might not hurt to invest a bit of time into researching HTML parsers.

EDIT: Added a bit of error checking. Kind of rushed the first version out, sorry.

Ed Carrel, answered Dec 30 '08 at 02:15 (edited Oct 23 '11 at 09:29)
  • I'm relatively sure that will produce an error if the pattern isn't found. Initialise $title first, assign preg_match() to a boolean and check for that before attempting to access the first element of the $title_matches array. – scronide Jan 02 '09 at 19:46
  • Oh. Too right. If preg_match doesn't get a result, the reference to $title_matches will barf. Will tidy up a bit. – Ed Carrel Jan 07 '09 at 01:12
  • 4
    Facebook's title tags look like this: `<title id="pageTitle">` – Luke Nov 13 '15 at 01:55
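The last comment points at a gap in the pattern: `<title>` with attributes is not matched. A minimal sketch of a more tolerant pattern, working on an HTML string that has already been fetched; `title_from_html` is a hypothetical name, not part of the accepted answer:

```php
<?php
// Hypothetical helper (not from the accepted answer): extract the title from
// an HTML string, allowing attributes on the opening tag.
function title_from_html(string $html): ?string
{
    // \b keeps "<title" from matching longer tag names; [^>]* skips attributes.
    // "i" ignores case, "s" lets the title span several lines.
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title\s*>/is', $html, $m)) {
        return null;
    }
    // Collapse line breaks and runs of whitespace, as the accepted answer does.
    return trim(preg_replace('/\s+/', ' ', $m[1]));
}

echo title_from_html('<TITLE id="pageTitle"> Example  page </TITLE>'), "\n";
```

This is still a regex, so it shares the usual limits of regex-based HTML handling; it only covers the attribute case raised in the comment.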

17

You can get it without regular expressions:

$title = '';
$dom = new DOMDocument();

if($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        $title = $list->item(0)->textContent;
    }
}
Lukas Liesis
  • 1
    This is the first solution that works with deadspin.com – NewEndian Mar 03 '18 at 17:39
  • You may want to call `libxml_use_internal_errors(true);` before using `DOMDocument`. Unfortunately, the underlying library `DOMDocument` uses to parse the HTML (libxml) as of today still doesn't support HTML5 (it's an XML library after all) and will produce warnings for HTML5 semantic tags. There doesn't seem to be an alternative to error suppression here. See also https://stackoverflow.com/a/6090728/2459834
    – Domenico De Felice Jun 20 '18 at 13:25
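A minimal sketch of what that comment suggests, applied to an HTML string rather than a URL so it can be tried offline. `dom_title` is a hypothetical name, and restoring the previous libxml setting afterwards is my own assumption about good manners, not something the comment asks for:

```php
<?php
// Hypothetical helper: parse an HTML string with DOMDocument while keeping
// libxml's parse warnings out of the output.
function dom_title(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() does not accept an empty string
    }
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Throw away the collected errors and restore the earlier setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }
    $list = $dom->getElementsByTagName('title');
    return $list->length > 0 ? trim($list->item(0)->textContent) : null;
}

// The <nav> element is the kind of HTML5 tag that would otherwise warn.
echo dom_title('<html><head><title>Example</title></head><body><nav>menu</nav></body></html>'), "\n";
```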
12

Or, making this simple function slightly more bulletproof:

function page_title($url) {
    $page = file_get_contents($url);
    if (!$page) return null;

    $matches = array();

    // "i" matches <TITLE> as well; "s" lets the title span several lines.
    if (preg_match('/<title>(.*?)<\/title>/is', $page, $matches)) {
        return $matches[1];
    } else {
        return null;
    }
}

echo page_title('http://google.com');
Alexei Tenitski
7

I'm also doing a bookmarking system and found that since PHP 5 you can use stream_get_line to load the remote page only until the closing title tag (instead of loading the whole file), then get rid of what's before the opening title tag with explode (instead of a regex).

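function page_title($url) {
  $title = false;
  if ($handle = fopen($url, "r"))  {
    // Read only up to the closing title tag instead of the whole page.
    $string = stream_get_line($handle, 0, "</title>");
    fclose($handle);
    // Drop everything before the opening title tag.
    $parts = explode("<title", (string) $string, 2);
    if (!empty($parts[1])) {
      // Skip any attributes; the limit of 2 keeps a ">" inside the title intact.
      $rest = explode(">", $parts[1], 2);
      if (isset($rest[1])) {
        $title = trim($rest[1]);
      }
    }
  }
  return $title;
}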

The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.
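Two things may be worth checking with this approach. As I read the PHP manual, a length of 0 in stream_get_line means the default chunk size (8192 bytes), so a title further into the page could be missed, and the delimiter match is case-sensitive. A rough sketch of a chunked alternative under those assumptions; `title_from_stream` and the 64 KB cap are hypothetical choices of mine:

```php
<?php
// Hypothetical helper: read from an already-open stream in chunks until the
// closing title tag shows up or a byte limit is reached.
function title_from_stream($handle, int $maxBytes = 65536): ?string
{
    $buffer = '';
    while (!feof($handle) && strlen($buffer) < $maxBytes) {
        $chunk = fread($handle, 4096);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer .= $chunk;
        // Case-insensitive search, so </TITLE> ends the read as well.
        if (stripos($buffer, '</title>') !== false) {
            break;
        }
    }
    // Allow attributes on the opening tag.
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title\s*>/is', $buffer, $m)) {
        return null;
    }
    return trim($m[1]);
}

// An in-memory stream stands in for fopen($url, 'r') here.
$handle = fopen('php://memory', 'r+');
fwrite($handle, '<html><head><TITLE lang="en"> Example </TITLE></head>');
rewind($handle);
echo title_from_stream($handle), "\n";
fclose($handle);
```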

pevinkinel
5

Regex?

Use cURL to get the $htmlSource variable's contents.

preg_match('/<title>(.*)<\/title>/iU', $htmlSource, $titleMatches);

print_r($titleMatches);

See what you have in that array.

Most people say, though, that for traversing HTML you should use a parser, as regexes can be unreliable.

The other answers provide more detail :)
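For the cURL step, a minimal sketch of how `$htmlSource` might be filled. `fetch_html` is a hypothetical name, and the redirect and timeout values are assumptions rather than anything this answer specifies:

```php
<?php
// Hypothetical helper: download a page with the cURL extension and return its
// body, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap on redirects
        CURLOPT_TIMEOUT        => 10,   // assumed timeout in seconds
    ]);
    // curl_exec() returns the body as a string, or false on failure.
    $htmlSource = curl_exec($ch);

    // The handle is released when $ch goes out of scope.
    return is_string($htmlSource) ? $htmlSource : null;
}
```

Calling `$htmlSource = fetch_html($url);` and checking for null before running preg_match keeps a failed download from being treated as a page without a title.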

alex
1

I like using SimpleXML with regexes. This is from a solution I use to grab multiple link headers from a page in an OpenID library I've created. I've adapted it to work with the title (even though there is usually only one).

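function getTitle($sFile)
{
    $sData = file_get_contents($sFile);

    // \b so that a plain <head> and a <head ...> with attributes both match.
    if(preg_match('/<head\b[^>]*>.*<\/head>/is', $sData, $aHead))
    {
        // Lowercase every tag; strtolower() must run on each match, so a
        // callback is needed rather than a replacement string.
        $sDataHtml = preg_replace_callback('/<[^>]+>/', function ($aTag) {
            return strtolower($aTag[0]);
        }, $aHead[0]);

        // loadHTML() is an instance method, so create the document first.
        $xDom = new DOMDocument();
        $xDom->loadHTML($sDataHtml);
        $xTitle = simplexml_import_dom($xDom);

        return (string)$xTitle->head->title;
    }
    return null;
}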

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');

Ironically, this page has a "title tag" inside its title tag, which is what sometimes causes problems with the pure regex solutions.

This solution is not perfect, as it lowercases the tags, which could cause a problem for the nested tag if formatting or case were important (such as in XML). There are ways around that problem, but they are a bit more involved.
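One way the nested tag can trip up a regex-only approach, assuming the page escapes the inner tag as entities (`&lt;title&gt;`): the match comes back still encoded. A small sketch that decodes it afterwards; `decoded_title` is a hypothetical name and the UTF-8 charset is an assumption:

```php
<?php
// Hypothetical helper: extract the title with a regex, then turn entities
// such as &lt; and &amp; back into characters.
function decoded_title(string $html): ?string
{
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title\s*>/is', $html, $m)) {
        return null;
    }
    // The charset is assumed to be UTF-8 here.
    return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

echo decoded_title('<title>Fastest way to retrieve a &lt;title&gt; in PHP</title>'), "\n";
```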

null
1

A function to handle title tags that have attributes:

function get_title($html)
{
    preg_match("/<title(.+)<\/title>/siU", $html, $matches);
    if( !empty( $matches[1] ) ) 
    {
        $title = $matches[1];

        if( strstr($title, '>') )
        {
            $title = explode( '>', $title, 2 );
            $title = $title[1];

            return trim($title);
        }   
    }
}

$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title;
PlugTrade