Here is a problem:
I want to extract the title of a website. I have seen multiple implementations, but none of them handle sites with multiple <title> tags. So currently I'm using something like this to extract the first (true) title:
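function GetTitleFromWebSite($url)
{
    // Certificate checks are disabled for testing only.
    $arrContextOptions = array(
        "ssl" => array(
            "verify_peer"      => false,
            "verify_peer_name" => false,
        ),
    );
    $page = @file_get_contents($url, false, stream_context_create($arrContextOptions));
    if ($page === false || $page === '') {
        return "";
    }

    // stripos() also matches <TITLE>; compare with !== false, because a match
    // at offset 0 would otherwise be treated as "not found".
    $title_begin = stripos($page, "<title>");
    if ($title_begin === false) {
        return "";
    }
    $title_begin += strlen("<title>");

    // Look for the closing tag only after the opening one.
    $title_end = stripos($page, "</title>", $title_begin);
    if ($title_end === false) {
        return "";
    }

    // Collapse newlines and repeated whitespace, then escape for output.
    $title = trim(preg_replace('/\s+/', ' ', substr($page, $title_begin, $title_end - $title_begin)));
    return htmlentities($title);
}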
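For comparison, here is a minimal sketch that lets an HTML parser do the work instead of string searching, assuming PHP's bundled DOM extension is available. The function name `extractTitleDom` is hypothetical; it takes already-downloaded markup so that fetching stays separate. Querying `//head/title` and taking the first match skips <title> elements embedded elsewhere (for example inside inline SVG), and `textContent` returns the text with entities such as `&amp;` already decoded. One caveat: `loadHTML()` assumes ISO-8859-1 unless the page declares its charset, so UTF-8 pages without a `<meta charset>` may come out garbled.

```php
<?php
// Hypothetical helper: parses an HTML string with PHP's DOM extension and
// returns the text of the first <title> element found inside <head>.
function extractTitleDom(string $html): string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Only <title> elements that are children of <head>; titles embedded
    // elsewhere (e.g. in inline SVG in the body) are not selected.
    $nodes = $xpath->query('//head/title');
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // textContent has entities such as &amp; decoded already.
    $title = $nodes->item(0)->textContent;
    // Collapse newlines and runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $title));
}
```

It could be fed from the existing download, e.g. `extractTitleDom((string) @file_get_contents($url, false, stream_context_create($arrContextOptions)))`, and the result escaped with `htmlspecialchars()` only at the point where it is echoed into HTML.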
I know that disabling certificate verification isn't secure, but this is only for testing, and I will worry about certificates later.
The question is:
What is the best way of handling this? Is there something that will take care of every crazy construction? Some of the implementations handled newlines inside <title>. Is there any 'nice' way of doing this?
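If a full DOM parse feels too heavy, a regex can cover the constructions mentioned above: newlines inside the tag, attributes such as `<title lang="en">`, upper-case tag names, and HTML entities. This is only a sketch under those assumptions, and the function name `extractTitleRegex` is hypothetical. A regex is still a heuristic rather than an HTML parser: it takes the first <title>...</title> pair in the raw markup, so it can pick the wrong one if an inline SVG or a comment containing <title> appears before the real title.

```php
<?php
// Hypothetical regex-based helper; a heuristic, not a real HTML parser.
// It returns the first <title>...</title> pair found in the raw markup.
function extractTitleRegex(string $html): string
{
    // i: match <TITLE> as well as <title>
    // s: let "." cross newlines inside the title
    // \b[^>]*: allow attributes such as <title lang="en">
    if (!preg_match('~<title\b[^>]*>(.*?)</title\s*>~is', $html, $m)) {
        return '';
    }

    // Decode entities like &amp; and &#8211; into UTF-8 characters.
    $title = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse newlines, tabs and repeated spaces into one space.
    // preg_replace() returns null on invalid UTF-8; fall back to the decoded text.
    $collapsed = preg_replace('/\s+/u', ' ', $title);

    return trim($collapsed ?? $title);
}
```

Because it only scans text, this version has no charset guessing of its own; it simply assumes the page is UTF-8 when decoding entities.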