I wrote some code that fetches a web page's content with cURL, the way Facebook and Google+ do when a link is shared:
    $html = file_get_contents_curl($url);
    if ($html) {
        // parsing begins here:
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');
        // get and display what you need:
        $title = $nodes->item(0)->nodeValue;
        $metas = $doc->getElementsByTagName('meta');
        for ($i = 0; $i < $metas->length; $i++) {
            $meta = $metas->item($i);
            if ($meta->getAttribute('name') == 'description') {
                $description = $meta->getAttribute('content');
            }
        }
........
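The file_get_contents_curl() helper called above is not shown. A minimal sketch of what such a helper might look like, assuming it is a thin cURL wrapper that returns the response body, or false on failure (the option values and the user agent string are illustrative guesses, not the original code):

```php
<?php
// Hypothetical sketch: the real helper from the post is not shown.
// Returns the response body as a string, or false if the request fails.
function file_get_contents_curl(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed limit
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds, assumed
        CURLOPT_TIMEOUT        => 20,   // seconds, assumed
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; TitleFetcher/1.0)', // made-up UA
    ]);

    return curl_exec($ch); // false on failure because of CURLOPT_RETURNTRANSFER
}
```

Because the function returns false on failure, the `if ($html)` check in the code above still works unchanged.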
I get the page title with $title = $nodes->item(0)->nodeValue;
but what I need is the title of the news item or content itself. My URLs are not always news sites and I don't want to restrict myself to particular sites; I want the content title of any website.
For example, it should return:
"Guantanamo must close during Obama's term - Russian Foreign Ministry" for http://rt.com/news/guantanamo-closure-russia-dolgov-245/
"France to shed ‘Amélie’ image after 50 years of China ties" for http://www.france24.com/en/20140127-france-seeks-shed-amelie-image-50-years-after-opening-ties-with-china/
"Defining Moments: Capturing our changing world" for http://edition.cnn.com/2013/05/01/world/defining-moments/index.html?hpt=hp_bn3
I know the usual way is to fetch the <h1> or <h2> tags, but I also need the title from sites that don't mark up the news headline with those and use a <div> tag instead, for example:
http://www.mehrnews.com/detail/News/2222373
Update: I tested this link in Google+, along with some other URLs whose headline is not in <h?> tags, and Google returned the title correctly. Does anybody know how it works?
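One possible explanation is that link previews rely on metadata the publisher puts in the page head, such as the Open Graph og:title tag, rather than on heading tags. Below is a minimal sketch of a fallback chain built on that assumption: og:title or twitter:title first, then the first non-empty <h1> or <h2>, then <title>. The function name extract_content_title() is made up for illustration, and whether a given site (including the ones above) actually publishes these meta tags is not something I have verified.

```php
<?php
// Hypothetical helper: guess the content title of a page from its HTML.
// Returns null when nothing usable is found.
function extract_content_title(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml parse errors internally instead of emitting warnings
    // for malformed real-world HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // 1. Publisher-supplied preview metadata (assumed to be present).
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $key = strtolower($meta->getAttribute('property') ?: $meta->getAttribute('name'));
        if (in_array($key, ['og:title', 'twitter:title'], true)) {
            $content = trim($meta->getAttribute('content'));
            if ($content !== '') {
                return $content;
            }
        }
    }

    // 2. First non-empty heading, then 3. the <title> element.
    foreach (['h1', 'h2', 'title'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $text = trim($node->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }

    return null;
}
```

It could be called as $title = extract_content_title($html); in place of the $nodes->item(0)->nodeValue lookup. For pages that provide neither the metadata nor headings, this still falls back to <title>, so it does not solve the <div>-only case by itself.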
... tag in the page, though that's less reliable
... tags). You're right, thanks for your attention and replies. :)
– Yuseferi Jan 28 '14 at 05:43