Any way to extract title and description from every page on a site?

Question

I have 2 versions of a website:

One that I no longer have ftp/code access to, and a nearly identical one that I do have ftp/code access to but that has been stripped of title and description tags. Is there a way (PHP or otherwise) to crawl the site I no longer have direct access to, and extract title/description meta tags for all pages?

I want to insert those tags into the new version of the site I do have access to.

[**StackOverflow**](http://bit.ly/4Agih5) is **not** a place to ask someone for free code. [Such questions are **not good** for this site](http://bit.ly/dcqznq) and will be [**closed**](http://bit.ly/18T95z1) or [**deleted**](http://bit.ly/10c3VuR). Instead, [learn what type](http://bit.ly/r0ZSEc) of questions you can or should ask. If you have any questions about this, feel free to ask on [Meta](http://bit.ly/SgO5J) or check the [FAQ](http://bit.ly/18T95z1) page for general information. — samayo, Jul 11 '13 at 19:43
What about downloading the whole site via [wget](http://en.wikipedia.org/wiki/Wget)? — Alex Shesterov, Jul 11 '13 at 19:45
I know how to download the whole site...I'm trying to find a way to get a "report" or XML or some such that ONLY has the file name, title and description. I'm not asking for the code. I'm asking for any thoughts on how to do it ("is there a way" not "gimme the code to...") — Chris Cummings, Jul 11 '13 at 19:47
You could also use PhantomJS (http://phantomjs.org) along with CasperJS (http://casperjs.org). PhantomJS is a headless browser and CasperJS is a scripting utility built on top of it; together they let you crawl your website fairly easily and extract all the information you want. — ILikeTacos, Jul 11 '13 at 19:47
You can look into this link: http://stackoverflow.com/questions/3711357/get-title-and-meta-tags-of-external-site — saran banerjee, Jul 11 '13 at 19:48

score 1 · Accepted Answer · edited May 23 '17 at 12:05

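You can use XPath to extract the meta description from a page (here $doc is a DOMDocument that has already loaded the page):

$xpath = new DOMXPath($doc);
// query() returns a DOMNodeList of the matching content attribute nodes
$nodes = $xpath->query('/html/head/meta[@name="description"]/@content');
$description = $nodes->length ? $nodes->item(0)->nodeValue : null;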

This is an alternative solution:

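$doc = new DOMDocument;
// Collect warnings about malformed real-world HTML instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTMLFile('http://example.com');

// getElementsByTagName() returns a DOMNodeList; take the first <title>, if any
$titleNode = $doc->getElementsByTagName('title')->item(0);
$title = $titleNode ? trim($titleNode->textContent) : null;

$description = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
  if (strtolower($meta->getAttribute('name')) == 'description') {
    // The description text is stored in the "content" attribute
    $description = $meta->getAttribute('content');
  }
}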

Source: #6113716
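
Since the question asks for a report covering every page, the same DOM approach could be wrapped in a function and run over a list of URLs. What follows is a minimal sketch under stated assumptions, not a complete crawler: `extractMeta()` and `collectMeta()` are hypothetical helper names, the URL list is assumed to come from somewhere else (for example, the file listing of the copy you do have access to), and fetching over HTTP assumes `allow_url_fopen` is enabled.

```php
<?php
// Hypothetical helper: pull the <title> and meta description out of an HTML string.
function extractMeta(string $html): array
{
    $result = ['title' => '', 'description' => ''];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument;
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    if ($titleNode) {
        $result['title'] = trim($titleNode->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Match name="description" case-insensitively; the text is in "content".
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = trim($meta->getAttribute('content'));
            break;
        }
    }

    return $result;
}

// Hypothetical helper: fetch each URL and build one report row per page.
function collectMeta(array $urls): array
{
    $rows = [];
    foreach ($urls as $url) {
        $html = @file_get_contents($url); // assumes allow_url_fopen is on
        if ($html === false) {
            continue; // skip pages that could not be fetched
        }
        $rows[] = ['url' => $url] + extractMeta($html);
    }
    return $rows;
}
```

Calling `collectMeta(['http://example.com/', 'http://example.com/about.html'])` (placeholder URLs) returns an array of rows with `url`, `title`, and `description` keys, which you could then pass to `json_encode()` or write out as XML or CSV. Parsing is kept separate from fetching so that `extractMeta()` also works on pages already downloaded with wget.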

answered Jul 11 '13 at 19:44 by Amal Murali · edited May 23 '17 at 12:05 by Community

Thanks, this points me in the direction I need. I hadn't looked at loadHTMLFile before. Thank you! – Chris Cummings, Jul 11 '13 at 19:58
