I have 2 versions of a website:

One that I no longer have ftp/code access to, and a nearly identical one that I do have ftp/code access to but that has been stripped of title and description tags. Is there a way (PHP or otherwise) to crawl the site I no longer have direct access to, and extract title/description meta tags for all pages?

I want to insert those tags into the new version of the site I do have access to.

Chris Cummings
  • 2
    [**StackOverflow**](http://bit.ly/4Agih5) is **NOT !**, a place to ask someone for free `codes`. [Such Questions are **Not Good** for this site](http://bit.ly/dcqznq), and will be [**Closed**](http://bit.ly/18T95z1), or [**Deleted**](http://bit.ly/10c3VuR), *Instead* [Learn what type](http://bit.ly/r0ZSEc) of questions you can or should ask. If you have any question about this, feel free to ask on [Meta](http://bit.ly/SgO5J), Or check the [FAQ](http://bit.ly/18T95z1), page for general information. – samayo Jul 11 '13 at 19:43
  • what about downloading the whole site via [wget](http://en.wikipedia.org/wiki/Wget)? – Alex Shesterov Jul 11 '13 at 19:45
  • I know how to download the whole site...I'm trying to find a way to get a "report" or XML or some such that ONLY has the file name, title and description. I'm not asking for the code. I'm asking for any thoughts on how to do it ("is there a way" not "gimme the code to...") – Chris Cummings Jul 11 '13 at 19:47
  • You could also use PhantomJS (http://phantomjs.org) along with CasperJS (http://casperjs.org). These are headless-browser tools that let you crawl your website fairly easily and extract all the information you want – ILikeTacos Jul 11 '13 at 19:47
  • 1
    You can look into this link: http://stackoverflow.com/questions/3711357/get-title-and-meta-tags-of-external-site – saran banerjee Jul 11 '13 at 19:48

1 Answer

You can use this to extract the meta description from a page:

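// $doc must be a DOMDocument with the page already loaded (e.g. via loadHTMLFile()).
$xpath = new DOMXPath($doc);
// query() returns a DOMNodeList of the matching @content attribute nodes.
$description = $xpath->query('/html/head/meta[@name="description"]/@content');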

This is an alternative solution:

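$doc = new DOMDocument;
$doc->loadHTMLFile('http://example.com');

// item(0) is the first <title> element, or null if the page has none.
$title = $doc->getElementsByTagName('title')->item(0);

$metas = $doc->getElementsByTagName('meta');

foreach ($metas as $meta) {
  // Compare case-insensitively so name="Description" also matches.
  if (strtolower($meta->getAttribute('name')) == 'description') {
    // The description text is stored in the "content" attribute.
    $description = $meta->getAttribute('content');
  }
}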

Source: #6113716
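
To get the kind of "report" asked for in the comments (file name, title and description per page), the snippet above could be wrapped in a loop over the old site's URLs. The following is only a rough sketch, not a tested crawler: the function names `extractTitleAndDescription` and `buildReport` are made up for illustration, the list of URLs has to come from somewhere else (a sitemap, or the file list of the new site), and it assumes `allow_url_fopen` is enabled so that `file_get_contents()` can fetch remote pages.

```php
<?php
// Hypothetical helper: returns the title and meta description found in an HTML string.
function extractTitleAndDescription(string $html): array
{
    $result = ['title' => '', 'description' => ''];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // string(//title) evaluates to the text of the first <title>, or '' if there is none.
    $xpath = new DOMXPath($doc);
    $result['title'] = trim($xpath->evaluate('string(//title)'));

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = trim($meta->getAttribute('content'));
            break; // keep the first description only
        }
    }
    return $result;
}

// Hypothetical helper: fetches each URL and returns one report row per page.
function buildReport(array $urls): array
{
    $rows = [];
    foreach ($urls as $url) {
        // Assumes allow_url_fopen is enabled; a failed fetch yields empty fields.
        $html = @file_get_contents($url);
        $meta = extractTitleAndDescription($html === false ? '' : $html);
        $rows[] = ['url' => $url] + $meta;
    }
    return $rows;
}
```

Calling something like `echo json_encode(buildReport($urls), JSON_PRETTY_PRINT);` with your own array of page URLs would then print the report; writing XML or CSV instead is a matter of changing that last step.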

Community
Amal Murali