1

I am using PHP's get_meta_tags() function to get the meta tags for different webpages. What is the best way to get the contents of a webpage's <h1> tag? Should I use file_get_contents(), or is there a better way?

Ajay Mohite

4 Answers

4

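Yes, I would use:

// Fetch the raw HTML of the page
$page = file_get_contents('http://example.com');
$matches = array();
// Allow attributes on the tag, ignore case (i), and let . span newlines (s)
preg_match( '#<h1\b[^>]*>(.*?)</h1>#is', $page, $matches );

Your information should be in $matches[1].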

Krycke
  • 3
    Someone has to link to this. :-) [Parsing HTML with RegEx](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – uınbɐɥs Aug 02 '12 at 03:17
  • 1
    I can't up-vote. It might "work", but it can just go downhill .. I have bad experiences with other developers polluting code with stuff that "works". –  Aug 02 '12 at 03:18
1

file_get_contents() can work to get you the contents of the page. Once you have the contents, how you extract the h1 tag is up to you.

You could try a simple regular expression to return the contents of the first h1 tag:

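$contents = file_get_contents($url);
// preg_match() stops at the first match; $matches[1] holds the captured text
preg_match("/<h1\b[^>]*>(.*?)<\/h1>/is", $contents, $matches);
$h1 = isset($matches[1]) ? $matches[1] : null;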

However, I prefer using a DOM parser when working with HTML. The PHP Simple HTML DOM Parser is pretty easy to use. Something like:

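require_once('simple_html_dom.php'); // the library file that provides str_get_html()
$contents = file_get_contents($url);
$html = str_get_html($contents);
// find() with an index returns a single element, or null if nothing matches
$h1 = $html->find("h1", 0);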

Note: I did not test these code snippets. Just some samples to get you started.

Micah Carrick
0

<h1> tags aren't meta tags, so you can't use the get_meta_tags() function to read them. Meta tags in an HTML document are tags in the <head> section that contain information about the page, not the content itself.

PHP's DOM extension (the DOMDocument class) is probably the best way to get the information you want. Here is a link to a decent tutorial that should get you started nicely.
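
As a rough illustration of the DOM approach, here is a minimal sketch using PHP's built-in DOMDocument class. The helper name first_h1() and the sample markup are made up for this example; in practice the HTML string would come from something like file_get_contents($url).

```php
<?php
// Hypothetical helper: returns the text of the first <h1>, or null if there is none.
function first_h1($html)
{
    $doc = new DOMDocument();

    // Real pages are often not valid HTML; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headings = $doc->getElementsByTagName('h1');
    if ($headings->length === 0) {
        return null;
    }

    // textContent flattens nested tags such as <em> into plain text.
    return trim($headings->item(0)->textContent);
}

// Sample markup standing in for a downloaded page (an assumption for this sketch).
$page = '<html><body><h1 class="title">Hello <em>World</em></h1><p>Text</p></body></html>';
echo first_h1($page), PHP_EOL;
```

Unlike a regular expression, this does not care about attributes on the tag, letter case, or line breaks inside the heading.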

Fluffeh
  • 2
    "I am using PHP's get_meta_tags() function to get the meta tags for different webpages" --- it's just a piece of irrelevant info :-) – zerkms Aug 02 '12 at 03:15
0

Try using Simple HTML DOM.

Code:

<?php
require_once('simple_html_dom.php');
$raw = '<h1>blah</h1>'; // Set the raw HTML of the webpage here
$html = str_get_html($raw);
$h1 = $html->find('h1', 0)->plaintext;
echo $h1;
?>
uınbɐɥs