I am using PHP's `get_meta_tags()` function to get the meta tags for different webpages. What is the best way to get the contents of a webpage's `<h1>` tag? Should I use `file_get_contents()`, or is there a better way?

- "Should I use `file_get_contents()`" --- this part has nothing to do with the `h1` extraction process – zerkms Aug 02 '12 at 03:14
- `Should I use file_get_contents()` - I would use [cURL](http://php.net/manual/en/book.curl.php). – uınbɐɥs Aug 02 '12 at 04:07
4 Answers
Yes, I would use:
$page = file_get_contents('http://example.com');
$matches = array();
preg_match( '#<h1>(.*?)</h1>#', $page, $matches );
Your information should be in $matches[1] ($matches[0] holds the full match, including the tags).
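The pattern above only matches a bare `<h1>` whose content sits on one line. Here is a slightly more tolerant sketch, assuming the heading may carry attributes or span several lines. The sample markup and the `$heading` variable are made up for illustration:

```php
<?php
// Sample markup standing in for a fetched page (illustrative only).
$page = "<body><h1 class=\"title\">Hello\nWorld</h1></body>";

$matches = array();
// \b[^>]* allows attributes on the tag; i = case-insensitive, s = let . match newlines.
if (preg_match('#<h1\b[^>]*>(.*?)</h1>#is', $page, $matches)) {
    $heading = trim($matches[1]);
} else {
    $heading = null; // no <h1> found
}
```

This is still pattern matching rather than real HTML parsing, so nested or malformed markup can defeat it.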

- Someone has to link to this. :-) [Parsing HTML with RegEx](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – uınbɐɥs Aug 02 '12 at 03:17
- I can't up-vote. It might "work", but it can just go downhill .. I have bad experiences with other developers polluting code with stuff that "works". – Aug 02 '12 at 03:18
`file_get_contents()` can work to get you the contents of the page. Once you have the contents, how you extract the `h1` tag is up to you.
You could try a simple regular expression to return the contents of the first `h1` tag:
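$contents = file_get_contents($url);
// preg_match() stops at the first match; $matches[1] holds the captured text.
preg_match("/<h1>(.*?)<\/h1>/", $contents, $matches);
$h1 = isset($matches[1]) ? $matches[1] : null;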
However, I prefer using a DOM parser when working with HTML. The PHP Simple HTML DOM Parser is pretty easy to use. Something like:
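require_once('simple_html_dom.php');
$contents = file_get_contents($url);
$html = str_get_html($contents);
// Passing an index to find() returns a single element (null if there is none).
$h1 = $html->find("h1", 0);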
Note: I did not test these code snippets. Just some samples to get you started.

The `<h1>` tags aren't meta tags, so you can't use the `get_meta_tags()` function. Meta tags in an HTML document are tags in the `<head>` section that contain information about the page, not the content itself.
PHP.DOM is probably the best way to get the information you want. Here is a link to a decent tutorial that should get you started nicely.
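As a rough illustration of the DOM approach, here is a minimal sketch using PHP's built-in `DOMDocument` class. The helper name `first_h1_text()` and the sample markup are hypothetical, and the sketch assumes the page HTML is already in a string (fetched with `file_get_contents()` or cURL, for example):

```php
<?php
// Hypothetical helper: returns the text of the first <h1>, or null if there is none.
function first_h1_text(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headings = $doc->getElementsByTagName('h1');
    if ($headings->length === 0) {
        return null;
    }

    // textContent flattens nested tags such as <em> into plain text.
    return trim($headings->item(0)->textContent);
}

echo first_h1_text('<html><body><h1 class="title">Hello <em>world</em></h1></body></html>');
// prints "Hello world"
```

Unlike a regular expression, this copes with attributes on the tag and with nested inline markup. Note that `loadHTML()` may misread non-ASCII text unless the document declares its character encoding.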

- "I am using PHP's get_meta_tags() function to get the meta tags for different webpages" --- it's just a piece of irrelevant info :-) – zerkms Aug 02 '12 at 03:15
Try using Simple HTML DOM.
Code:
<?php
require_once('simple_html_dom.php');
$raw = '<h1>blah</h1>'; // Set the raw HTML of the webpage here
$html = str_get_html($raw);
$h1 = $html->find('h1', 0)->plaintext;
echo $h1;
?>
