Use PHP's preg_replace() to return only the value inside the <h1>?

Question

How would I use PHP's preg_replace() to return only the value inside the <h1> in the following string (it's HTML text loaded in a variable called $html):

<h1>I'm Header</h1>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque tincidunt porttitor magna, quis molestie augue sagittis quis.</p>

<p>Pellentesque tincidunt porttitor magna, quis molestie augue sagittis quis. Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>

I've tried this: preg_replace('#<h1>([.*])</h1>.*#', '$1', $html), but to no avail. Am I regex-ing this correctly? And is there a better PHP function that I should be using instead of preg_replace?

Umm...just a bit of sidebar topic here: as I was typing this post (most of the way through), a weird unicorn graphic showed up on the right side of the page and, like MS Clippy, asked me if I wanted help parsing XML, and then sent me here: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 When I came back to my post to take a screenshot of the unicorn, it was gone. Somebody please tell me that wasn't a hallucination. Somebody? Anybody? Hello? — Sam, Apr 02 '12 at 02:11
that wasn't a hallucination, and generally you shouldn't parse HTML with regexes — zerkms, Apr 02 '12 at 02:12
it was a link to a thread that explains that generally you shouldn't parse HTML using regular expressions — zerkms, Apr 02 '12 at 02:22
[I want this unicorn as a regular feature](http://meta.stackexchange.com/questions/127823/please-keep-the-aprils-1st-unicorn-for-parse-html-using-regex-questions) — stema, Apr 02 '12 at 09:04
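
Since the comments above advise against regex for HTML, here is a minimal sketch of the parser route, which also speaks to the "better PHP function" part of the question. It assumes PHP's bundled DOM extension is enabled; the shortened `$html` sample and the variable names are illustrative, not from the original post.

```php
<?php
// Assumed sample mirroring the question's $html (shortened).
$html = "<h1>I'm Header</h1>\n<p>Lorem ipsum dolor sit amet.</p>";

// Collect libxml warnings internally instead of printing them,
// since real-world markup is often not perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Take the first <h1> element, if there is one.
$h1 = $doc->getElementsByTagName('h1')->item(0);
$header = $h1 !== null ? $h1->textContent : null;

echo $header, "\n";
```

Unlike a regex, this keeps working if the `<h1>` carries attributes or contains nested inline tags, because `textContent` returns the text of the whole element.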

score 4 · Answer 1 · answered Apr 02 '12 at 02:07

([.*]) is a character class: it matches a single literal dot OR asterisk

What you need is (.*?), which means any number of any characters, non-greedy (as few as possible)

or

([^<]*) - which means any amount of any characters but not <
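
As a quick check of the difference, here is a small sketch that runs all three patterns against a shortened copy of the question's markup. The `$html` sample and the `$broken`, `$m1`, `$m2` names are assumptions for illustration only.

```php
<?php
// Shortened stand-in for the question's $html (assumed sample).
$html = "<h1>I'm Header</h1>\n\n<p>Lorem ipsum dolor sit amet.</p>";

// [.*] is a character class: exactly one literal '.' or '*',
// so it cannot match the header text here.
$broken = preg_match('#<h1>([.*])</h1>#', $html, $m0);

// (.*?) is lazy: as few characters as possible before the first </h1>.
preg_match('#<h1>(.*?)</h1>#', $html, $m1);

// ([^<]*) stops at the first '<', so it cannot span nested tags
// inside the <h1>, but it does not depend on greediness.
preg_match('#<h1>([^<]*)</h1>#', $html, $m2);

var_dump($broken);   // int(0): no match
echo $m1[1], "\n";   // I'm Header
echo $m2[1], "\n";   // I'm Header
```

Using `#` as the delimiter means the `/` in `</h1>` does not need escaping.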


zerkms


Tamik Soziev · Accepted Answer · 2012-04-02T02:18:02.117

4

Here is how you do it using preg_replace:

$header = preg_replace('/<h1>(.*)<\/h1>.*/iU', '$1', $html);

You can also use preg_match:

$matches = array();
preg_match('/<h1>(.*)<\/h1>.*/iU', $html, $matches);
print_r($matches); // $matches[1] holds the text inside the <h1>
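
For reference, a hedged sketch of what the two modifiers do: `i` makes the match case-insensitive, and `U` (PCRE_UNGREEDY) makes quantifiers lazy by default. The single-line `$html` sample with two headers is an assumption chosen to make the effect visible, and the trailing `.*` is left out because extraction does not need it.

```php
<?php
// Assumed sample: uppercase tags show the i modifier,
// the second header shows the U modifier.
$html = "<H1>I'm Header</H1><p>First</p><h1>Second</h1>";

// With iU: <H1> matches despite the case, and .* stops at the first </h1>.
$matches = array();
$header = preg_match('/<h1>(.*)<\/h1>/iU', $html, $matches)
    ? $matches[1]   // group 1 holds only the text between the tags
    : null;         // no <h1> found

// With i only: the greedy .* runs on to the LAST closing tag.
preg_match('/<h1>(.*)<\/h1>/i', $html, $greedy);

echo $header, "\n";      // I'm Header
echo $greedy[1], "\n";   // I'm Header</H1><p>First</p><h1>Second
```

Neither variant lets `.` match a newline; that would need the `s` modifier as well.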

edited Apr 02 '12 at 02:18

answered Apr 02 '12 at 02:12


`.*` in the end is harmful - it would cut all the other text off – zerkms Apr 02 '12 at 02:13
but he wants to get only the h1 tag contents...he does not care about the rest. – Tamik Soziev Apr 02 '12 at 02:15
Oh, yes. It wouldn't destroy the data, but still pointless ;-) – zerkms Apr 02 '12 at 02:21
@TamikSoziev This is close. When I echo($header) I get the contents of the <h1> successfully, but the rest of the HTML is there too. I just want to extract the guts of the <h1>. – Sam Apr 02 '12 at 02:22
WooHoo!!! preg_match() did it! In an effort to make me a better regexer, can you explain the "iU" part? Many thanks! – Sam Apr 02 '12 at 22:21
