PHP regular expression, get data part

Question

I have HTML data, but I only want a piece of it. The top and bottom should be deleted: everything after the H1 and above the H2 with the text 'What we offer' should be put in a variable.

<p>This text can be deleted</p>
<h1>This title also</h1>

<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>
<h2>...</h2><p>...</p>

<h2>What we offer</h2>
<p>This text isn't needed</p>

I want all HTML and text beginning AFTER </h1> and ENDING at <h2>What we offer</h2>. Any idea how to do this in PHP?

This does the trick without regexp (Thanks Alexandru), but I'm so curious what regexp I could use to achieve this...

$beginIndex = strpos($htmlString, "</h1>");
$endIndex = strpos($htmlString, "<h2>What we offer</h2>");
$desiredString = substr($htmlString, $beginIndex, $endIndex - $beginIndex);

[parse html](http://stackoverflow.com/questions/3627489/php-parse-html-code) — Pedro del Sol, Nov 14 '12 at 13:56
[You shouldn't use Regex at all.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) [Have a look at this.](http://simplehtmldom.sourceforge.net) [If you cannot use it for some reason PHP has a built-in DOM parser.](http://www.php.net/manual/en/book.dom.php) — Martin Ender, Nov 14 '12 at 13:56
You have a mistake near "SAME HERE<p>", which should be "SAME HERE</p>". Notice the closing slash. You can use this regex: "<h2>(.+)</h2><p>(.+)</p>", since in your example there is no line break between the "<h2>needed test</h2><p>needed text</p>" but there is in the piece you do not want. — Mike de Klerk, Nov 14 '12 at 13:57
There's nothing that sets the h2 you want apart from the h2 you don't want. Do you just want the first h2 in the page? Or all but the last? — , Nov 14 '12 at 14:02
I want all HTML and text beginning AFTER </h1> and ENDING at <h2>What we offer</h2> — screaming SiLENCE, Nov 14 '12 at 14:03

score 1 · Answer 1 · answered Nov 14 '12 at 14:11

Given your description of what you need, this should work:

$beginIndex = strpos($htmlString, "</h1>") + strlen("</h1>"); // start just after </h1>, not at it
$endIndex = strpos($htmlString, "<h2>What we offer</h2>");
$desiredString = substr($htmlString, $beginIndex, $endIndex - $beginIndex);
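
A slightly more defensive take on the same idea, as a sketch only: `strpos()` returns `false` when a marker is missing, so it may be worth checking for that before slicing. The helper name `sliceBetween` is hypothetical (not part of PHP or of this answer), and the sample string is just the HTML from the question. The sketch assumes PHP 7.1 or later because of the nullable return type.

```php
<?php
// Hypothetical helper: returns the text between the end of $startMarker and
// the start of $endMarker, or null if either marker is not found.
function sliceBetween(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null; // start marker missing
    }
    $start += strlen($startMarker); // skip past the marker itself

    // Search for the end marker only after the start marker.
    $end = strpos($html, $endMarker, $start);
    if ($end === false) {
        return null; // end marker missing
    }

    return substr($html, $start, $end - $start);
}

// Sample input taken from the question.
$htmlString = "<p>This text can be deleted</p>\n<h1>This title also</h1>\n\n"
    . "<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>\n"
    . "<h2>...</h2><p>...</p>\n\n"
    . "<h2>What we offer</h2>\n<p>This text isn't needed</p>";

$desiredString = sliceBetween($htmlString, "</h1>", "<h2>What we offer</h2>");
echo $desiredString;
```

Returning `null` instead of a partial string is one possible design choice; it makes a missing marker easy to tell apart from an empty match.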

This does the trick indeed. But I'm so curious what regexp would do this also... It's hard to understand regexp, this would be an ideal example ;) – screaming SiLENCE Nov 14 '12 at 14:14

score 1 · Accepted Answer · answered Nov 14 '12 at 14:44

The regex solution you are requesting would look something like this:

$pattern = '/<\/h1>(.*)<h2>What we offer/s';
$matches = array();
preg_match($pattern, $htmlString, $matches);
$desiredString = $matches[1];
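
Two details may be worth knowing about the pattern above. `(.*)` is greedy, so if the end heading appeared more than once it would capture up to the last occurrence, and `$matches[1]` is only set when `preg_match()` actually finds a match. A possible variant, as a sketch: it uses the lazy `(.*?)`, escapes the literal end marker with `preg_quote()`, and checks the return value. The sample string is hypothetical and repeats the end heading on purpose to show the difference.

```php
<?php
// Hypothetical sample input modelled on the question's HTML, with the end
// heading deliberately appearing twice.
$htmlString = "<h1>Title</h1><h2>Keep</h2><p>me</p>"
    . "<h2>What we offer</h2><p>no</p><h2>What we offer</h2><p>again</p>";

$endMarker = '<h2>What we offer</h2>';

// preg_quote() escapes regex metacharacters in the literal marker; passing
// '/' as the second argument also escapes the pattern delimiter.
// The 's' modifier lets '.' match newlines; '.*?' stops at the first end marker.
$pattern = '/<\/h1>(.*?)' . preg_quote($endMarker, '/') . '/s';

$desiredString = null;
if (preg_match($pattern, $htmlString, $matches) === 1) {
    $desiredString = $matches[1];
}

echo $desiredString; // prints: <h2>Keep</h2><p>me</p>
```

With the greedy `(.*)` the same input would instead capture everything up to the second "What we offer" heading.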

Great, didn't think the answer was so easy :o – screaming SiLENCE Nov 14 '12 at 15:05
