I have an HTML string and want to extract one part of it. The top and bottom should be dropped: everything after the H1 and before the H2 with the text 'What we offer' should be put in a variable.

<p>This text can be deleted</p>
<h1>This title also</h1>

<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>
<h2>...</h2><p>...</p>

<h2>What we offer</h2>
<p>This text isn't needed</p>

I want all HTML and text beginning AFTER </h1> and ENDING at <h2>What we offer</h2>. Any idea how to do this in PHP?

This does the trick without regexp (Thanks Alexandru), but I'm so curious what regexp I could use to achieve this...

// strpos() returns the offset where "</h1>" starts, so skip past the tag itself
$beginIndex = strpos($htmlString, "</h1>") + strlen("</h1>");
$endIndex = strpos($htmlString, "<h2>What we offer</h2>");
$desiredString = substr($htmlString, $beginIndex, $endIndex - $beginIndex);
screaming SiLENCE
  • Useful online regex tool: http://gskinner.com/RegExr/ – Mike de Klerk Nov 14 '12 at 13:55
  • [parse html](http://stackoverflow.com/questions/3627489/php-parse-html-code) – Pedro del Sol Nov 14 '12 at 13:56
  • [You shouldn't use Regex at all.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) [Have a look at this.](http://simplehtmldom.sourceforge.net) [If you cannot use it for some reason PHP has a built-in DOM parser.](http://www.php.net/manual/en/book.dom.php) – Martin Ender Nov 14 '12 at 13:56
  • You have a mistake near "SAME HERE<p>", which should be "SAME HERE</p>". Notice the closing slash. You can use this regex: "<h2>(.+)</h2><p>(.+)</p>", as in your example there is no line break between the "<h2>needed test</h2><p>needed text</p>" but there is in the piece you do not want. – Mike de Klerk Nov 14 '12 at 13:57
  • There's nothing that sets the h2 you want apart from the h2 you don't want. Do you just want the first h2 in the page? Or all but the last? –  Nov 14 '12 at 14:02
  • I want all HTML and text beginning AFTER </h1> and ENDING at <h2>What we offer</h2> – screaming SiLENCE Nov 14 '12 at 14:03
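
Several comments above recommend a DOM parser over a regex. Here is a minimal sketch of that route using PHP's built-in DOMDocument. It assumes the `<h1>` and the `<h2>` elements are siblings under the same parent, as in the sample markup. The function name `extractUntilHeading` is hypothetical, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: collects the HTML of every node that follows the first
// <h1> until an <h2> whose text equals $stopText is reached.
// If no such <h2> exists, everything after the <h1> is returned.
function extractUntilHeading(string $html, string $stopText): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    if ($h1 === null) {
        return '';
    }

    $out = '';
    for ($node = $h1->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h2' && trim($node->textContent) === $stopText) {
            break; // reached the heading that ends the wanted section
        }
        $out .= $doc->saveHTML($node); // serialize this sibling only
    }
    return $out;
}

$htmlString = "<p>This text can be deleted</p>\n<h1>This title also</h1>\n\n"
    . "<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>\n<h2>...</h2><p>...</p>\n\n"
    . "<h2>What we offer</h2>\n<p>This text isn't needed</p>";

$desiredString = extractUntilHeading($htmlString, 'What we offer');
```

Unlike string searching, this compares the heading's text content, so it should still work if the `<h2>` gains attributes such as a class.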

2 Answers

Given the definition of what you need, this should work:

// strpos() returns the offset where "</h1>" starts, so skip past the tag itself
$beginIndex = strpos($htmlString, "</h1>") + strlen("</h1>");
$endIndex = strpos($htmlString, "<h2>What we offer</h2>");
$desiredString = substr($htmlString, $beginIndex, $endIndex - $beginIndex);
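
The same strpos/substr idea can be wrapped in a small helper that also handles the case where a marker is missing, since strpos() returns false then. This is only a sketch: `sliceBetween` is a hypothetical name, and it starts the slice after the opening marker, as the question asks.

```php
<?php
// Hypothetical helper: returns the text between the first $startMarker and the
// next $endMarker after it, or null when either marker is missing.
function sliceBetween(string $html, string $startMarker, string $endMarker): ?string
{
    $start = strpos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker); // begin right after the opening marker

    $end = strpos($html, $endMarker, $start); // search only past the start
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

$htmlString = "<p>This text can be deleted</p>\n<h1>This title also</h1>\n\n"
    . "<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>\n<h2>...</h2><p>...</p>\n\n"
    . "<h2>What we offer</h2>\n<p>This text isn't needed</p>";

$desiredString = sliceBetween($htmlString, "</h1>", "<h2>What we offer</h2>");
```

Checking for `false` with `===` matters here because a marker found at offset 0 would otherwise be mistaken for "not found".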
  • This does the trick indeed. But I'm so curious what regexp would do this also... It's hard to understand regexp, this would be an ideal example ;) – screaming SiLENCE Nov 14 '12 at 14:14

The regex solution you are requesting would look something like this:

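// Lazy (.*?) stops at the first "What we offer" heading; /s lets . match newlines
$pattern = '/<\/h1>(.*?)<h2>What we offer/s';
$matches = array();
// preg_match() returns 1 only when the pattern matched
$desiredString = preg_match($pattern, $htmlString, $matches) ? $matches[1] : null;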
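
For reuse with other markers, the regex approach could be generalized along these lines. This is a sketch rather than a definitive implementation: `matchBetween` is a hypothetical name, preg_quote() escapes the markers so they are matched literally, and the lazy `(.*?)` captures up to the first occurrence of the end marker.

```php
<?php
// Hypothetical helper: regex-based capture of the text between two literal markers.
// Returns null when the pattern does not match.
function matchBetween(string $html, string $startMarker, string $endMarker): ?string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter in the markers
    $pattern = '/' . preg_quote($startMarker, '/')
        . '(.*?)'                        // lazy: stop at the first end marker
        . preg_quote($endMarker, '/')
        . '/s';                          // s: let "." match newlines too

    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    return $matches[1];
}

$htmlString = "<p>This text can be deleted</p>\n<h1>This title also</h1>\n\n"
    . "<h2>FROM THIS TITLE I WANT THE TEXT</h2><p>SAME HERE</p>\n<h2>...</h2><p>...</p>\n\n"
    . "<h2>What we offer</h2>\n<p>This text isn't needed</p>";

$desiredString = matchBetween($htmlString, "</h1>", "<h2>What we offer</h2>");
```

Without the `s` modifier, `.` would not match the line breaks in the sample, so the pattern would fail to match at all.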