Regular expression to match block of HTML

Question

First I'll show you a sample of the code I'm working with:

<div class="entry">
        <p>Any HTML content could go here!</p>
      </div>
    </div><!--/post -->

Normally I'd use a regex rule such as the following to look for a prefix and a suffix and grab everything in between:

(?<=<div class="entry">).*(?=</div><!--/post -->)

However, that doesn't appear to be working: it seems to be pulling the whitespace between the following parts instead of the HTML content itself:

<div class="entry">
        <p>

Any help/suggestions would be much appreciated as I've been bashing my head with this one for a good few hours now.

Many thanks in advance.

I should also note, the HTML content between <div class="entry"> and </div><!--/post --> is multi-line. – Karl B, Apr 20 '11 at 07:57
possible duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html) — Gordon, Apr 20 '11 at 08:04
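
The multi-line detail matters because, in most regex flavours, "." does not match a newline unless a dot-all flag is set, so ".*" stops at the end of the first line. The sketch below reproduces this with Python's re module (an assumption made for illustration; the thread itself concerns PHP, where the PCRE "s" modifier plays the same role). The variable names and the lazy ".*?" quantifier are additions, not part of the original pattern.

```python
import re

# The sample from the question, kept multi-line on purpose.
html = '''<div class="entry">
        <p>Any HTML content could go here!</p>
      </div>
    </div><!--/post -->'''

# Original pattern: "." cannot cross the newline after the opening tag,
# so the lookahead for the suffix never succeeds.
plain = re.compile(r'(?<=<div class="entry">).*(?=</div><!--/post -->)')

# Same pattern with DOTALL so "." also matches newlines, and a lazy
# quantifier so the match stops at the first suffix when blocks repeat.
dotall = re.compile(
    r'(?<=<div class="entry">).*?(?=</div><!--/post -->)', re.DOTALL
)

plain_match = plain.search(html)
dotall_match = dotall.search(html)
captured = dotall_match.group(0) if dotall_match else ""
```

Even with the flag set, the capture still contains the entry's own closing </div>, because the suffix in the pattern is the closing tag of the outer post block. That kind of nesting is where a regex approach starts to get fragile.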

score 7 · Accepted Answer · edited May 23 '17 at 12:18

Don't use regex to parse HTML. You need an XML or HTML parser, or something similar.

Search Stack Overflow for the best one, like so: Robust and Mature HTML Parser for PHP
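
To illustrate the parser approach, here is a minimal sketch using Python's standard html.parser module rather than a PHP library (that choice, the EntryExtractor class name, and the sample page are all assumptions made for illustration). It collects the inner HTML of every <div class="entry"> and tracks nested <div> elements, so it also handles several blocks on one page.

```python
from html.parser import HTMLParser


class EntryExtractor(HTMLParser):
    """Collects the inner HTML of every <div class="entry"> element."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.entries = []  # finished inner-HTML strings
        self._depth = 0    # <div> nesting depth inside an entry (0 = outside)
        self._parts = []   # pieces of the entry currently being collected

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._parts.append(self.get_starttag_text())
            if tag == "div":
                self._depth += 1
        elif tag == "div" and "entry" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if self._depth:
            self._parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._depth:
            return
        if tag == "div":
            self._depth -= 1
            if self._depth == 0:  # the entry's own closing </div>
                self.entries.append("".join(self._parts).strip())
                return
        self._parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def handle_entityref(self, name):
        if self._depth:
            self._parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._depth:
            self._parts.append(f"&#{name};")


# Hypothetical page with two posts, one containing a nested <div>.
page = '''
<div class="post">
  <div class="entry">
    <p>First <b>entry</b></p>
    <div>nested</div>
  </div>
</div><!--/post -->
<div class="post">
  <div class="entry"><p>Second entry</p></div>
</div><!--/post -->
'''

extractor = EntryExtractor()
extractor.feed(page)
extractor.close()
entries = extractor.entries
```

After the run, entries holds one string per entry block, ready to be stored. HTML comments inside an entry are dropped in this sketch, and a dedicated HTML library would likely cope better with malformed markup.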

answered Apr 20 '11 at 07:55

Rob Stevenson-Leggett

Thank you, that nudge in the right direction was much appreciated. – Karl B Apr 20 '11 at 08:01
Would this work for grabbing multiple instances of the above desired HTML? I was planning to use the expression with preg_match_all to grab the lot and put it into an array ready for insert to a database. – Karl B Apr 20 '11 at 09:53
+1 Nice answer and response from OP - not everyone appreciates an answer of 'NO!' – amelvin Aug 25 '11 at 13:26

score -1 · Answer 2 · answered Apr 20 '11 at 08:54

You can also consider PHP's strip_tags().

Jatin Dhoot
