1

First I'll show you a sample of the code I'm working with:

<div class="entry">
        <p>Any HTML content could go here!</p>
      </div>
    </div><!--/post -->

Normally I'd use a regex rule such as the following to look for a prefix and a suffix and grab everything in between:

(?<=<div class="entry">).*(?=</div><!--/post -->)

However, that doesn't appear to be working: it seems to be pulling the whitespace between the following parts instead of the HTML content itself:

<div class="entry">
        <p>

Any help/suggestions would be much appreciated as I've been bashing my head with this one for a good few hours now.

Many thanks in advance.

Karl B
  • I should also note, the HTML content between "<div class="entry">" and "</div><!--/post -->" is multi-line. – Karl B Apr 20 '11 at 07:57
  • possible duplicate of [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html) – Gordon Apr 20 '11 at 08:04

2 Answers

7

Don't use regex to parse HTML. You need an XML or HTML parser, or something similar.

Search Stack Overflow for the best one, for example the question "Robust and Mature HTML Parser for PHP".
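As a rough sketch of the parser approach, assuming PHP's bundled DOM extension (DOMDocument and DOMXPath) is available: the helper name extractEntries and the outer "post" wrapper in the sample markup are made up for illustration, not taken from the question. It collects the inner HTML of every div with the class "entry" into an array, so pages with several entries are handled too.

```php
<?php
// Hypothetical helper: returns the inner HTML of every <div class="entry">.
function extractEntries(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match "entry" as a whole class name, so "entry-title" does not qualify.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " entry ")]';

    $entries = [];
    foreach ($xpath->query($query) as $div) {
        $inner = '';
        // Serialise each child node to get the div's inner HTML.
        foreach ($div->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $entries[] = trim($inner);
    }
    return $entries;
}

// Assumed sample: two posts shaped like the snippet in the question.
$sample = <<<MARKUP
<div class="post">
  <div class="entry">
        <p>Any HTML content could go here!</p>
      </div>
    </div><!--/post -->
<div class="post">
  <div class="entry">
        <p>A second entry.</p>
      </div>
    </div><!--/post -->
MARKUP;

$entries = extractEntries($sample);
print_r($entries);
```

Note that loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII pages may need an explicit encoding hint before parsing.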

Rob Stevenson-Leggett
  • Thank you, that nudge in the right direction was much appreciated. – Karl B Apr 20 '11 at 08:01
  • Would this work for grabbing multiple instances of the desired HTML above? I was planning to use the expression with preg_match_all to grab the lot and put it into an array, ready for insertion into a database. – Karl B Apr 20 '11 at 09:53
  • +1 Nice answer and response from OP - not everyone appreciates an answer of 'NO!' – amelvin Aug 25 '11 at 13:26
-1

You can also consider PHP's strip_tags().
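For illustration, a minimal sketch of what strip_tags() does; the fragment below is an assumed example, not the asker's real page. It removes markup from a string rather than locating a particular section, so it only helps once the relevant fragment has already been isolated.

```php
<?php
// Assumed fragment, shaped like the entry block in the question.
$fragment = '<div class="entry"><p>Any <b>HTML</b> content could go here!</p></div>';

// Remove every tag, leaving only the text.
$textOnly = strip_tags($fragment);

// Keep <p> tags and drop everything else.
$paragraphsOnly = strip_tags($fragment, '<p>');

echo $textOnly, PHP_EOL;       // Any HTML content could go here!
echo $paragraphsOnly, PHP_EOL; // <p>Any HTML content could go here!</p>
```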

Jatin Dhoot