0

I want to parse HTML content that looks something like this:

<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>

I need to capture only the "Lorem<br> <b>Ipsun</b>" inside the first div. How can I achieve this?

PS: the HTML inside the first div spans multiple lines; it's an article.

Thanks

rizidoro

2 Answers

4

Using regex to parse HTML is not a pleasant experience, because HTML isn't a regular language. An alternative is to use an HTML parser such as Simple HTML DOM or the DOM library.

Simple HTML DOM Example:

$html = str_get_html('<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>');
echo $html->find('div[id=sometext]', 0)->innertext;
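
For comparison, here is a rough sketch of the same extraction with the DOM library that ships with PHP (DOMDocument and DOMXPath), so no third-party file is needed. The variable names ($source, $inner) are only illustrative, and the sketch assumes the markup is available as a string:

```php
<?php
// Illustrative input: the markup from the question.
$source = '<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>';

$doc = new DOMDocument();

// Collect libxml parse warnings internally instead of emitting PHP warnings,
// since real-world HTML is rarely perfectly well-formed.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($source);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Locate the div by its id attribute.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="sometext"]')->item(0);

// DOMDocument has no innerHTML property, so serialize each child node in turn.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}

echo $inner;
```

Passing a node to saveHTML() requires PHP 5.3.6 or later. Because the parser tracks nesting, this should keep working if the article itself contains further div elements.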
Yacoby
  • @Yacoby, thanks for recommending this library. I think it's great and it solves the OP's issue with the snap of a finger :) – maček Apr 06 '10 at 15:48
0

Assuming that the id is known:

preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match);
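
A small sketch of how that call might be used, and where it stops working. The sample strings and the names $inner, $nested and $truncated are made up for illustration. The s modifier lets . match newlines, which matters for a multi-line article; the lazy (.*?) stops at the first closing div tag, so a nested div cuts the capture short:

```php
<?php
// Illustrative multi-line input based on the markup in the question.
$text = "<div id=\"sometext\">Lorem<br>\n<b>Ipsun</b></div><span>content</span><div id=\"block\">lorem2</div>";

// preg_match returns 1 on a match; the captured group lands in $match[1].
if (preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match)) {
    $inner = $match[1];
} else {
    $inner = null; // no div with that exact opening tag was found
}

// Caveat: with a nested div, the lazy match ends at the FIRST closing tag,
// so the captured content is truncated.
$nested = '<div id="sometext">a<div>b</div>c</div>';
preg_match('#<div id="sometext">(.*?)</div>#s', $nested, $m);
$truncated = $m[1]; // 'a<div>b', not 'a<div>b</div>c'
```

The pattern also assumes the opening tag is written exactly as `<div id="sometext">`; extra attributes, single quotes or different spacing would prevent a match.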
Matteo Riva