Help on preg_match pattern

Question

I want to parse HTML content that looks something like this:

<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>

I need to capture just the "Lorem<br> <b>Ipsun</b>" inside the first div. How can I achieve this?

PS: the HTML inside the first div spans multiple lines; it's an article.

Thanks

@kemp, this is very much parsing HTML, and not to be done with regular expressions. – maček, Apr 06 '10 at 15:47
I just see a string matching. People on Stack Overflow get immediately blinded as soon as they see a regex question involving angle brackets. – Matteo Riva, Apr 06 '10 at 15:54

score 4 · Answer 1 · edited May 23 '17 at 10:27

Using regex to parse HTML is not a pleasant experience, because HTML isn't a regular language. An alternative is to use an HTML parser such as Simple HTML DOM or PHP's DOM library.

Simple HTML DOM Example:

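require_once 'simple_html_dom.php'; // assumes the library file sits next to this script
$html = str_get_html('<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>');
echo $html->find('div[id=sometext]', 0)->innertext; // inner HTML of the first matching div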
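
For comparison, here is a rough sketch of the same extraction with PHP's bundled DOM extension (DOMDocument and DOMXPath), which is mentioned above but not shown. The variable names are my own, and passing a node to saveHTML() assumes PHP 5.3.6 or later, so treat it as a starting point rather than a definitive solution.

```php
<?php
// Sample markup from the question.
$text = '<div id="sometext">Lorem<br> <b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>';

// Collect libxml warnings internally instead of printing them for loose HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($text);
libxml_clear_errors();

// Look up the div by its id attribute.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="sometext"]')->item(0);

// Serialize each child node and join them to get the inner HTML.
$inner = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner;
```

Note that loadHTML() assumes ISO-8859-1 unless the markup declares another charset, so a UTF-8 article may need a charset hint before loading.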

answered Apr 06 '10 at 15:34

Yacoby

@Yacoby, thanks for recommending this library. I think it's great and it solves the OP's issue with the snap of a finger :) – maček Apr 06 '10 at 15:48

score 0 · Accepted Answer · answered Apr 06 '10 at 15:33

Assuming that the id is known:

preg_match('#<div id="sometext">(.*?)</div>#s', $text, $match);

Matteo Riva

Would not work if the `div` has more attributes than only `id`. – Felix Kling Apr 06 '10 at 15:36
It will also not work if `div` changes to another tag, so? I stick to the question. – Matteo Riva Apr 06 '10 at 15:37
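
To illustrate the point about extra attributes, here is a hedged variant of the pattern that tolerates other attributes before or after the id and then reads the captured group. The extra class and data-x attributes in the sample are made up for the demonstration. It still stops at the first closing div, so it will break if the article itself contains nested divs.

```php
<?php
// Hypothetical input: same div as in the question, but with extra attributes
// and a line break inside the content.
$text = '<div class="post" id="sometext" data-x="1">Lorem<br>' . "\n"
      . '<b>Ipsun</b></div><span>content</span><div id="block">lorem2</div>';

// [^>]* skips any other attributes inside the opening tag;
// \s before id avoids matching attributes such as data-id;
// the s modifier lets . match newlines, and .*? stops at the first </div>.
$pattern = '#<div\b[^>]*\sid="sometext"[^>]*>(.*?)</div>#s';

if (preg_match($pattern, $text, $match)) {
    $inner = $match[1]; // group 1 holds the content between the tags
} else {
    $inner = null;      // no matching div found
}

var_dump($inner);
```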
