PHP preg_match_all: get all the `p` elements

Question

<?php
$str= <<<ETO
<p>one
two</p>
<p>three</p>
ETO;
preg_match_all('/<p>(.*?)<\/p>/',$str,$r);
print_r($r);
?>

I am studying preg_match_all. I want to get all the p elements from an article, but my code only gets the second p. How can I modify it so that I get the first p as well? Thanks.

It's just that regex is often the wrong tool to use for parsing HTML. — BoltClock, Mar 21 '11 at 10:47
Look into using an HTML parser: http://stackoverflow.com/questions/3577641/best-methods-to-parse-html — Unicron, Mar 21 '11 at 10:48

score 4 · Accepted Answer · answered Mar 21 '11 at 10:47

You are missing the /ims flags at the end of your regex. Without /s, the . does not match line breaks (and your first paragraph contains one). Actually /s alone would suffice, but I always use all three for simplicity.
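A minimal sketch comparing the pattern with and without modifiers, assuming PHP 7 or later. Only s changes how . treats newlines; i makes the match case-insensitive (so <P> would also match) and m only changes how ^ and $ behave, so neither of those two affects this input. The variable names are illustrative.

```php
<?php
// Same input as the question: the first paragraph spans two lines.
$str = "<p>one\ntwo</p>\n<p>three</p>";

// Without /s, "." stops at the newline, so only the second <p> matches.
$plain = preg_match_all('/<p>(.*?)<\/p>/', $str, $r1);

// /m alone does not help: it only affects ^ and $, not ".".
$multiline = preg_match_all('/<p>(.*?)<\/p>/m', $str, $r2);

// /s lets "." match newlines, so both paragraphs are found.
$dotall = preg_match_all('/<p>(.*?)<\/p>/s', $str, $r3);

// /ims behaves like /s here; i and m have nothing to act on in this input.
$all = preg_match_all('/<p>(.*?)<\/p>/ims', $str, $r4);

echo "$plain $multiline $dotall $all\n"; // prints: 1 1 2 2
```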

Also, preg_match works for many simple cases. But if you are attempting more complex extractions, consider switching to phpQuery or QueryPath, which allow for:

foreach (qp($html)->find("p") as $p)  { print $p->text(); }
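If installing phpQuery or QueryPath is not an option, PHP's bundled DOM extension can do the same extraction. This is a swapped-in technique, not the one recommended above; it assumes the dom extension is enabled (it usually is by default), and the helper name paragraphTexts is hypothetical.

```php
<?php
// Hypothetical helper: return the text content of every <p> element.
function paragraphTexts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $texts[] = $p->textContent;
    }
    return $texts;
}

$str = "<p>one\ntwo</p>\n<p>three</p>";
print_r(paragraphTexts($str));
```

Unlike the regex, this also finds paragraphs with attributes such as <p class="intro">, and newlines inside a paragraph need no special flag.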

score 2 · Answer 2 · answered Mar 21 '11 at 10:52


(.*?) does not match newline characters. Try the /s modifier:

<?php
$str= <<<ETO
<p>one 
two</p>
<p>three</p>
ETO;
preg_match_all('/<p>(.*?)<\/p>/s',$str,$r);
print_r($r);
?>
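Once both paragraphs match, note that $r[0] holds the full matches (tags included) while $r[1] holds only what the (.*?) group captured. A small sketch, assuming the paragraph text is what you want; PREG_SET_ORDER is an optional flag that groups the results per match instead.

```php
<?php
$str = "<p>one\ntwo</p>\n<p>three</p>";

// Default PREG_PATTERN_ORDER: $r[0] = full matches, $r[1] = group 1 captures.
preg_match_all('/<p>(.*?)<\/p>/s', $str, $r);
$full = $r[0];   // ["<p>one\ntwo</p>", "<p>three</p>"]
$inner = $r[1];  // ["one\ntwo", "three"]

// PREG_SET_ORDER: one array per match, each holding [full match, group 1].
preg_match_all('/<p>(.*?)<\/p>/s', $str, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    echo trim($set[1]), "\n";
}
```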

answered Mar 21 '11 at 10:52

Canuteson

