PHP preg_match_all + str_replace

Question

I need to find a way to replace all the <p> within all the <blockquote> before the <hr />.

Here's some sample HTML:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p>Good Game</p>
</blockquote>
<blockquote><p>Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>

Here's what I got:

    $pieces = explode("<hr", $theHTML, 2);
    $blocks = preg_match_all('/<blockquote>(.*?)<\/blockquote>/s', $pieces[0], $blockmatch); 

    if ($blocks) { 
        $t1=$blockmatch[1];
        for ($j=0;$j<$blocks;$j++) {
            $paragraphs = preg_match_all('/<p>/', $t1[$j], $paragraphmatch);
            if ($paragraphs) {
                $t2=$paragraphmatch[0]; 
                for ($k=0;$k<$paragraphs;$k++) { 
                    // Single-quoted strings keep backslashes literally, so the quotes must not be escaped.
                    $t1[$j]=str_replace($t2[$k],'<p class="whatever">',$t1[$j]);
                }
            }
        } 
    }

I think I'm really close, but I don't know how to put the HTML back together after splitting it into pieces and modifying them.

score 1 · Accepted Answer · edited May 23 '17 at 10:34

You could try SimpleXML or, better, DOMDocument (http://www.php.net/manual/en/class.domdocument.php). Make sure the markup is valid HTML first, then use the DOM to find the nodes you are looking for and replace them. For locating the nodes you could try XPath (http://w3schools.com/xpath/xpath_syntax.asp).

Edit 1:

Take a look at the answer to this question:

RegEx match open tags except XHTML self-contained tags
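
As a rough illustration of the DOMDocument + XPath suggestion, here is a minimal sketch. The function name addClassBeforeHr(), the wrapper <div id="wrap"> and the class name are made up for the example, and it assumes "before the <hr />" means "no <hr> occurs earlier in the document". Character-encoding handling is left out, and DOMDocument will normalise the markup on output (for example <hr /> comes back as <hr>).

```php
<?php
// Sketch only: addClassBeforeHr() and the "wrap" id are hypothetical names.
function addClassBeforeHr(string $html, string $class = 'whatever'): string
{
    $doc = new DOMDocument();

    // Collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap the fragment so it can be found again after parsing.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Every <p> inside a <blockquote> that has no <hr> before it in document order.
    foreach ($xpath->query('//blockquote[not(preceding::hr)]//p') as $p) {
        $p->setAttribute('class', $class);
    }

    // Serialise only the children of the wrapper, not the <html>/<body> shell.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling addClassBeforeHr($theHTML) on the sample from the question should tag the <p> elements in the first two blockquotes and leave the one after the <hr /> alone. Because the query works on the parsed tree, it does not matter how many <p> elements a blockquote contains.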

answered Jan 03 '12 at 22:13 · khael · edited May 23 '17 at 10:34 by Community

Well, what I'm trying to do is correct thousands of entries in a MySQL database/Drupal that all start with this same pattern. My logic was to use PHP to get all the entries and replace all the tags by first getting rid of all the […] and inline styling, then adding a class to all the <p> in the blockquotes, and finally removing the blockquotes. I made it work with the code below by adding a while(preg_match), but there is still the case of a <blockquote> with no <p> in it. It only happens in a couple hundred cases, but it still happens. I'll take a look at your solutions and hopefully find something. – Rywek Jan 04 '12 at 16:33

popthestack · Answer 2 · 2012-01-04T00:00:52.237

0

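// Split once at the first <hr so that everything after it is left untouched.
$string = explode('<hr', $string, 2);
// Caveat: each match adds the class to only one <p>, so extra <p> tags in a blockquote are missed.
$string[0] = preg_replace('/<blockquote>(.*)<p>(.*)<\/p>(.*)<\/blockquote>/sU', '<blockquote>\1<p class="whatever">\2</p>\3</blockquote>', $string[0]);
$string = $string[0] . '<hr' . $string[1];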

output:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p class="whatever">Good Game</p>
</blockquote>
<blockquote><p class="whatever">Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>
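
For blockquotes that contain more than one <p>, one possible variation is to match each blockquote as a whole and run a plain str_replace() on it. This is only a sketch: classifyParagraphs() is a hypothetical name, and it assumes the tags are written exactly as <blockquote> and <p> (no attributes) and that blockquotes are not nested.

```php
<?php
// Sketch only: classifyParagraphs() is a hypothetical helper name.
function classifyParagraphs(string $html, string $class = 'whatever'): string
{
    // Split once at the first "<hr"; with no <hr present, the whole string is $parts[0].
    $parts = explode('<hr', $html, 2);

    // Rewrite every bare <p> inside each <blockquote>...</blockquote> pair.
    $parts[0] = preg_replace_callback(
        '/<blockquote>.*?<\/blockquote>/s',
        function (array $m) use ($class): string {
            return str_replace('<p>', '<p class="' . $class . '">', $m[0]);
        },
        $parts[0]
    );

    // Glue the pieces back together with the delimiter that explode() removed.
    return implode('<hr', $parts);
}
```

The implode() call at the end is also one way to put the split HTML back together, and it works whether or not an <hr was found.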

answered Jan 03 '12 at 23:35 · popthestack · edited Jan 04 '12 at 00:00

Blast, just noticed that didn't get the first […] tag. – popthestack Jan 03 '12 at 23:36
you do have an ugly regex, maybe it is not such a good idea to teach people that regex can be used to parse html, even if in this particular case it might work – khael Jan 04 '12 at 00:51
Yeah. It won't work if there's more than one <p> tag in a <blockquote>. The complexity grows too quickly. – popthestack Jan 04 '12 at 15:41
I had made it work with a while(preg_match) to just repeat the code you gave, but now I need to find a way to add a […] – Rywek Jan 04 '12 at 16:36
I'm not sure what you mean by "add a […]". `[…]something[…]` should become `[…]something[…]`? – popthestack Jan 05 '12 at 03:17
That's pretty much it. But I think I should do a number more tutorials on regex and alternative solutions before continuing, I really wouldn't want to make things worse with all these pages. – Rywek Jan 05 '12 at 15:03
