I need to find a way to replace all the <p> within all the <blockquote> before the <hr />.

Here's some sample HTML:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p>Good Game</p>
</blockquote>
<blockquote><p>Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>

Here's what I got:

    $pieces = explode("<hr", $theHTML, 2);
    $blocks = preg_match_all('/<blockquote>(.*?)<\/blockquote>/s', $pieces[0], $blockmatch); 

    if ($blocks) { 
        $t1=$blockmatch[1];
        for ($j=0;$j<$blocks;$j++) {
            $paragraphs = preg_match_all('/<p>/', $t1[$j], $paragraphmatch);
            if ($paragraphs) {
                $t2=$paragraphmatch[0]; 
                for ($k=0;$k<$paragraphs;$k++) { 
                    $t1[$j]=str_replace($t2[$k],'<p class=\"whatever\">',$t1[$j]);
                }
            }
        } 
    } 

I think I'm really close, but I don't know how to put back together the html that I just pieced out and modified.

Rywek

2 Answers

You could try SimpleXML or, better, DOMDocument (http://www.php.net/manual/en/class.domdocument.php). Make the markup valid HTML first, then use the DOM to find the nodes you are looking for and replace them; XPath (http://w3schools.com/xpath/xpath_syntax.asp) is one way to select them.
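As a rough illustration of the DOMDocument plus XPath idea, here is a minimal sketch. The function name `addClassBeforeHr` and the class name `whatever` are placeholders, not anything from the question's code base, and it assumes the entries are UTF-8 HTML fragments. The XPath expression picks every `<p>` that sits inside a `<blockquote>` and has no `<hr>` before it in document order.

```php
<?php
// Hypothetical helper: the name and the default class are illustrative only.
function addClassBeforeHr(string $html, string $class = 'whatever'): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make loadHTML()
    // read the fragment as UTF-8 (assumption: the input is UTF-8).
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Every <p> inside a <blockquote> with no <hr> earlier in the document.
    foreach ($xpath->query('//blockquote//p[not(preceding::hr)]') as $p) {
        $p->setAttribute('class', $class);
    }

    // loadHTML() wraps the fragment in <html><body>; return only the
    // serialized children of <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Note that the serializer normalizes the markup, so for example `<hr />` comes back as `<hr>`. If the stored entries must stay byte-for-byte identical outside the changed tags, this approach may not be what you want.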

Edit 1:

Take a look at the answer to this question:

RegEx match open tags except XHTML self-contained tags

khael
  • Well, what I'm trying to do is correct thousands of entries in a MySQL database/Drupal that all start with this same pattern. My plan was to use PHP to fetch all the entries and fix the tags: first get rid of all the … and inline styling, then add a class to all the `<p>` in the blockquotes, and finally remove the blockquotes. I made it work with the code below by adding a while(preg_match), but there is still the case of a `<blockquote>` with no `<p>` in it. It only happens in a couple hundred cases, but it still happens. I'll take a look at your solutions and hopefully find something. – Rywek Jan 04 '12 at 16:33
0
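// Split at the first "<hr" so only the part before it is modified.
$string = explode('<hr', $string, 2);
$string[0] = preg_replace('/<blockquote>(.*)<p>(.*)<\/p>(.*)<\/blockquote>/sU', '<blockquote>\1<p class="whatever">\2</p>\3</blockquote>', $string[0]);
// implode() rejoins the pieces and also copes with input that has no "<hr".
$string = implode('<hr', $string);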

output:

<p>2012/01/03</p>
<blockquote>
    <h4>File name</h4>
    <p class="whatever">Good Game</p>
</blockquote>
<blockquote><p class="whatever">Laurie Ipsumam</p></blockquote>
<h4>Some title</h4>
<hr />
<p>Lorem Ipsum</p>
<blockquote><p>Laurel Ipsucandescent</p></blockquote>
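
If a blockquote can hold more than one `<p>`, one possible variation is to match each blockquote with `preg_replace_callback` and replace every `<p>` inside it in the callback. This is only a sketch under the same assumptions as the pattern above: blockquotes are not nested, and both `<blockquote>` and `<p>` carry no attributes. The function name `addClassWithRegex` is made up for the example.

```php
<?php
// Hypothetical helper: the name and the default class are illustrative only.
function addClassWithRegex(string $html, string $class = 'whatever'): string
{
    // Split at the first "<hr"; only the leading part is rewritten.
    $parts = explode('<hr', $html, 2);

    $parts[0] = preg_replace_callback(
        '/<blockquote>(.*?)<\/blockquote>/s',
        function (array $m) use ($class): string {
            // Replace every bare <p> inside this one blockquote.
            $inner = str_replace('<p>', '<p class="' . $class . '">', $m[1]);
            return '<blockquote>' . $inner . '</blockquote>';
        },
        $parts[0]
    );

    // implode() puts the pieces back together, with or without an "<hr".
    return implode('<hr', $parts);
}
```

Because the text is only split and rejoined, everything outside the matched blockquotes, including the `<hr />` itself, is returned unchanged.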
popthestack
  • blast, just noticed that didn't get the first … tag. – popthestack Jan 03 '12 at 23:36
  • you do have an ugly regex, maybe it is not such a good idea to teach people that regex can be used to parse html, even if in this particular case it might work – khael Jan 04 '12 at 00:51
  • Yeah. It won't work if there's more than one `<p>` tag in a `<blockquote>`. The complexity grows too quickly. – popthestack Jan 04 '12 at 15:41
  • I had made it work with a while(preg_match) to just repeat the code you gave, but now I need to find a way to add a `<p>`. – Rywek Jan 04 '12 at 16:36
  • I'm not sure what you mean by "add a `<p>`". `<blockquote>something</blockquote>` should become `<blockquote><p>something</p></blockquote>`? – popthestack Jan 05 '12 at 03:17
  • That's pretty much it. But I think I should do a number more tutorials on regex and alternative solutions before continuing, I really wouldn't want to make things worse with all these pages. – Rywek Jan 05 '12 at 15:03