I would like to search for and replace some tags using a regular expression.

This is my starting string:

<p>some bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>

And this is the result that I want:

<p>some bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li><li>bla bla and bla</li></ul>
<p>other bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li></ul>
<p>other bla bla bla</p>

So I want to replace every <p>• or <p>&bull; with <li> and the matching </p> with </li>, and then wrap each run of consecutive <li></li><li></li><li></li> items in <ul></ul>.

So far I have run some tests and the code below is the result, but I don't think it is the best way, and the regrouping part is incomplete.

Search and replace

// base string
$text = '<p>some bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>';
// First replacement: I could not get the regexp to match • or &bull; directly,
// so both are swapped for a placeholder first
$text = str_replace(array('•', '&bull;'), '!SUB!', $text);
// optional class attribute, then the placeholder, then the paragraph content
$regexp = '/<p(?: class="normale")?>!SUB!(.*?)<\/p>/';
// replace bulleted paragraphs with li tags
$text = preg_replace($regexp, "<li>$1</li>\n", $text);

But regrouping the items I have found is the hard part, and I don't know how to proceed.
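
For illustration, here is a minimal sketch of one way the grouping could be done in a single pass with preg_replace_callback. It matches a whole run of bulleted paragraphs at once and wraps that run in one <ul>. The function name bulletsToList is hypothetical. The sketch assumes that the source file and the input are both UTF-8, that bulleted paragraphs carry either no attribute or exactly class="normale", and that paragraphs are not nested. It is not a general HTML parser.

```php
<?php
// Hypothetical helper: turns runs of bulleted <p> elements into <ul><li> lists.
function bulletsToList(string $html): string
{
    // One bulleted paragraph: optional class attribute, then • or &bull;,
    // then the content (captured) up to the first closing </p>.
    $item = '<p(?: class="normale")?>\s*(?:•|&bull;)(.*?)<\/p>';

    // A run: one bulleted paragraph followed by any number of further
    // bulleted paragraphs separated only by whitespace.
    $run = '/' . $item . '(?:\s*' . $item . ')*/s';

    $result = preg_replace_callback($run, function (array $m) use ($item): string {
        // Inside the run, convert each paragraph to <li> and drop the
        // whitespace between items, then wrap the whole run once.
        $items = preg_replace('/\s*' . $item . '/s', '<li>$1</li>', $m[0]);
        return '<ul>' . $items . '</ul>';
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$sample = "<p>a</p>\n<p class=\"normale\">&bull;x</p><p class=\"normale\">•y</p>\n<p>b</p>";
echo bulletsToList($sample), "\n";
// prints:
// <p>a</p>
// <ul><li>x</li><li>y</li></ul>
// <p>b</p>
```

Matching the whole run first means no placeholder such as !SUB! is needed, and paragraphs without a bullet are left untouched.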

Massimo
  • 2
    Use an XML parser instead. [link1](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) [link2](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg) – Colin Hebert Apr 17 '12 at 15:42
  • So your hint is to run htmlentities on all the bullets, parse the string with an XML parser, find each paragraph that contains •, replace it with li, and move them all into a parent ul. That is it in a few words. – Massimo Apr 17 '12 at 16:05