Search, replace and regroup with regexp and PHP

Question

I would like to search and replace some tags with regexp.

This is my starting string:

<p>some bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>
<p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
<p>other bla bla bla</p>

And this is the result that I want:

<p>some bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li><li>bla bla and bla</li></ul>
<p>other bla bla bla</p>
<ul><li>bla bla and bla</li><li>bla bla and bla</li>
<li>bla bla and bla</li><li>bla bla and bla</li>
<li>other bla bla bla</li></ul>

So I want to substitute every <p class="normale">• or <p class="normale">&bull; with <li> and the matching </p> with </li>, and then regroup every run of <li></li><li></li><li></li> inside <ul></ul>.

So far I have done some tests and the code below is the result, but I don't think it is the best way, and the regrouping part isn't complete.

Searching and Replace

// base string
$text = '<p>some bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>
  <p class="normale">&bull;bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p class="normale">•bla bla and bla</p><p class="normale">&bull;bla bla and bla</p>
  <p>other bla bla bla</p>';
// First replace: I don't know why, but I can't match • or &bull; directly with the regexp,
// so both forms are swapped for a placeholder first
$text = str_replace(array('•', '&bull;'), '!SUB!', $text);
$regexp = '/(<p( class="normale")?>(!SUB!))(.*?)<\/p>/';
// replace bulleted paragraphs with li tags
$text = preg_replace($regexp, "<li>$4</li>\n", $text);

But the part that regroups what I have found is very hard, and I don't know how to proceed.

1. if you have multiple problems then post multiple questions. 2. please re-word your question it's impossible to know what you actually want. — Jan Hančič, Apr 17 '12 at 13:25
So... you ***want*** to generate broken HTML? (`</p>` with no `<p>` because you replaced it) — Niet the Dark Absol, Apr 17 '12 at 14:13
Use an XML parser instead. [link1](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) [link2](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg) — Colin Hebert, Apr 17 '12 at 15:42
So your hint is to run htmlentities on all the bullets, parse the string with an XML parser, search for each paragraph that contains •, replace it with li, and move them all into a parent called ul. That is it in a few words. — Massimo, Apr 17 '12 at 16:05

score 1 · Answer 1 · edited Feb 17 '14 at 20:33

I concur with @Colin; however, is the above Searching and Replace code doing what you want? That is, is it finding the • char? If so, I'd recommend not using the !SUB! replacement, but instead including the bullet as part of your regex:

/(<p( class="normale")?>(•|&bull;))(.*?)<\/p>/

If not, then you have to find the character's code point (• is U+2022, which is outside ASCII) and put the corresponding hex escape in its place inside the regex, for example \x{2022} together with the u modifier.
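As a rough illustration of matching the bullet directly, here is a minimal sketch. It assumes the input is UTF-8 and that the bullet is U+2022; the sample string and the variable names are made up for the example, and non-capturing groups are used so the paragraph text ends up in `$1`.

```php
<?php
// Sketch: match bulleted paragraphs without the !SUB! placeholder.
// Assumption: the text is UTF-8, so the u modifier lets \x{2022} match the bullet.
$text = '<p class="normale">•one</p><p class="normale">&bull;two</p><p>plain</p>';

// (?: ... ) groups do not capture, so the paragraph body is group 1.
$regexp = '/<p(?: class="normale")?>(?:\x{2022}|&bull;)(.*?)<\/p>/u';
$items  = preg_replace($regexp, '<li>$1</li>', $text);

echo $items, "\n"; // prints <li>one</li><li>two</li><p>plain</p>
```

Without the u modifier PCRE treats the pattern and subject as single bytes, which may be why the literal bullet was not found in the first place.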

Once you've gotten this far, an XML parser would make quick work of the reordering part of it. :-)
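A possible shape for that parser-based regrouping, using PHP's DOM extension, is sketched below. The function name `bulletsToLists` is hypothetical, the sketch assumes the bulleted paragraphs are direct siblings, and it keeps only the text of each paragraph (inline markup inside a bullet would be dropped). Literal bullets are first turned into `&bull;` so the markup handed to libxml is plain ASCII for this sample; other non-ASCII text would need proper encoding handling.

```php
<?php
// Hypothetical helper, not from the question: turns runs of bulleted <p> into <ul> lists.
function bulletsToLists(string $html): string
{
    $bullet = "\u{2022}"; // the bullet character, U+2022
    // Normalise literal bullets to the entity so the sample input is ASCII only.
    $html = str_replace($bullet, '&bull;', $html);

    $doc = new DOMDocument();
    $doc->loadHTML('<body>' . $html . '</body>');
    $body = $doc->getElementsByTagName('body')->item(0);

    $ul = null; // the list currently being filled, if any
    // Iterate over a snapshot, because the loop removes and inserts nodes.
    foreach (iterator_to_array($body->childNodes) as $node) {
        $text = $node->textContent; // DOM strings are UTF-8, entities already decoded
        if ($node->nodeName === 'p' && strpos($text, $bullet) === 0) {
            if ($ul === null) {
                $ul = $doc->createElement('ul');
                $body->insertBefore($ul, $node);
            }
            $li = $doc->createElement('li');
            $li->appendChild($doc->createTextNode(substr($text, strlen($bullet))));
            $ul->appendChild($li);
            $body->removeChild($node);
        } elseif (trim($text) !== '') {
            $ul = null; // any other non-blank content ends the current run
        }
    }

    // Serialise only the children of <body>, not the wrapper libxml adds.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo bulletsToLists('<p>intro</p><p class="normale">&bull;one</p><p class="normale">•two</p><p>outro</p>');
```

Whitespace between bulleted paragraphs does not end a run, so paragraphs on consecutive lines land in the same list. The serialiser may add line breaks between block elements, so the exact whitespace of the output should not be relied on.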
