Possible Duplicate:
Best methods to parse HTML with PHP

So I have a ton of entries in my database where lists were entered, but they're not real lists and I need to convert them to actual lists.

Here's what I have:

Other HTML data here.

<p>&ntilde; Line of data</p>
<p>&ntilde; Another line of data</p>
<p>&ntilde; Yet another line of data</p>
<p>&ntilde; Still more data</p>

More HTML data here.

Needs to change to:

Other HTML data here.

<ul>
    <li>Line of data</li>
    <li>Another line of data</li>
    <li>Yet another line of data</li>
    <li>Still more data</li>
</ul>

More HTML data here.

It doesn't have to be formatted like that, could just be all smashed together. I don't care.

Thanks.


Forgot to mention there is HTML data on both sides of the would-be list.

Also I've got the SimpleDOM parser. Not really interested in getting another one, but if there's a really easy one to use that would take care of this it would be helpful.

Thanks, again.

Tomas

3 Answers

I'm going to get reprimands for not using a DOM parser, but here goes. This is just a simple string operation, no regex needed.

You just need to replace the <p> open/close tags with <li> open/close tags, and wrap it in <ul></ul>.

Update: fixed to account for the edits to the question (content before and after the list):

$original = "Stuff here

<p>&ntilde; Line of data</p>
<p>&ntilde; Another line of data</p>
<p>&ntilde; Yet another line of data</p>
<p>&ntilde; Still more data</p>

Other stuff";

// Store stuff before & after the list
$stuffbefore = substr($original, 0, stripos($original, "<p>"));
$stuffafter = substr($original, strripos($original, "</p>") + strlen("</p>"));

// Cut off the stuff before the list
$listpart = substr($original, strlen($stuffbefore));
// Cut off stuff after the list
$listpart = substr($listpart, 0, strlen($listpart) - strlen($stuffafter));

$fixed = str_replace("<p>&ntilde; ", "<li>", $listpart);
$fixed = str_replace("</p>", "</li>", $fixed);

// Stick it all back together
$fixed = "$stuffbefore\n<ul>$fixed</ul>\n$stuffafter";
Michael Berkowski
  • +1 You get my compliments for NOT using a DOM parser in a simple task like this. – NullUserException Sep 02 '11 at 14:45
  • I've revised what I need slightly because there is data on either side of the list that needs parsed. So this solution will not work for this case, sorry. – Tomas Sep 02 '11 at 19:32
  • @Tomas that makes it quite a bit more complicated, but see above for the necessary changes. – Michael Berkowski Sep 02 '11 at 20:10
  • Yeah, that's not going to do it either, because there are HTML tags in the stuff before and after, including a lot of <p> tags. So I'm not sure what direction to go. – Tomas Sep 02 '11 at 20:36

You could just use str_replace to replace every <p> with <li> and every </p> with </li>.
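
As a minimal sketch of this suggestion, assuming the fragment contains nothing but the would-be list (the variable names are illustrative, and the `&ntilde; ` prefix is taken from the question's sample data):

```php
<?php
// Assumption: $fragment holds only the fake list, with no other <p> tags.
$fragment = "<p>&ntilde; Line of data</p>\n<p>&ntilde; Another line of data</p>";

// Swap the opening tag together with the fake bullet, then the closing tag.
$items = str_replace(
    array("<p>&ntilde; ", "</p>"),
    array("<li>", "</li>"),
    $fragment
);

// str_replace alone does not add the wrapper, so add it by hand.
$list = "<ul>\n" . $items . "\n</ul>";

echo $list;
```

Any unrelated `</p>` elsewhere in the string would be rewritten as well, so this only fits the simple case where the list is the whole fragment.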

Breezer

UPDATE: I've run into this problem before, where a bunch of data contains 'fake' lists that use indenting and various characters as bullets, so I wrote this little function.

function make_real_list($regex, $content, $type="unordered"){

    // Count the fake list items; preg_match_all returns the number of full matches.
    $count  = preg_match_all($regex, $content, $matches);

    if($type=="unordered"):
        $outer_start    = "<ul>";
        $outer_end      = "</ul>";

    else:
        $outer_start    = "<ol>";
        $outer_end      = "</ol>";

    endif;

    // Each pass rewrites the first remaining match, so items are handled in
    // document order and repeated item text cannot be replaced twice.
    for($i = 1; $i <= $count; $i++):

        $replace    = '<li>$1</li>';

        // Open the list before the first item.
        if($i==1):
            $replace    = $outer_start.$replace;
        endif;

        // Close the list after the last item (also the first when there is only one).
        if($i==$count):
            $replace    = $replace.$outer_end;
        endif;

        $content    = preg_replace($regex, $replace, $content, 1);

    endfor;

    return $content;

}

$content = "<p>STUFF BEFORE</p>
<p>&ntilde; FIRST LIST ITEM</p>
<p>&ntilde; MIDDLE LIST ITEM</p>
<p>&ntilde; LAST LIST ITEM</p>
<p>STUFF AFTER</p>";

echo make_real_list("/\<p\>&ntilde; (.*?)\<\/p\>/", $content);

//OUTPUT
<p>STUFF BEFORE</p> 
<ul>
    <li>FIRST LIST ITEM</li> 
    <li>MIDDLE LIST ITEM</li> 
    <li>LAST LIST ITEM</li>
</ul> 
<p>STUFF AFTER</p>
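
For comparison, here is a more compact sketch of the same idea. It is not part of the function above: `wrap_fake_lists` is a hypothetical helper, and it assumes the fake items always start with `<p>&ntilde; ` and that items in one list are separated only by whitespace. Each run of consecutive fake items gets its own wrapper, so several lists in one string are handled separately.

```php
<?php
// Hypothetical helper: wraps every run of consecutive
// "<p>&ntilde; ...</p>" paragraphs in a single list element.
function wrap_fake_lists($content, $tag = "ul") {
    // One fake item: the bullet entity, then text up to the closing </p>.
    $item = '<p>&ntilde; (.*?)<\/p>';
    // A run: one item plus any further items separated only by whitespace.
    $run = '/' . $item . '(?:\s*' . $item . ')*/';

    return preg_replace_callback($run, function ($m) use ($item, $tag) {
        // Turn each paragraph in the run into a list item, then wrap the run.
        $items = preg_replace('/' . $item . '/', '<li>$1</li>', $m[0]);
        return "<$tag>" . $items . "</$tag>";
    }, $content);
}

$content = "<p>STUFF BEFORE</p>\n<p>&ntilde; FIRST</p>\n<p>&ntilde; LAST</p>\n<p>STUFF AFTER</p>";
echo wrap_fake_lists($content);
```

Ordinary `<p>` paragraphs before and after the run are left alone because they lack the `&ntilde; ` prefix.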
Tomas