
I'm trying to convert an HTML block that is basically in the following form. Each list item is on its own line, so no line contains <ul><li> together:

<ul>
<li>Item 1</li>
<li>
<ul>
<li>Item 2</li>
<li>Item 3</li>
</ul>
</li>
<li>Item 4</li>
</ul>

The list could be several layers deep, though. I want to convert it to a multidimensional array in which each item's contents become the value (the actual contents are a bit more detailed, but I should be able to process those details myself). The output array should look roughly like this:

$array[0]['value'] = "Item 1";
$array[1][0]['value'] = "Item 2";
$array[1][1]['value'] = "Item 3";
$array[2]['value'] = "Item 4";
MrJ
  • Would you be willing to use an external library such as PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/)? It will make your life very easy. – Ayush Jan 26 '12 at 10:03
  • see [How to parse and process HTML with PHP](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – Gordon Jan 26 '12 at 10:11
  • @xbonez SimpleHTMLDOM stinks. – Gordon Jan 26 '12 at 10:11
  • @Gordon What's the reason you say that? I'm using it in a project right now that scrapes and parses roughly 30,000 HTML pages, and I've had no issues with it. It's very stable and reasonably fast. – Ayush Jan 26 '12 at 10:14
  • @xbonez You won't say that anymore once you've compared its memory usage and speed against PHP's native DOM extension or XMLReader ;) And have a look at its source code if you're in for a serious scare. – Gordon Jan 26 '12 at 10:15
  • I've surprisingly not had any memory issues. I profiled my script earlier when I was having memory issues (turned out it was a PHP CGI curl bug). Now, my script runs for roughly 6 hours with a constant 11MB memory footprint. Wouldn't the XML reader only work in case of valid HTML? What if the HTML is invalid? – Ayush Jan 26 '12 at 10:20
  • Ideally just a couple of functions would be better, to keep it light. – MrJ Jan 26 '12 at 10:21
  • @xbonez libxml has a parser module for HTML which allows broken HTML to be parsed. I'm not sure if that is accessible from XMLReader, but it is from DOM. DOM requires more memory than XMLReader, though, because it parses the XML into a tree structure in memory. It should still be faster and use less memory than SimpleHtmlDom. Give it a try on your next project :) – Gordon Jan 26 '12 at 10:27
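
Following up on the DOM suggestion in the comments, here is a rough sketch of how the native DOM extension could produce the array shape asked for in the question. The function names html_list_to_array and ul_node_to_array are made up for illustration, and the code assumes PHP 7+ with the DOM extension enabled; treat it as a starting point rather than a tested solution for your real markup.

```php
<?php
// Hypothetical helper (not from the thread): converts the first <ul> found
// in $html into nested arrays using PHP's native DOM extension.
function html_list_to_array(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml errors so sloppy markup does not emit PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $ul = $doc->getElementsByTagName('ul')->item(0);
    return $ul === null ? [] : ul_node_to_array($ul);
}

// Hypothetical helper: walks the direct <li> children of one <ul> element.
function ul_node_to_array(DOMElement $ul): array
{
    $items = [];
    foreach ($ul->childNodes as $li) {
        // Skip whitespace text nodes and anything that is not an <li>.
        if (!($li instanceof DOMElement) || strtolower($li->nodeName) !== 'li') {
            continue;
        }
        // Look for a <ul> directly inside this <li>.
        $nested = null;
        foreach ($li->childNodes as $child) {
            if ($child instanceof DOMElement && strtolower($child->nodeName) === 'ul') {
                $nested = $child;
                break;
            }
        }
        // A nested list becomes a sub-array; a plain item becomes ['value' => text].
        $items[] = $nested !== null
            ? ul_node_to_array($nested)
            : ['value' => trim($li->textContent)];
    }
    return $items;
}
```

Because loadHTML goes through libxml's HTML parser, this should cope with markup that is not well-formed XML, which is the trade-off Gordon describes: more memory than a streaming reader, but tolerant parsing and no external library.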

2 Answers


This is the answer, in case anyone comes across this later:

function ul_to_array($ul) {
    if (is_string($ul)) {
        // Parse the markup; SimpleXML only accepts well-formed XML.
        $xml = simplexml_load_string($ul);
        // Compare strictly: an empty <ul/> element is falsy but still valid.
        if ($xml === false) {
            trigger_error("Syntax error in UL/LI structure");
            return false;
        }
        return ul_to_array($xml);
    }
    if (!is_object($ul)) {
        return false;
    }
    $output = array();
    foreach ($ul->li as $li) {
        if (isset($li->ul)) {
            // Nested list: recurse and store the result as a sub-array.
            $output[] = ul_to_array($li->ul);
            continue;
        }
        // Keep the inner markup if the item has child elements, otherwise its text.
        $value = $li->count() ? $li->children()->asXML() : (string) $li;
        // Skip items that contain only whitespace.
        if (trim($value) !== "") {
            $output[] = $value;
        }
    }
    return $output;
}
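
Note that ul_to_array returns plain strings for the leaves, not the ['value' => ...] entries shown in the question. If you want that exact shape, a small recursive pass over the result should do it. The helper name wrap_values is hypothetical, and the sketch assumes the input is the nested array of strings that ul_to_array produces for the sample list.

```php
<?php
// Hypothetical helper: wraps every leaf as ['value' => ...] and keeps
// nested arrays as nested arrays.
function wrap_values(array $items): array
{
    $wrapped = [];
    foreach ($items as $item) {
        $wrapped[] = is_array($item)
            ? wrap_values($item)                 // sub-list: recurse
            : ['value' => trim((string) $item)]; // leaf: wrap the text
    }
    return $wrapped;
}

// The nested array of strings expected from ul_to_array for the sample list.
$result = wrap_values(['Item 1', ['Item 2', 'Item 3'], 'Item 4']);
// $result[0]['value'] is "Item 1" and $result[1][0]['value'] is "Item 2".
```

Keeping this as a separate step leaves the parser itself small, and it gives you one obvious place to add the extra per-item details mentioned in the question.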
MrJ

The easiest way to accomplish this is with a recursive function, like so:

//output a multi-dimensional array as a nested UL
function toUL($array){
    //start the UL
    echo "<ul>\n";
    //loop through the array
    foreach($array as $member){
        //check for value member
        if(isset($member['value']) ){
            //if value is present, echo it in an li
            echo "<li>{$member['value']}</li>\n";
        }
        else if(is_array($member)){
            //if the member is another array, start a fresh li
            echo "<li>\n";
            //and pass the member back to this function to start a new ul
            toUL($member);
            //then close the li
            echo "</li>\n";
        }
    }
    //finally close the ul
    echo "</ul>\n";
}

Pass your array to that function to have it output the way you want.
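
If you would rather get the markup back as a string than have it echoed, a variant along these lines might work. The name array_to_ul is made up, the htmlspecialchars call is my own addition rather than part of the function above, and it assumes the values are plain text that should be escaped.

```php
<?php
// Hypothetical variant: builds and returns the nested <ul> markup
// instead of echoing it.
function array_to_ul(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $member) {
        if (!is_array($member)) {
            continue; // ignore anything that is neither a value entry nor a sub-list
        }
        if (isset($member['value'])) {
            // Escape the value so stray markup in the data cannot break the list.
            $html .= '<li>' . htmlspecialchars((string) $member['value']) . "</li>\n";
        } else {
            // Nested array: wrap the recursive result in its own <li>.
            $html .= "<li>\n" . array_to_ul($member) . "</li>\n";
        }
    }
    return $html . "</ul>\n";
}
```

Returning a string makes the function easier to test and lets the caller decide where the markup goes. Drop the escaping if your values are meant to contain HTML.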

Hope that helps!

Regards, Phil

Philthi