
I'm trying to use PHP's DOM extension to parse an HTML file that I want to convert to JSON. Unfortunately, the HTML is fairly flat (and I have no way to change that). By flat I mean the structure looks like this:

<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>

I need to get each <h2> and treat the <span> elements that follow it as its children. I'm not set on PHP DOM if there's a better alternative; it's simply what I found in an answer I came across, so feel free to suggest anything. What I really need is to convert this HTML string to JSON, and PHP DOM looks like my best bet so far.

Ruben Martinez Jr.

1 Answer

$html = <<<HTML
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>title</h2>
<span>child node</span>
<span>another child</span>
HTML;

$dom = new DOMDocument;
// loadHTML() wraps the fragment in <html><body>; hide libxml parse warnings
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);

// Target document: <root><div><h2/><span/>...</div>...</root>
$new = new DOMDocument;
$root = $new->createElement('root');
$new->appendChild($root);
$current = null;

// '//body/*' returns only the element children of <body>, so text nodes are skipped
foreach ($xp->query('//body/*') as $node) {
    if ($node->nodeName == 'h2') {
        // Each <h2> opens a new group; append it immediately so the last group is not lost
        $current = $new->createElement('div');
        $root->appendChild($current);
    }
    if ($current !== null) { // ignore anything that appears before the first <h2>
        $current->appendChild($new->importNode($node, true));
    }
}

// Convert the grouped document to SimpleXML, which json_encode() can serialize
$xml = simplexml_import_dom($new);
echo json_encode($xml);
Federico
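If only the heading text and the text of the following elements are needed, one possible alternative is to skip the intermediate XML document entirely: walk the same flat list of elements and build a plain PHP array, then pass that to json_encode(). This is a minimal sketch under that assumption; the function name groupHeadings() and the "title"/"children" keys are hypothetical choices for illustration, not anything from the question or answer.

```php
<?php
// Sketch: group each <h2> with the sibling elements that follow it,
// building a plain PHP array and encoding it with json_encode().
// groupHeadings() and the "title"/"children" keys are hypothetical names.
function groupHeadings(string $html): array
{
    $dom = new DOMDocument();
    // Silence libxml warnings about the fragment not being a full document
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $groups = [];
    $index = -1; // -1 means no <h2> has been seen yet

    // loadHTML() puts the fragment under <html><body>; take body's element children
    foreach ($xp->query('//body/*') as $node) {
        if ($node->nodeName === 'h2') {
            $groups[] = ['title' => trim($node->textContent), 'children' => []];
            $index = count($groups) - 1;
        } elseif ($index >= 0) {
            // Any other element belongs to the most recent heading
            $groups[$index]['children'][] = trim($node->textContent);
        }
        // Elements before the first <h2> are ignored
    }
    return $groups;
}

$html = <<<HTML
<h2>title</h2>
<span>child node</span>
<span>another child</span>
<h2>second title</h2>
<span>only child</span>
HTML;

echo json_encode(groupHeadings($html), JSON_PRETTY_PRINT), "\n";
```

Building the array yourself keeps the JSON shape predictable: when a SimpleXMLElement is encoded, a group containing a single <span> comes out as a string rather than an array, whereas here "children" is always a list.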