This is the structure of a xml-file (odt-file), which I try to parse:
<office:body>
<office:text>
<text:h text:style-name="P1" text:outline-level="2">Chapter 1</text:h>
<text:p text:style-name="Standard">Lorem ipsum. </text:p>
<text:h text:style-name="Heading3" text:outline-level="3">Subtitle 2</text:h>
<text:p text:style-name="Standard"><text:span text:style-name="T5">10</text:span><text:span text:style-name="T6">:</text:span><text:s/>Text (100%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard">9.7:<text:s/>Text (97%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Standard"><text:span text:style-name="T9">9.1:</text:span><text:s/>Text (91%)</text:p>
<text:p text:style-name="Explanation">Further informations.</text:p>
<text:p text:style-name="Explanation">More furter informations.</text:p>
</office:text>
</office:body>
With XML-Reader I did that this way:
while ($reader->read()){
if ($reader->nodeType == XMLREADER::ELEMENT && $reader->name === 'text:h') {
if ($reader->getAttribute('text:outline-level')=="2") $html .= '<h2>'.$reader->expand()->textContent.'</h2>';
}
elseif ($reader->nodeType == XMLREADER::ELEMENT && $reader->name === 'text:p') {
if ($reader->getAttribute('text:style-name')=="Standard") {
$html .= '<p>'.$reader->readInnerXML().'<p>';
}
else if {
// Doing something different
}
}
}
echo $html;
Now I would like to do the same thing with DOMDocument, but I need some help with the syntax. How can I loop through all children of office:text? While looping through all nodes, I would check via if/else what to do (text:h vs. text:p).
I also need to replace every text:s (if there are such elements in text:p) with a whitespace...
$reader = new DOMDocument();
$reader->preserveWhiteSpace = false;
$reader->load('zip://content.odt#content.xml');
$body = $reader->getElementsByTagName( 'office:text' )->item( 0 );
foreach( $body->childNodes as $node ) echo $node->nodeName . PHP_EOL;
Or would it be smarter to loop through all text elements? If this is the case, still the question, how to do that.
$elements = $reader->getElementsByTagName('text');
foreach($elements as $node){
foreach($node->childNodes as $child) {
echo $child->nodeName.': ';
echo $child->nodeValue.'<br>';
// check for type...
}
}