I have an XML file structured like this:

<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML<q> content 2 </text>
  <text line="3"> HTML <b>content 3<b> </text>
</chapter>

Using DOMDocument, what query can I use to get all the content inside <chapter id="1">...</chapter>, with the HTML tags included? The output should look something like this:

<p>HTML content 1</p>
<q>HTML<q> content 2
HTML <b>content 3<b>

PS: Regarding the comment below, I think the linked question asks something different. I am only asking whether it is possible, and how, to process the content inside a node while ignoring any HTML tags it contains, when the original XML cannot be modified.

Marcello Impastato
    Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – RST Nov 11 '16 at 10:21

1 Answer

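Your XML string is not well-formed: the <q> and <b> tags inside the text nodes are never closed. You must first escape the content of each text node with htmlentities() (htmlspecialchars() is safer if the text can contain non-ASCII characters, since XML does not define HTML named entities such as &eacute;), for example: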

$textContent = htmlentities($text);

After that, we have:

$xmlText = '<chapter id="1">
  <text line="1"> &lt;p&gt;HTML content 1&lt;/p&gt; </text>
  <text line="2"> &lt;q&gt;HTML&lt;q&gt; content 2 </text>
  <text line="3"> HTML &lt;b&gt;content 3&lt;b&gt; </text>
</chapter>';

Now we just need to use SimpleXMLElement to parse:

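$xmlObject = new SimpleXMLElement($xmlText);
// Select every <text> child of the root <chapter> element.
$items = $xmlObject->xpath("text");
foreach ($items as $item) {
    // The XML parser has already unescaped the entities, so a string cast is enough.
    echo trim((string) $item), PHP_EOL;
}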
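
Since the question asks about DOMDocument, the same lookup can also be sketched with DOMDocument and DOMXPath. This is only a minimal sketch: it assumes the text content has already been escaped as shown above, and the helper name get_chapter_texts is hypothetical.

```php
<?php
// Hypothetical helper: return the text of every <text> child of the
// <chapter> element whose id attribute equals $chapterId.
function get_chapter_texts( string $xmlText, string $chapterId ): array {
    $doc = new DOMDocument();
    $doc->loadXML( $xmlText );

    $xpath = new DOMXPath( $doc );
    // Assumption: $chapterId contains no double quotes.
    $nodes = $xpath->query( '//chapter[@id="' . $chapterId . '"]/text' );

    $texts = [];
    foreach ( $nodes as $node ) {
        // textContent holds the unescaped text, so the HTML tags come back as written.
        $texts[] = trim( $node->textContent );
    }

    return $texts;
}

$xmlText = '<chapter id="1">
  <text line="1"> &lt;p&gt;HTML content 1&lt;/p&gt; </text>
  <text line="2"> &lt;q&gt;HTML&lt;q&gt; content 2 </text>
  <text line="3"> HTML &lt;b&gt;content 3&lt;b&gt; </text>
</chapter>';

foreach ( get_chapter_texts( $xmlText, '1' ) as $text ) {
    echo $text, PHP_EOL;
}
```

The XPath predicate [@id="1"] is what restricts the result to a single chapter, which matters once the file holds more than one.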

Update 1

If you can't change the XML string, you can fall back to a regular expression instead of a DOM parser:

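// Return the raw inner content of every <$tag ...>...</$tag> element in $xml.
// Assumes the elements are not nested inside each other.
function get_tag_contents( $tag, $xml ) {
    $tag = preg_quote( $tag, '#' );
    // [^>]* accepts tags with or without attributes; the "s" modifier lets "." span line breaks.
    preg_match_all( "#<{$tag}\b[^>]*>(.*?)</{$tag}>#s", $xml, $matches );

    return $matches[1];
}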

$invalidXml = '<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML<q> content 2 </text>
  <text line="3"> HTML <b>content 3<b> </text>
</chapter>';

$textContents = get_tag_contents( 'text', $invalidXml );

foreach ( $textContents as $content ) {
    // Print each entry on its own line, without the surrounding whitespace.
    echo trim( $content ), PHP_EOL;
}
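
The two approaches can also be combined: escape the inner content with a regular expression first, then hand the repaired string to DOMDocument so that attributes such as line stay accessible. The following is a rough sketch under the same assumption that <text> elements are never nested; the helper name escape_tag_contents is hypothetical.

```php
<?php
// Hypothetical helper: escape the inner content of every <$tag> element
// so that the whole string becomes well-formed XML.
function escape_tag_contents( string $tag, string $xml ): string {
    $tag = preg_quote( $tag, '#' );

    return preg_replace_callback(
        "#(<{$tag}\b[^>]*>)(.*?)(</{$tag}>)#s",
        function ( array $m ): string {
            // Keep the opening and closing tags, escape only what sits between them.
            return $m[1] . htmlspecialchars( $m[2], ENT_XML1 | ENT_QUOTES ) . $m[3];
        },
        $xml
    );
}

$invalidXml = '<chapter id="1">
  <text line="1"> <p>HTML content 1</p> </text>
  <text line="2"> <q>HTML<q> content 2 </text>
  <text line="3"> HTML <b>content 3<b> </text>
</chapter>';

$doc = new DOMDocument();
$doc->loadXML( escape_tag_contents( 'text', $invalidXml ) );

foreach ( $doc->getElementsByTagName( 'text' ) as $node ) {
    echo 'line ', $node->getAttribute( 'line' ), ': ', trim( $node->textContent ), PHP_EOL;
}
```

The original file is left untouched; only the in-memory copy is escaped before parsing.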
Jared Chu