I have HTML like the following:

<pre>
  <code>
   some code
   <div></div>
  </code>
  <ul>
     <li>1</li>
  </ul>
</pre>
others

I parse it with DOMDocument.

After I run this:

$doc = new DOMDocument();
$doc->loadHTML($html);
echo $doc->saveHTML();

The ul element is moved out of the pre element:

<pre>
  <code>
   some code
   <div></div>
  </code>
</pre>
<ul>
 <li>1</li>
</ul>
others

Why does this happen, and how can I keep the structure the same?

Please see demo for detail.

Chuoke

1 Answer

Why?

According to the HTML specification, a ul element is not permitted inside a pre element, so the parser closes the pre element before the ul. For pre, the specification lists:

Permitted content: Phrasing content

Phrasing content is a subset of flow content that defines the text and the markup it contains, and can be used everywhere flow content is expected. Runs of phrasing content make up paragraphs.

Elements belonging to this category are <abbr>, <audio>, <b>, <bdo>, <br>, <button>, <canvas>, <cite>, <code>, <command>, <data>, <datalist>, <dfn>, <em>, <embed>, <i>, <iframe>, <img>, <input>, <kbd>, <keygen>, <label>, <mark>, <math>, <meter>, <noscript>, <object>, <output>, <picture>, <progress>, <q>, <ruby>, <samp>, <script>, <select>, <small>, <span>, <strong>, <sub>, <sup>, <svg>, <textarea>, <time>, <u>, <var>, <video>, <wbr> and plain text (not consisting only of whitespace characters).

This can be seen in the warning message after calling $doc->loadHTML($html):

Warning: DOMDocument::loadHTML(): Unexpected end tag : pre in Entity
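
To inspect these parser messages in code rather than as PHP warnings, something like the following might help. It is only a sketch: it assumes the HTML from the question is held in a variable named $html (that name is illustrative), and the exact messages reported depend on the libxml2 version PHP was built against.

```php
<?php
// The HTML from the question; the variable name is illustrative.
$html = <<<'HTML'
<pre>
  <code>
   some code
   <div></div>
  </code>
  <ul>
     <li>1</li>
  </ul>
</pre>
others
HTML;

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Each entry is a LibXMLError object with line, code and message properties.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

Collecting the errors this way keeps the warnings out of the output while still letting you see where the parser had to repair the markup.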

How to keep it the same?

If you need to keep a fragment whose structure does not conform to the specification, build it with createDocumentFragment and appendXML instead of loadHTML. Note that appendXML parses its input as XML, so the fragment must be well-formed XML:

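$doc = new DOMDocument();
// Parse the markup as an XML fragment, so the HTML parser's repairs are not applied.
$docFragment = $doc->createDocumentFragment();
$docFragment->appendXML($html);
// Attach the fragment's nodes to the document and serialize them as HTML.
$doc->appendChild($docFragment);
echo $doc->saveHTML();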
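
A self-contained version of the same idea, with a check that the ul stays inside the pre, could look like the sketch below. The $html variable name and the RuntimeException on failure are assumptions added for illustration; they are not part of the original snippet.

```php
<?php
// The HTML from the question; it happens to be well-formed XML as well.
$html = <<<'HTML'
<pre>
  <code>
   some code
   <div></div>
  </code>
  <ul>
     <li>1</li>
  </ul>
</pre>
others
HTML;

$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();

// appendXML returns false when the input is not well-formed XML.
if (!$fragment->appendXML($html)) {
    throw new RuntimeException('Fragment is not well-formed XML');
}

$doc->appendChild($fragment);

// Confirm that the ul element is still a child of the pre element.
$ul = $doc->getElementsByTagName('ul')->item(0);
echo $ul->parentNode->nodeName, "\n";

echo $doc->saveHTML();
```

Because the input goes through the XML parser here, markup that is valid HTML but not well-formed XML (for example an unclosed br tag) would be rejected by appendXML, so this approach only suits fragments you can keep XML-clean.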

id'7238