
In my code, I have

$document = new DOMDocument();
$document->loadHTML($someHTML);
$xPath = new DOMXPath($document);
//
//do some xpath query and processing
//
$result = $document->saveHTML();

The HTML I am processing contains:

<html>
<body>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal;text-autospace:none"><b><span style='font-size:9.0pt;font-family:"ArialNarrow","sans-serif";
color:red'>&nbsp;</span></b></p>
</body>
</html>

and results in:

<html>
<body>
<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:
normal;text-autospace:none"><b><span style='font-size:9.0pt;font-family:"ArialNarrow","sans-serif";
color:red'> </span></b></p>
</body>
</html>

How do I prevent &nbsp; from being converted to a blank space?

hakre
ltfishie
  • What kind of XPath queries and processing are you doing that the entity gets removed? Are you using normalize-space() or something similar? – hakre May 30 '12 at 13:36
  • Probably related: [PHP DOMNode entities and nodeValue](http://stackoverflow.com/questions/2752434/php-domnode-entities-and-nodevalue) – hakre Jun 12 '12 at 11:31

2 Answers

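// Hide &nbsp; behind a placeholder so the HTML parser leaves it alone
$someHTML = str_replace('&nbsp;', '@nbsp;', $someHTML);
$document = new DOMDocument();
$document->loadHTML($someHTML);
$xPath = new DOMXPath($document);
//
//do some xpath query and processing
//
$result = $document->saveHTML();
// Restore the entity in the serialized output
$result = str_replace('@nbsp;', '&nbsp;', $result);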
ltfishie
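The placeholder approach above could be wrapped in a small helper so the two replacements cannot drift apart. This is only a sketch: the function name `processPreservingNbsp` and the callback parameter are hypothetical, and it assumes the chosen placeholder survives parsing and serialization unchanged (true for plain text such as `@nbsp;`).

```php
<?php
// Hypothetical helper: protects &nbsp; with a placeholder while the DOM is processed.
function processPreservingNbsp(string $html, callable $process): string
{
    // Pick a placeholder that does not already occur in the input.
    $placeholder = '@nbsp;';
    while (strpos($html, $placeholder) !== false) {
        $placeholder = '@' . $placeholder;
    }

    $document = new DOMDocument();
    // @ suppresses parser warnings about imperfect real-world markup.
    @$document->loadHTML(str_replace('&nbsp;', $placeholder, $html));

    // The caller does its XPath queries and DOM changes here.
    $process($document, new DOMXPath($document));

    // Restore the entity in the serialized output.
    return str_replace($placeholder, '&nbsp;', $document->saveHTML());
}
```

A call might look like `processPreservingNbsp($someHTML, function ($document, $xPath) { /* queries */ });`. Note that any text inserted by the callback that happens to contain the placeholder would also be turned into `&nbsp;`.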

Replace &nbsp; with &amp;nbsp; before loading the HTML; then, when the HTML DOM document is read, it will return &nbsp;.

AMayer
  • The result is that &nbsp; is left on the page. – ltfishie Apr 12 '12 at 02:07
  • Good idea, though. I ended up doing two replacements to make it work: replace &nbsp; with @nbsp; at the beginning, and replace @nbsp; with &nbsp; at the end. – ltfishie Apr 12 '12 at 02:20
  • Can you explain and suggest an alternative? – ltfishie Jun 04 '12 at 15:03
  • Well, actually you have found out that it does not work (because it did not replace that). Alternative: I have not been able to create an example so far, but you are looking for [`DOMEntityReference`](http://php.net/domentityreference.construct), either manually or via XPath ([which might not be possible](http://www.w3.org/TR/DOM-Level-3-XPath/xpath.html#EntityReferences)). – hakre Jun 07 '12 at 14:24
  • Also, if the output is UTF-8, the "space" you see might be a non-breaking space. You should provide a hex dump of your output to look into it further. – hakre Jun 12 '12 at 10:33
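The hex-dump check suggested in the last comment can be sketched as follows. The helper name `containsNbsp` and the sample markup are made up for illustration. The sketch relies on PHP's DOM extension exposing text as UTF-8, where a non-breaking space (U+00A0) is the byte pair C2 A0 rather than the single byte 20 of an ordinary space.

```php
<?php
// Hypothetical helper: true if the UTF-8 string contains U+00A0 (bytes C2 A0).
function containsNbsp(string $utf8): bool
{
    return strpos($utf8, "\xC2\xA0") !== false;
}

$document = new DOMDocument();
// @ suppresses parser warnings; the sample markup is illustrative only.
@$document->loadHTML('<html><body><p>a&nbsp;b</p></body></html>');

// The parser has already decoded &nbsp; into a single character here.
$text = $document->getElementsByTagName('p')->item(0)->textContent;

// Hex dump: look for "c2a0" (non-breaking space) versus "20" (plain space).
echo bin2hex($text), "\n";
echo containsNbsp($text) ? "non-breaking space\n" : "ordinary space\n";
```

The same `bin2hex()` call can be applied to the string returned by `saveHTML()` to see whether the output holds a real space, the raw C2 A0 bytes, or the literal `&nbsp;` entity.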