I am trying to preserve some XML entities when parsing XML files in JavaScript. The following code snippet illustrates the problem. Is there a way to round-trip through the parser and retain the XML entities (&#160; is &nbsp; in HTML)? This happens in Chrome, Firefox, and IE10.

var aaa='<root><div>&#160;one&#160;two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
new XMLSerializer().serializeToString(doc)
"<root><div> one two</div></root>"

The issue is that I am taking some chunks out of HTML and storing them in XML, and I want to get the non-breaking spaces back in the XML when I'm done. Edit: As Dan and others have pointed out, the parser replaces the entity with the character whose code is 160 (U+00A0, which is outside ASCII). To my eyes it looks like an ordinary space, but:

var str1=new XMLSerializer().serializeToString(doc)
str1.charCodeAt(15)
160

So wherever my application is losing the spaces, it is not here.
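The check above can be generalized. A minimal sketch, assuming a hypothetical helper named `findNonAscii` (not part of the original post), that lists every character above code 127 so an invisible U+00A0 is easy to spot:

```javascript
// Hypothetical debugging helper (an assumption, not from the original post):
// returns the index and character code of every non-ASCII UTF-16 code unit.
function findNonAscii(s) {
  const hits = [];
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 127) {
      hits.push({ index: i, code: code });
    }
  }
  return hits;
}

// Stand-in for the serialized string; \u00A0 is the non-breaking space.
const serialized = "<root><div>\u00A0one\u00A0two</div></root>";
console.log(findNonAscii(serialized));
```

Running it on the string at each stage of the application should show where the code 160 characters turn into ordinary spaces (code 32).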

Aaron Newman

1 Answer

You can use a RegExp with a character range to turn the special characters back into their XML numeric representations. As a reusable function:

function escapeExtended(s){
 // Replace each character in the range 0x80-0xFF with its numeric character reference
 return s.replace(/[\x80-\xff]/g, function (ch) {
   return "&#" + ch.charCodeAt(0) + ";";
 });
}


var aaa='<root><div>&#160;one&#160;two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
var str= new XMLSerializer().serializeToString(doc);
alert(escapeExtended(str)); // shows: "<root><div>&#160;one&#160;two</div></root>"

Note that HTML named entities (e.g. &quot;) will lose their symbol name and be converted to XML numeric entities (the &#number; kind). You can't get the names back without a huge conversion table.
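The function above only covers characters up to 0xFF. A minimal sketch of a wider variant, assuming an ES2015+ engine and a hypothetical name `escapeNonAscii` (not from the original answer), that escapes every non-ASCII character by code point:

```javascript
// Hypothetical variant (an assumption, not the original answer's code):
// escapes every non-ASCII character as a decimal numeric character reference.
function escapeNonAscii(s) {
  // The "u" flag makes the regex match whole code points, so a character
  // outside the Basic Multilingual Plane (stored as a surrogate pair)
  // becomes one reference instead of two.
  return s.replace(/[\u0080-\u{10FFFF}]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// Stand-in for XMLSerializer output: two non-breaking spaces, U+25C0, U+1F600.
const sample = "<root><div>\u00A0one\u00A0two \u25C0 \u{1F600}</div></root>";
console.log(escapeNonAscii(sample));
// prints "<root><div>&#160;one&#160;two &#9664; &#128512;</div></root>"
```

Using `codePointAt` rather than `charCodeAt` matters only for characters above U+FFFF; for everything else the two return the same number.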

dandavis
  • I tried your solution though it doesn't seem to be working for the following entities: &10073;&9664;&11035;&9654;&10074;&9193;&9654; (# symbol removed to keep the site from parsing them). – John Aug 20 '15 at 13:51
  • @John: I believe you could just extend the range to four hex digits instead of two, using `\u` escapes because `\x` only accepts two digits: `s.replace(/([\u0080-\uffff])/g` to handle those higher entities, but I can't test that right now... – dandavis Aug 20 '15 at 21:51