
I've come across a problem when serializing special characters such as TAB, linefeed and carriage return in an attribute value.

According to http://www.w3.org/TR/1999/WD-xml-c14n-19991109.html#charescaping, these should be encoded as &#x9;, &#xA; and &#xD; respectively. But calling this in Chrome:

var root = new DOMParser().parseFromString('<root></root>', 'text/xml').documentElement;
root.setAttribute('a', 'first\nsecond');
var serialized = new XMLSerializer().serializeToString(root);

Gives the string <root a="first\nsecond"/>, where \n is a literal linefeed that has not been escaped.

When loading that again:

var loaded = new DOMParser().parseFromString(serialized, 'text/xml').documentElement;
loaded.getAttribute('a');

returns "first second": the linefeed has been turned into a plain space. Has anyone faced this issue before? Any help would be appreciated.

Thanks,

Viktor
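The parsing half of this is standard XML behaviour: attribute-value normalization (XML 1.0, section 3.3.3) makes the parser replace each literal TAB, LF or CR in an attribute value with a space, while a character reference such as &#xA; is expanded after that step and survives. A minimal sketch of escaping the value yourself, assuming you can build the attribute markup by hand instead of relying on XMLSerializer. escapeAttrValue and ATTR_ESCAPES are hypothetical names; the set of escapes mirrors Canonical XML's rules for attribute values.

```javascript
// Hypothetical helper: escape a string for use inside a double-quoted XML
// attribute value. TAB, LF and CR become character references so that
// attribute-value normalization on re-parse does not turn them into spaces.
var ATTR_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  '"': "&quot;",
  "\t": "&#x9;",
  "\n": "&#xA;",
  "\r": "&#xD;"
};

function escapeAttrValue(value) {
  return String(value).replace(/[&<"\t\n\r]/g, function (ch) {
    return ATTR_ESCAPES[ch];
  });
}

// Build the element from the question by hand.
var markup = '<root a="' + escapeAttrValue("first\nsecond") + '"/>';
console.log(markup); // <root a="first&#xA;second"/>
```

Parsing that markup with DOMParser should then give "first\nsecond" back from getAttribute('a'), because the linefeed now arrives as a reference rather than as literal whitespace.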


1 Answer


I ran into this problem and solved it by writing a function, removeInvalidCharacters(xmlNode), that removes invalid characters from the nodeValues in an XML tree. Call it before serializing to make sure the output contains no invalid characters.

You can find removeInvalidCharacters() in my Stack Overflow question on the same topic.

You can use removeInvalidCharacters() like this:

var stringWithSTX = "Bad" + String.fromCharCode(2) + "News";
var xmlNode = $("<myelem/>").attr("badattr", stringWithSTX);

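var serializer = new XMLSerializer();
// serializeToString() expects a DOM node, so pass the element, not the jQuery wrapper
var invalidXML = serializer.serializeToString(xmlNode[0]);

// Now cleanse it (removeInvalidCharacters() is defined in the linked question):
removeInvalidCharacters(xmlNode);
var validXML = serializer.serializeToString(xmlNode[0]);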
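The linked removeInvalidCharacters() isn't reproduced here. As a hypothetical, string-level stand-in (stripInvalidXmlChars and INVALID_XML_CHARS are illustrative names, not the answer's code), the sketch below drops every character outside the XML 1.0 Char production. Such characters cannot be rescued as character references either, since a reference like &#x2; is not legal XML 1.0, so removing them is one way to guarantee a well-formed document.

```javascript
// Hypothetical stand-in for removeInvalidCharacters(): it works on a plain
// string instead of walking DOM nodes. Anything outside the XML 1.0 Char
// production is dropped, including control characters such as U+0002,
// U+FFFE/U+FFFF and unpaired surrogates; TAB, LF and CR are kept.
// The "u" flag makes the class match whole code points, so valid surrogate
// pairs (U+10000 and above) are kept intact.
var INVALID_XML_CHARS = /[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

function stripInvalidXmlChars(value) {
  return String(value).replace(INVALID_XML_CHARS, "");
}

var stringWithSTX = "Bad" + String.fromCharCode(2) + "News";
console.log(stripInvalidXmlChars(stringWithSTX)); // BadNews
```

In a DOM setting you would apply it to each attribute value and text node before calling serializeToString(), which is roughly what the answer describes removeInvalidCharacters() doing.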

I've also filed an issue report against Chrome, but it's worth noting that IE9 has its own bugs in this area, so a fix that doesn't need a workaround is probably a long time coming.

  • Hey Seth, thanks for the answer. Looking at your solution, it removes the characters, but I actually need them. I used a different approach, based on the observation that the characters are not escaped but kept as they are: `var serializer = new XMLSerializer(); var invalidXML = serializer.serializeToString(xmlNode); var xml = escapeXMLCharacters(invalidXML);` Luckily I don't need to deserialize in the end. This solution works for IE9/8 as well. Bear in mind, though, that it escapes only linefeeds, line breaks and tabs; I did not need more... – Viktor Feb 13 '13 at 14:43
  • Yup! If that's what you need, it's a great way to do it. The problem on IE9/8 is that it programmatically entitizes ALL out-of-range characters. Unfortunately, 0x2 and other extended characters are not valid even in entity form, i.e. &#x2; is not valid. So if you need to be able to store ANY string handed to you without producing a corrupt XML file (my goal), you have to escape before IE9/8 entitizes. – Seth Feb 14 '13 at 15:24
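Viktor's escapeXMLCharacters() isn't shown in the comment. A minimal hypothetical reconstruction, assuming it simply replaces the literal TAB, LF and CR characters that XMLSerializer leaves in its output with character references (WHITESPACE_REFS is an illustrative name):

```javascript
// Hypothetical reconstruction of escapeXMLCharacters(): replace every literal
// TAB, LF and CR in an already-serialized string with a character reference.
var WHITESPACE_REFS = { "\t": "&#x9;", "\n": "&#xA;", "\r": "&#xD;" };

function escapeXMLCharacters(serialized) {
  return serialized.replace(/[\t\n\r]/g, function (ch) {
    return WHITESPACE_REFS[ch];
  });
}

// What the question reports XMLSerializer returning (literal linefeed):
var invalidXML = '<root a="first\nsecond"/>';
console.log(escapeXMLCharacters(invalidXML)); // <root a="first&#xA;second"/>
```

This rewrites every TAB, LF and CR, not only those in attribute values. In text content the references parse back to the same characters (a CR even survives, where a literal one would be normalized to LF), but inside comments, CDATA sections and processing instructions references are not expanded, so whitespace there would change. It also assumes the markup has no literal whitespace inside tags, which holds for single-space output like the question's.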