The following two code samples demonstrate the issue I am encountering, where "invalid characters" are not encoded or decoded.

var elm = new XElement("foo", "\x12");
elm.ToString();
// ArgumentException: '', hexadecimal value 0x12, is an invalid character.

Likewise, parsing a document that contains the same character as a character reference fails:

var elm2 = XElement.Parse("<foo>&#x0012;</foo>");
// XmlException: '', hexadecimal value 0x12, is an invalid character ..

This causes exceptions in unexpected places.

How can I "resolve" this such that the XML is always properly encoded without exception? How can this problem be generally dealt with?

If I must preserve these "invalid characters" in a round trip, is there a standard method of doing so without a custom encoding (e.g. Base64) process?

Also, I am surprised that using a character reference (&#x0012;) did not fix the issue: isn't an encoded character already encoded? Is this a difference between XML versions or merely a fundamental XML limitation?


In this case it would be OK to simply drop the invalid XML characters, but I don't wish to do that manually for every text node inserted into the XElement structure.
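One way to avoid cleaning each text node as it is inserted is to sanitize the finished tree once, just before serializing. This is a minimal sketch that assumes dropping characters is acceptable; `XmlSanitizer`, `StripInvalidXmlChars` and `Sanitize` are hypothetical names, while `XmlConvert.IsXmlChar` is the existing System.Xml check for a valid non-surrogate XML 1.0 character.

```csharp
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: removes characters that XML 1.0 does not allow.
public static class XmlSanitizer
{
    // Returns a copy of the string with every forbidden character removed.
    // Well-formed surrogate pairs (U+10000..U+10FFFF) are kept.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }

    // Cleans every text node (XCData derives from XText, so CDATA is included)
    // and every attribute value in the tree, in place.
    public static void Sanitize(XElement root)
    {
        foreach (var text in root.DescendantNodes().OfType<XText>().ToList())
            text.Value = StripInvalidXmlChars(text.Value);

        foreach (var attr in root.DescendantsAndSelf().Attributes().ToList())
            attr.Value = StripInvalidXmlChars(attr.Value);
    }
}
```

Calling `XmlSanitizer.Sanitize(elm)` before `elm.ToString()` should let the first sample serialize without an exception. Comments and processing instructions are not covered by this sketch.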

This isn't an XElement-only issue, as online validation sites also reject the XML in the second case; however, answers can assume that XElement is being used.

user2864740
  • I think you might need an XCData element for these types of values. – 500 - Internal Server Error Jan 22 '15 at 17:46
  • @500-InternalServerError - you'll still get 500 internal server error even if you try to wrap invalid characters. I believe some of the classes in .Net will allow you to add such text but will still (correctly) fail when reading it. See http://stackoverflow.com/questions/21087648/xml-invalid-characters-when-creating-cdata-node-from-unicodestring for a link to the spec. – Alexei Levenkov Jan 22 '15 at 17:51

1 Answer


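There is no way to make a valid XML document that contains invalid characters, which are roughly 0-31 for XML 1.0 (tab, LF and CR are allowed) and just 0 for XML 1.1, where the other control characters are permitted but only as character references (System.Xml does not support XML 1.1, though). A character reference such as &#x12; does not help in XML 1.0, because the specification requires the referenced character itself to be a legal character. The complete list can be found in the specification, or in the Wikipedia article "Valid characters in XML".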
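The "roughly 0-31" can be checked directly with `XmlConvert.IsXmlChar`. This small sketch (the class and method names are hypothetical) lists which C0 control characters XML 1.0 accepts, and lets you confirm that a character reference to a forbidden character is rejected by the parser just like the raw character.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helpers for checking which characters XML 1.0 accepts.
public static class ControlCharCheck
{
    // Returns the code points in U+0000..U+001F that XmlConvert.IsXmlChar accepts.
    public static List<int> AllowedControlChars()
    {
        var allowed = new List<int>();
        for (int code = 0; code <= 0x1F; code++)
        {
            if (XmlConvert.IsXmlChar((char)code))
                allowed.Add(code);
        }
        return allowed;
    }

    // True if the string parses as XML; false if the reader rejects it.
    public static bool ParsesAsXml(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Per the XML 1.0 `Char` production, only 9, 10 and 13 (tab, LF, CR) should be reported as allowed, and `ParsesAsXml("<foo>&#x0012;</foo>")` should return false while `ParsesAsXml("<foo>&#x9;</foo>")` returns true.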

The recommended way of dealing with such information, which is essentially "binary data", is to Base64-encode it.
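A minimal sketch of that Base64 round trip, assuming the value is a string and that UTF-8 is used for the byte conversion. `Base64Text`, `Encode` and `Decode` are hypothetical names; `Convert.ToBase64String` and `Convert.FromBase64String` are the standard .NET calls.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical helpers: store arbitrary text as Base64 so that no
// control character ever reaches the XML writer or reader.
public static class Base64Text
{
    public static XElement Encode(XName name, string value)
    {
        // UTF-8 is an assumption; any encoding works if both sides agree.
        byte[] bytes = Encoding.UTF8.GetBytes(value);
        return new XElement(name, Convert.ToBase64String(bytes));
    }

    public static string Decode(XElement element)
    {
        byte[] bytes = Convert.FromBase64String(element.Value);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Because Base64 output consists only of A-Z, a-z, 0-9, `+`, `/` and `=`, the element always serializes and parses; the trade-off is that the stored text is no longer human-readable. A string containing unpaired surrogates would not survive the UTF-8 step unchanged.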

Alexei Levenkov