I am working on a project where we serialize a class into an XML string. The requirements state that the encoding should be UTF-8, but characters with ASCII values > 127 must be decimal encoded (for example, é becomes &#233;). In addition, named entity encoding (&amp;, &lt;, and so on) is not allowed; those characters must be decimal encoded too.

Currently, we use the standard UTF-8 encoding and "pre-encode" the special characters when we place them in the object. That means the ampersands produced by the pre-encoding get encoded again when we serialize, so we need an extra step to undo that second encoding.

I've found a way to create an encoding class that inherits from Encoding and overrides the GetBytes method. It worked great when I ran it standalone, but when I set it in the XmlWriterSettings, the method I overrode is never called. Instead, I get a 501 error with the exception message "No data is available for encoding 0. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." The documentation for Encoding.RegisterProvider says that it is available since .NET 4.6, but I'm using 4.5.2.

Is there a way to override the encoding so I can manually encode the attribute and element values?

R. White
  • Once you create the XML, you can convert it to a string and use a StreamWriter to save it to a file. The StreamWriter can then use your custom encoding. – jdweng Oct 17 '16 at 16:12
  • Take a look here: http://stackoverflow.com/questions/22394441/how-do-i-xmldocument-save-to-encoding-us-ascii-with-numeric-character-entiti – Rubens Farias Oct 17 '16 at 16:14
  • Are these requirements valid? That is, do they add value to the users? Most people would answer no; that's why the .NET base class libraries don't do that intrinsically. If you remove the "no named character entity references" requirement, you could save the document as ASCII and, if necessary, do a textual edit on the XML file to change the declared encoding to UTF-8 (because the _resulting_ bytes will read the same in UTF-8 as in ASCII). – Tom Blodget Oct 17 '16 at 23:45
  • BTW, instead of ASCII value, you must have meant Unicode codepoint. XML characters are Unicode, regardless of the document encoding. – Tom Blodget Oct 17 '16 at 23:47
  • Thanks to everyone for their help. Instead of ASCII value, how about I say, "Integer value of the character?" That would be more accurate, I guess. As for the requirements, valid or not, they are the interpretation the architects have made of the industry standards documentation. – R. White Oct 19 '16 at 00:19
  • @jdweng, this is sensitive data for a web service, so I don't think saving it to a file, even temporarily, would be allowed. – R. White Oct 19 '16 at 00:34
  • @TomBlodget - I will keep your ASCII encoding idea in my hip pocket if my current idea isn't accepted. I agree that the "no entity encoding allowed" rule seems misplaced. – R. White Oct 19 '16 at 00:34
  • Then save data to a string using StringWriter or to a MemoryStream. – jdweng Oct 19 '16 at 06:37

1 Answer

Here's a solution I've worked out. I created a class that inherits from XmlTextWriter and overrode the WriteString method to run the string through the Encoding class I created before writing it to the stream. This gives me the decimal encoding I need. Unfortunately, the ampersand from the decimal encoding is itself encoded (&#38; becomes &amp;#38;), so I have to use Replace to change it back. Other than that, it does the encoding in the manner I want. I've presented it to my team and the architects, so we'll see what they say.
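
For illustration, here is a minimal sketch of this kind of writer. It is not the exact code described above: the class name DecimalReferenceXmlWriter and the helper EncodeDecimal are hypothetical, and instead of calling a custom Encoding followed by Replace, this variation builds the decimal references by hand and passes them to WriteRaw, which skips the writer's own escaping so the ampersand is not encoded a second time. It assumes the serializer writes element and attribute values through WriteString.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical writer: replaces named-entity escaping with decimal
// character references for markup characters and anything above 127.
public class DecimalReferenceXmlWriter : XmlTextWriter
{
    public DecimalReferenceXmlWriter(TextWriter output) : base(output) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text)) return;
        // WriteRaw writes the text as-is, so the references produced
        // by EncodeDecimal are not escaped again by the base class.
        WriteRaw(EncodeDecimal(text));
    }

    // Hypothetical helper: returns text with &, <, >, ", ' and every
    // character above 127 written as a decimal character reference.
    public static string EncodeDecimal(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(text, i))
            {
                // Combine the surrogate pair into one code point
                // and skip the low surrogate.
                codePoint = char.ConvertToUtf32(text, i);
                i++;
            }
            else
            {
                // Note: a lone surrogate is not valid XML and is not
                // handled specially in this sketch.
                codePoint = text[i];
            }

            bool isMarkup = codePoint == '&' || codePoint == '<' ||
                            codePoint == '>' || codePoint == '"' ||
                            codePoint == '\'';

            if (codePoint > 127 || isMarkup)
            {
                sb.Append("&#")
                  .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                  .Append(';');
            }
            else
            {
                sb.Append((char)codePoint);
            }
        }
        return sb.ToString();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Writing to a StringWriter keeps the data in memory,
        // as suggested in the comments on the question.
        var output = new StringWriter();
        using (var writer = new DecimalReferenceXmlWriter(output))
        {
            writer.WriteStartElement("name");
            writer.WriteAttributeString("note", "a<b");
            writer.WriteString("Café & Co");
            writer.WriteEndElement();
        }
        Console.WriteLine(output.ToString());

        // The helper can also be checked on its own:
        // EncodeDecimal("a&b<é") returns "a&#38;b&#60;&#233;"
        Console.WriteLine(DecimalReferenceXmlWriter.EncodeDecimal("a&b<é"));
    }
}
```

Because the references are plain ASCII, the resulting string reads the same whether it is later written out as ASCII or as UTF-8. To use a writer like this with XmlSerializer, pass it to the Serialize overload that accepts an XmlWriter.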

R. White