
I found this question, which is similar to mine: I am also signing an XML document with a library that uses XmlDocument, and I am having the same problem.

My question is why this happens and, if possible, how to avoid it. I am not even using the Load method; instead I set InnerXml, hoping to avoid the parser, but with no result.

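// Requires System.Web for HttpUtility.
// HtmlEncode turns characters such as á and ñ into numeric references (&#225;, &#241;)
// and turns & into &amp;.
string XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<HtmlEncode>" +
                "<accents>" + HttpUtility.HtmlEncode("áéíóúÁÉÍÓÚ") + "</accents>" +
                "<tilde>" + HttpUtility.HtmlEncode("ñÑ") + "</tilde>" +
                "<specialChar>" + HttpUtility.HtmlEncode("&") + "</specialChar>" +
                "<text>" + HttpUtility.HtmlEncode("Pérez & Compañía") + "</text>" +
                "</HtmlEncode>";

// Show the entity-encoded input.
txtOriginal.Text = XML;

// Assigning InnerXml parses the markup, just as LoadXml would.
System.Xml.XmlDocument xDoc = new System.Xml.XmlDocument();
xDoc.InnerXml = XML;

// Show the document as XmlDocument serializes it again.
txtEncoded.Text = xDoc.InnerXml;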

Original XML:

<?xml version="1.0" encoding="UTF-8"?>
<HtmlEncode>
    <accents>&#225;&#233;&#237;&#243;&#250;&#193;&#201;&#205;&#211;&#218;</accents>
    <tilde>&#241;&#209;</tilde>
    <specialChar>&amp;</specialChar>
    <text>P&#233;rez &amp; Compa&#241;&#237;a</text>
</HtmlEncode>

XML after parsing:

<?xml version="1.0" encoding="UTF-8"?>
<HtmlEncode>
    <accents>áéíóúÁÉÍÓÚ</accents>
    <tilde>ñÑ</tilde>
    <specialChar>&amp;</specialChar>
    <text>Pérez &amp; Compañía</text>
</HtmlEncode>
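To an XML parser, `&#225;` and `á` are the same character, so the two listings differ only in notation. The following minimal C# sketch uses only `System.Xml`; the class and method names are made up for illustration. It parses the reference form and the literal form of the `<text>` element and compares the resulting text:

```csharp
using System;
using System.Xml;

static class EntityRoundTripSketch
{
    // Parses a small XML string and returns the text content of its root element.
    // Character references such as &#241; are resolved during parsing,
    // so the text node holds the actual character, not the reference.
    public static string ParsedText(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    static void Main()
    {
        string withReferences = "<text>P&#233;rez &amp; Compa&#241;&#237;a</text>";
        string withLiterals = "<text>Pérez &amp; Compañía</text>";

        // Both inputs yield the same parsed text: Pérez & Compañía
        Console.WriteLine(ParsedText(withReferences) == ParsedText(withLiterals)); // True
        Console.WriteLine(ParsedText(withReferences));
    }
}
```

After parsing, the DOM stores only the resolved characters and keeps no record of how they were written. That is why `InnerXml` writes them back literally and only re-escapes `&` as `&amp;`.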

Can anyone point me in the right direction?

For now, I am manually replacing the accented vowels and the ñ/Ñ with their unaccented equivalents, but I would rather keep the correct characters, properly encoded.
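If the goal is to get numeric character references into the serialized output, one option is to write through an `XmlWriter` whose `Encoding` is ASCII. This technique is not something tried in the question. When such a writer meets a character it cannot encode in element text, it writes a character reference instead. The following sketch assumes that behavior, and the class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AsciiXmlWriterSketch
{
    // Writes <text>value</text> with an ASCII-encoded XmlWriter.
    // Characters outside ASCII (é, ñ, í, ...) cannot be encoded directly,
    // so the writer emits them as numeric character references.
    public static string WriteAscii(string value)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            OmitXmlDeclaration = true
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString("text", value);
            } // disposing the writer flushes it into the stream

            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    static void Main()
    {
        // Prints the element with é, ñ and í written as &#x...; references.
        Console.WriteLine(WriteAscii("Pérez & Compañía"));
    }
}
```

This only changes the transport form. Canonical XML, which XML signatures are computed over, replaces character references with the characters themselves and uses UTF-8, so the signed bytes are the same either way.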

Elder
  • HTML is different from XML, and they have different reserved characters. Check [this related question](https://stackoverflow.com/questions/21758345/what-are-the-official-xml-reserved-characters). IMHO you should not use HtmlEncode. You could also use CDATA and avoid the whole problem. – Cleptus Mar 04 '20 at 07:13
  • Why do you need the entities at all? XmlDocument looks at your encoding (here UTF-8) and checks whether each character you used is supported by it. Unsupported or reserved characters get escaped as entities. Since UTF-8 can represent every character, there is no need to escape them. – ckuri Mar 04 '20 at 07:48
  • Thank you @bradbury9, I was not aware that HTML and XML have different reserved characters. I will try CDATA to see if that helps. – Elder Mar 05 '20 at 19:47
  • @ckuri there is strange behavior with the signing whenever a special character is used. The document is signed with the certificate without any problem, but after I send it to the provider's web service, it gets rejected because the document's signature cannot be verified. With plain 0-9, a-z, A-Z characters there is no problem. I think it could have something to do with the encoding, but I have no idea why the service would not understand that I am using UTF-8. – Elder Mar 05 '20 at 19:53
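Here is a minimal sketch of the CDATA suggestion. It uses `XmlDocument.CreateCDataSection` from `System.Xml`, and the surrounding class and method names are hypothetical:

```csharp
using System;
using System.Xml;

static class CDataSketch
{
    // Builds <text> whose content is a CDATA section instead of an escaped text node.
    public static string BuildWithCData(string value)
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("text");
        root.AppendChild(doc.CreateCDataSection(value));
        doc.AppendChild(root);
        return doc.InnerXml;
    }

    static void Main()
    {
        // Prints: <text><![CDATA[Pérez & Compañía]]></text>
        Console.WriteLine(BuildWithCData("Pérez & Compañía"));
    }
}
```

Inside the CDATA section, `&` no longer needs escaping, but the accented characters are still written literally, not as references. Canonical XML also replaces CDATA sections with their character content, so what gets signed is the same as for a plain text node.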
