This might be a given, but I'm trying to be thorough: since .NET's string type is UTF-16, does XmlDocument.LoadXml(string) simply ignore the encoding attribute in the XML declaration? Whatever the document was originally encoded with should already have been converted to UTF-16, since it is contained in a .NET string.

-
Why should it ignore it? If the document says utf-8, it can't be loaded as utf-16. That wouldn't work. Or am I misinterpreting your question? – default Feb 28 '13 at 09:39
-
It should ignore it, in my opinion, because the original data, wherever it came from, should already have been converted from _whatever_ to UTF-16, as it is contained in a .NET string. – Stockhausen Feb 28 '13 at 11:21
1 Answer
The encoding attribute in the XML declaration determines the encoding.
For example:
<?xml version="1.0" encoding="utf-8" ?>
The document is read with this encoding and then converted to a UTF-16 string. If you expect to see UTF-16 characters, however, you will not; they will be lost.
From the MSDN documentation for String:
Each Unicode character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object.
This means that when you pass XmlDocument.LoadXml() your string with an XML header, it must say the encoding is UTF-16. Otherwise, the actual underlying encoding won't match the encoding reported in the header and will result in an XmlException being thrown.
Extended explanation here: Why does C# XmlDocument.LoadXml(string) fail when an XML header is included?
-
I read the question you linked before posting this question, but I for one have no problems using LoadXml() with the `encoding` attribute set to UTF-8 (nor with UTF-16). IMO it would be annoying and pointless to always manually change the encoding attribute to UTF-16 _once_ the content has, one way or another, been stored in a `string`, since that's always UTF-16. Say that you receive a UTF-8 encoded XML document as `byte[]` and use `Encoding.UTF8.GetString(byte[])`; the string will obviously be UTF-16, yet the declaration should say UTF-8, and this is why I think it should be ignored. – Stockhausen Feb 28 '13 at 11:29
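
The scenario in the comment above can be tried out with a small sketch like the one below. It is only an illustration, not something posted in the thread: the sample document, the class name `LoadXmlEncodingCheck`, and the variable names are all made up. It decodes UTF-8 bytes into a .NET string whose declaration still says `utf-8`, passes that string to `LoadXml()`, and then prints whether the text survived and what the declaration reports.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical test harness; the names here are illustrative only.
class LoadXmlEncodingCheck
{
    static void Main()
    {
        // Made-up sample document whose declaration claims UTF-8.
        // \u00e4 is a non-ASCII character, so a wrong decoding would show up.
        string original =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?><root>\u00e4</root>";

        // Simulate receiving the document as UTF-8 bytes and decoding it.
        // Encoding.UTF8.GetBytes does not prepend a byte order mark.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        string decoded = Encoding.UTF8.GetString(utf8Bytes);

        // The string is UTF-16 in memory, but the declaration still says utf-8.
        var doc = new XmlDocument();
        doc.LoadXml(decoded);

        // Check that the non-ASCII character came through unchanged.
        Console.WriteLine(doc.DocumentElement.InnerText == "\u00e4");

        // Show what the parsed declaration reports as its encoding.
        var declaration = doc.FirstChild as XmlDeclaration;
        Console.WriteLine(declaration?.Encoding);
    }
}
```

If `LoadXml()` behaves as the comment describes, this runs without an `XmlException`; if the answer's description is right, the call to `LoadXml()` is where it would fail.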