XmlTextWriter serialization problem

Question

I'm trying to create a piece of XML. I've generated the data classes with xsd.exe. The root class is MESSAGE.

So after creating a MESSAGE and filling all its properties, I serialize it like this:

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
StringWriter sw = new StringWriter();
serializer.Serialize(sw, response);
string xml = sw.ToString();

Up until now all goes well: the string xml contains valid (UTF-16 encoded) XML. Now I'd like to create the XML with UTF-8 encoding instead, so I do it like this:

Edit: forgot to include the declaration of the stream

serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, Encoding.UTF8);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

And here comes the problem: using this approach, the xml string is prefixed with an invalid character (the infamous square).
When I inspect the char like this:

char c = xml[0];

I can see that c has a value of 65279.
Does anybody have a clue where this is coming from?
I can easily solve this by cutting off the first char:

xml = xml.Substring(1);

But I'd rather know what's going on than blindly cut off the first char.

Can anybody shed some light on this? Thanks!

See: http://stackoverflow.com/questions/955611/xmlwriter-to-write-to-a-string-instead-of-to-a-file/955698#955698 — Marc Gravell, Jun 09 '09 at 13:09

Chris W. Rea · Accepted Answer · 2009-06-09T13:26:03.483

17

Here's your code modified to not prepend the byte-order-mark (BOM):

var serializer = new XmlSerializer(typeof(Xsd.MESSAGE));
// Passing false to the UTF8Encoding constructor suppresses the BOM.
Encoding utf8EncodingWithNoByteOrderMark = new UTF8Encoding(false);
using (MemoryStream stream = new MemoryStream())
{
    XmlTextWriter xtw = new XmlTextWriter(stream, utf8EncodingWithNoByteOrderMark);
    serializer.Serialize(xtw, response);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
}

edited Jun 09 '09 at 13:26

answered Jun 09 '09 at 13:18

Chris W. Rea

1

`XmlTextWriter` has been [deprecated by Microsoft](https://msdn.microsoft.com/en-us/library/system.xml.xmltextwriter.aspx), so nowadays I would do `var xtw = XmlWriter.Create(stream, new XmlWriterSettings { Encoding = utf8EncodingWithNoByteOrderMark });` instead. – dbc Apr 21 '18 at 17:13
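
For reference, the `XmlWriter.Create` variant mentioned in the comment above could look roughly like the following. This is only a sketch: the class name `Utf8XmlWriterSketch` and the generic `Serialize<T>` helper are made up for illustration, and it assumes the value is serializable by `XmlSerializer` (for example the `Xsd.MESSAGE` instance from the question).

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper, not part of the original answer.
public static class Utf8XmlWriterSketch
{
    public static string Serialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));

        // UTF8Encoding(false) means "do not emit a byte order mark".
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };

        using (var stream = new MemoryStream())
        {
            // Disposing the writer flushes it; by default it leaves the stream open.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, value);
            }

            // The bytes contain no BOM, so the string starts with "<?xml".
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Calling `Utf8XmlWriterSketch.Serialize(response)` should then give a string whose first character is `<` rather than the BOM.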

score 7 · Answer 2 · edited May 23 '17 at 12:19

65279 (U+FEFF) is the Unicode byte order mark (BOM) - are you sure you're getting 65249? Assuming it really is the BOM, you can get rid of it by creating a UTF8Encoding instance that doesn't emit a BOM. (See the constructor overloads for details.)
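
To see where the extra character comes from, a small sketch like the one below may help. The class and method names (`BomDemo`, `PreambleHex`, `FirstCharAfterDecoding`) are invented for this illustration. It shows that `Encoding.UTF8` advertises a three-byte preamble, that `new UTF8Encoding(false)` advertises none, and that `GetString` does not strip those bytes when they are present.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not from the original answer.
public static class BomDemo
{
    // Hex dump of the preamble (BOM) an encoding asks writers to emit.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    // Decodes the UTF-8 BOM bytes followed by '<' and returns the
    // code of the first resulting char.
    public static int FirstCharAfterDecoding()
    {
        byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'<' };
        string decoded = Encoding.UTF8.GetString(bytes);
        return decoded[0];
    }
}
```

Here `BomDemo.PreambleHex(Encoding.UTF8)` yields `EF-BB-BF`, `BomDemo.PreambleHex(new UTF8Encoding(false))` yields an empty string, and `BomDemo.FirstCharAfterDecoding()` yields 65279, the value seen in the question.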

However, there's an easier way of getting UTF-8 out. You can use StringWriter, but a derived class which overrides the Encoding property. See this answer for an example.
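
A minimal sketch of that derived-class approach is shown below. The names `Utf8StringWriter` and `XmlStringSerializer.ToUtf8Xml` are chosen for illustration and are not framework types. Because the result is built as a string and never goes through a byte stream, no BOM character appears, while the XML declaration reports utf-8 because the serializer's writer takes the encoding name from the `TextWriter`.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 as its encoding.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Hypothetical convenience wrapper around XmlSerializer.
public static class XmlStringSerializer
{
    public static string ToUtf8Xml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new Utf8StringWriter())
        {
            // The declaration's encoding attribute follows writer.Encoding.
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

With this in place, the code from the question reduces to something like `string xml = XmlStringSerializer.ToUtf8Xml(response);`.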

answered Jun 09 '09 at 13:12

Jon Skeet

I ran the code and got 65279, too. Probably a typo in the question. – Chris W. Rea Jun 09 '09 at 13:19
I don't find creating a new class necessarily *easier*... what I would find easier is that I could *set* the Encoding of a StringWriter without having to derive from it. – fretje Jun 09 '09 at 13:34
@fretje: Yes, but deriving a new class is easier than changing the .NET framework :) And the point about deriving a new class being easier than using XmlTextWriter is that you only have to do it in one place, ever. – Jon Skeet Jun 09 '09 at 13:54
@Jon: Agreed. I'll take this approach if I ever need this a second time in the same project ;-) – fretje Jun 09 '09 at 14:20
