
I am using this code to store my class:

FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
serializer.Serialize(stream, myClass);
stream.Close();

This writes a file that I can read back fine with XmlSerializer.Deserialize. The generated file, however, is not a proper text file. XmlSerializer.Serialize doesn't write a BOM, but still emits multibyte characters. The file is thus implicitly treated as an ANSI file (because we expect an XML file to be a text file, and Windows considers a text file without a BOM to be ANSI), so some editors show ö as Ã¶.
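
The symptom can be reproduced in isolation. The following is a minimal sketch, assuming ISO-8859-1 as a stand-in for the Windows ANSI code page (the two agree on the bytes involved here); MojibakeDemo and MisreadUtf8AsLatin1 are hypothetical names, not part of any library.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8 without a BOM, then decodes those bytes
    // with a single-byte code page, as an editor might do when it finds
    // no BOM at the start of the file.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = new UTF8Encoding(false).GetBytes(text);

        // Assumption: ISO-8859-1 stands in for the "ANSI" code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }
}
```

Calling MojibakeDemo.MisreadUtf8AsLatin1("ö") returns the two characters Ã¶, because ö is the byte pair C3 B6 in UTF-8 and each byte is decoded as a separate character.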

Is this a known bug? Or some setting that I'm missing?

Here is what the generated file starts with:

<?xml version="1.0"?>
<SvnProjects xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

The first byte in the file is hex 3C, i.e. the <.

Thorsten Kettner
  • This has already been answered before: https://stackoverflow.com/questions/2437666/write-text-files-without-byte-order-mark-bom – Corniel Nobel Jul 04 '19 at 09:30

2 Answers


Having or not having a BOM is not a definition of a "proper text file". In fact, I'd say that the most typical format these days is UTF-8 without BOM; I don't think I've ever seen anyone actually use the UTF-8 BOM in real systems! But: if you want a BOM, that's fine: just pass the correct Encoding in; if you want UTF-8 with BOM:

using (var writer = XmlWriter.Create(myPath, s_settings))
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
    serializer.Serialize(writer, obj);
}

with:

static readonly XmlWriterSettings s_settings =
    new XmlWriterSettings { Encoding = new UTF8Encoding(true) };

The result of this is a file that starts EF-BB-BF, the UTF-8 BOM.

If you want a different encoding, just replace new UTF8Encoding with the encoding you want, remembering to enable its BOM.
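
To check which bytes a given Encoding actually produces, here is a minimal sketch that serializes into a MemoryStream instead of a file (an assumption made so the result is easy to inspect). BomDemo, Sample and SerializeToBytes are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for the class being stored.
public class Sample
{
    public string Name { get; set; } = "ö";
}

public static class BomDemo
{
    // Serializes obj through an XmlWriter configured with the given
    // encoding and returns the raw bytes that were written.
    public static byte[] SerializeToBytes(object obj, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var buffer = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(buffer, settings))
            {
                var serializer = new XmlSerializer(obj.GetType());
                serializer.Serialize(writer, obj);
            }
            // The writer has been disposed, so everything is flushed.
            return buffer.ToArray();
        }
    }
}
```

With new UTF8Encoding(true) the returned bytes begin with EF BB BF, and with new UnicodeEncoding(false, true) (UTF-16 little-endian with BOM) they begin with FF FE.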

(note: the static Encoding.UTF8 instance has the BOM enabled, but IMO it is better to be very explicit here if you specifically intend to use a BOM, just like you should be very explicit about what Encoding you intended to use)
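
Whether an Encoding instance emits a BOM can be checked through its preamble. This is a small sketch; PreambleDemo and PreambleLength are hypothetical names, while GetPreamble is the standard Encoding method.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the number of BOM bytes the encoding would emit.
    // A result of 0 means the encoding writes no BOM.
    public static int PreambleLength(Encoding encoding)
    {
        return encoding.GetPreamble().Length;
    }
}
```

PreambleLength(Encoding.UTF8) and PreambleLength(new UTF8Encoding(true)) both return 3, while PreambleLength(new UTF8Encoding(false)) returns 0.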


Edit: the key difference here is that Serialize(Stream, object) ends up using:

XmlTextWriter xmlWriter = new XmlTextWriter(stream, encoding: null) {
    Formatting = Formatting.Indented,
    Indentation = 2
};

which then ends up using:

public StreamWriter(Stream stream) : this(stream,
    encoding: UTF8NoBOM, // <==== THIS IS THE PROBLEM
    bufferSize: 1024, leaveOpen: false)
{
}

so: UTF-8 without BOM is the default if you use that API.

Marc Gravell
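  1. You must serialize an instance, not a class definition.
  2. To get Unicode output, you must create an XmlWriter or TextWriter with an explicit encoding.
FileStream stream = new FileStream(myPath, FileMode.Create);
XmlSerializer serializer = new XmlSerializer(typeof(myClass));
// Encoding.Unicode is UTF-16 little-endian with a BOM.
XmlWriter writer = new XmlTextWriter(stream, Encoding.Unicode);
// myObject is a placeholder name for an instance of myClass.
serializer.Serialize(writer, myObject);
writer.Close(); // flushes the writer and closes the underlying stream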
m r
  • Yes, thank you. This was the main issue. I suspected the XmlWriter to write a proper text file, as XML is text. But the XmlWriter relies on the underlying stream/writer to properly write text, so an explicit text writer must be used. I keep Marc Gravell's answer accepted, because he shows what is happening behind the scenes, and I can only accept one answer. +1 anyway. – Thorsten Kettner Jul 04 '19 at 12:42