
To load XML files with arbitrary encoding I have the following code:

```csharp
Encoding encoding;
using (var reader = new XmlTextReader(filepath))
{
    reader.MoveToContent();
    encoding = reader.Encoding;
}

var settings = new XmlReaderSettings { NameTable = new NameTable() };
var xmlns = new XmlNamespaceManager(settings.NameTable);
var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
    encoding);
using (var reader = XmlReader.Create(filepath, settings, context))
{
    return XElement.Load(reader);
}
```

This works, but it seems a bit inefficient to open the file twice. Is there a better way to detect the encoding such that I can do:

  1. Open file
  2. Detect encoding
  3. Read XML into an XElement
  4. Close file
Yi Jiang
Peter Lillevold

2 Answers


OK, I should have thought of this earlier. Both XmlTextReader (which gives us the Encoding) and XmlReader.Create (which allows us to specify the encoding) accept a Stream. So how about first opening a FileStream and then using it with both XmlTextReader and XmlReader, like this:

```csharp
// FileAccess.Read: FileMode.Open alone requests read/write access,
// which fails on read-only files.
using (var stream = new FileStream(filepath, FileMode.Open, FileAccess.Read))
{
    using (var xmlreader = new XmlTextReader(stream))
    {
        // Read in the encoding info
        xmlreader.MoveToContent();
        var encoding = xmlreader.Encoding;

        // Rewind to the beginning
        stream.Seek(0, SeekOrigin.Begin);

        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default,
                 encoding);

        using (var reader = XmlReader.Create(stream, settings, context))
        {
            return XElement.Load(reader);
        }
    }
}
```

This works like a charm. Reading XML files in an encoding-independent way should have been more elegant, but at least I'm getting away with opening the file only once.

Peter Lillevold
  • Would just calling the [XmlReader.Create(Stream)](http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.create.aspx) overload work the same way in terms of detecting the encoding? – petr k. Feb 11 '13 at 14:12
  • @petrk. - I'm using XmlTextReader explicitly since that's the class providing the `Encoding` property. Not sure what else you had in mind? – Peter Lillevold Feb 11 '13 at 16:35
  • Right, let me explain. It seems that `XElement.Load(XmlReader.Create(new FileStream(filepath, FileMode.Open)))` should do the same thing (disposing resources omitted for brevity). The documentation for [XmlReader.Create(Stream)](http://msdn.microsoft.com/en-us/library/756wd7zs.aspx) says: _The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding. When encoding is determined, the encoding is used to continue reading the stream, and processing continues parsing the input as a stream of (Unicode) characters._ I was wondering if your explicit encoding detection is any different from what the XmlReader.Create(Stream) overload does. – petr k. Feb 11 '13 at 17:49
  • @petrk. interesting... I'm sure I had a situation back then where `XmlReader` alone didn't work and I had to specify the encoding explicitly via the parser context to make it work. I should have recorded more of my scenario here because now I cannot remember all the details :) – Peter Lillevold Feb 12 '13 at 08:05
  • I am in the exact same situation, also having something similar to your sample in my codebase. I remember trying a lot of things before getting to that solution, but now it seems I could have just used the most straightforward way instead. Not sure if there's a risk of breaking anything, since I have a lot of code depending on this. – petr k. Feb 12 '13 at 09:14
  • @petrk. - the only way to be sure is to build some test cases with files of various encodings. – Peter Lillevold Feb 13 '13 at 14:24
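Following the suggestion to build test cases, here is a minimal sketch that writes the same document as UTF-8 (with and without BOM), UTF-16 and ISO-8859-1, then loads each file both with the answer's two-reader approach and with plain `XmlReader.Create(Stream)` as petr k. proposes. `EncodingTests`, `LoadWithContext`, `LoadDirect` and `WriteSamples` are hypothetical names, and the chosen set of encodings is an assumption; it does not reproduce whatever scenario originally needed the explicit parser context.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical test harness comparing two ways of loading XML of unknown encoding.
public static class EncodingTests
{
    // The answer's approach: one FileStream shared by XmlTextReader (to read
    // Encoding) and XmlReader.Create (to parse with that encoding in a context).
    public static XElement LoadWithContext(string path, out Encoding detected)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var detector = new XmlTextReader(stream))
        {
            detector.MoveToContent();
            detected = detector.Encoding;
            stream.Seek(0, SeekOrigin.Begin);

            var settings = new XmlReaderSettings { NameTable = new NameTable() };
            var xmlns = new XmlNamespaceManager(settings.NameTable);
            var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default, detected);
            using (var reader = XmlReader.Create(stream, settings, context))
            {
                return XElement.Load(reader);
            }
        }
    }

    // petr k.'s suggestion: let XmlReader.Create(Stream) detect the encoding itself.
    public static XElement LoadDirect(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream))
        {
            return XElement.Load(reader);
        }
    }

    // Writes the same document in several encodings; returns (path, declared name).
    public static List<(string Path, string Declared)> WriteSamples(string dir, string text)
    {
        var encodings = new (string Name, Encoding Enc)[]
        {
            ("utf-8", new UTF8Encoding(false)),                  // UTF-8, no BOM
            ("utf-8", new UTF8Encoding(true)),                   // UTF-8 with BOM
            ("utf-16", new UnicodeEncoding(false, true)),        // UTF-16 LE with BOM
            ("iso-8859-1", Encoding.GetEncoding("iso-8859-1")),  // Latin-1, no BOM
        };
        var result = new List<(string Path, string Declared)>();
        for (int i = 0; i < encodings.Length; i++)
        {
            var (name, enc) = encodings[i];
            var path = Path.Combine(dir, $"sample{i}.xml");
            var xml = $"<?xml version=\"1.0\" encoding=\"{name}\"?><root>{text}</root>";
            File.WriteAllText(path, xml, enc); // also writes enc's BOM, if it has one
            result.Add((path, name));
        }
        return result;
    }

    public static void Main()
    {
        var dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        foreach (var (path, declared) in WriteSamples(dir, "Grüße"))
        {
            var viaContext = LoadWithContext(path, out var detected);
            var direct = LoadDirect(path);
            Console.WriteLine($"{declared}: detected={detected.WebName}, " +
                              $"context={viaContext.Value}, direct={direct.Value}");
        }
        Directory.Delete(dir, true);
    }
}
```

If both loaders agree for the encodings you care about, the shorter `LoadDirect` form is a candidate replacement; running it against files from your own corpus is a better check than these synthetic samples.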

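Another option, quite simple, is to use LINQ to XML. `XDocument.Load` automatically detects the encoding of the XML file. You can then read the declared encoding through the `XDeclaration.Encoding` property (via `XDocument.Declaration`); note that this is the name string from the XML declaration, not an `Encoding` object. An example from MSDN: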

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Create the document
XDocument encodedDoc16 = new XDocument(
    new XDeclaration("1.0", "utf-16", "yes"),
    new XElement("Root", "Content")
);
encodedDoc16.Save("EncodedUtf16.xml");
Console.WriteLine("Encoding is:{0}", encodedDoc16.Declaration.Encoding);
Console.WriteLine();

// Read the document
XDocument newDoc16 = XDocument.Load("EncodedUtf16.xml");
Console.WriteLine("Encoded document:");
Console.WriteLine(File.ReadAllText("EncodedUtf16.xml"));
Console.WriteLine();
Console.WriteLine("Encoding of loaded document is:{0}", newDoc16.Declaration.Encoding);
```
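When the file may not carry an XML declaration at all, a null check is needed: `XDocument.Declaration` is null when the document has no declaration, and `XDeclaration.Encoding` is null when the declaration omits the `encoding` attribute. A minimal sketch, where `DeclaredEncoding.Read` is a hypothetical helper name:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper class; the name DeclaredEncoding is chosen for this example.
public static class DeclaredEncoding
{
    // Returns the encoding name from <?xml ... encoding="..."?>, or null if absent.
    public static string Read(string path)
    {
        // XDocument.Load detects the actual encoding while parsing;
        // Declaration.Encoding only echoes the string written in the declaration.
        XDocument doc = XDocument.Load(path);
        return doc.Declaration?.Encoding;
    }

    public static void Main()
    {
        string withDecl = Path.Combine(Path.GetTempPath(), "declared-utf16.xml");
        string noDecl = Path.Combine(Path.GetTempPath(), "no-declaration.xml");

        // UTF-16 (with BOM) and a matching declaration.
        File.WriteAllText(withDecl,
            "<?xml version=\"1.0\" encoding=\"utf-16\"?><Root>Content</Root>",
            Encoding.Unicode);
        // UTF-8 without BOM and no declaration at all.
        File.WriteAllText(noDecl, "<Root>Content</Root>");

        Console.WriteLine(Read(withDecl) ?? "(none)");
        Console.WriteLine(Read(noDecl) ?? "(none)");
    }
}
```

If an `Encoding` object is needed, the returned name can be passed to `Encoding.GetEncoding` after the null check.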

While this may not serve the original poster, since he would have to refactor a lot of code, it is useful for anyone writing new code for their project, or who thinks the refactoring is worth it.

Teorist