
I open an existing XML file with C# and replace some nodes in it. Everything works, but after I save the file, the following characters appear at the beginning:

ï»¿ (EF BB BF in hex)

The whole first line:

ï»¿<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

The rest of the file looks like a normal XML file. The simplified code is here:

XmlDocument doc = new XmlDocument();
doc.Load(xmlSourceFile);
XmlNode translation = doc.SelectSingleNode("//trans-unit[@id='127']");
translation.InnerText = "testing";
doc.Save(xmlTranslatedFile);

I'm using a C# Windows Forms application with .NET 4.0.

Any ideas? Why does it do that, and can it be disabled somehow? The file is for Adobe InCopy, which will not open it with these characters in front.

UPDATE: Alternative Solution:

Saving it with the XmlTextWriter works too:

using (XmlTextWriter writer = new XmlTextWriter(inCopyFilename, null))
{
    // Disposing the writer flushes the output and releases the file handle.
    doc.Save(writer);
}
Peter Mortensen
Remy
  • See this [post](http://stackoverflow.com/questions/1755958/how-can-i-remove-bom-from-xmltextwriter-using-c) - Jon Skeet explains how to remove the BOM when saving your XmlDocument, if that is what you need. – StuartLC Jan 06 '11 at 11:39

4 Answers


It is the UTF-8 BOM (byte order mark), which the Unicode standard neither requires nor recommends:

http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf

Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature

You may disable it using:

// UTF8Encoding(false) suppresses the BOM; append: false overwrites an existing file.
using (var sw = new System.IO.StreamWriter(path, false, new System.Text.UTF8Encoding(false)))
{
    doc.Save(sw);
}
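
An alternative worth sketching is to go through `XmlWriterSettings`, which lets you set the encoding on the XML writer itself. This is not the code from the answer above, just a minimal illustration; the class name `BomFreeSave`, the method `SaveWithoutBom`, and the sample `<xliff>` markup are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomFreeSave
{
    // Serializes the document as UTF-8 without a byte order mark
    // and returns the raw bytes so they can be inspected or written to disk.
    public static byte[] SaveWithoutBom(XmlDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit the EF BB BF preamble
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical sample document standing in for the real source file.
        var doc = new XmlDocument();
        doc.LoadXml("<xliff><trans-unit id=\"127\">old</trans-unit></xliff>");
        doc.SelectSingleNode("//trans-unit[@id='127']").InnerText = "testing";

        byte[] bytes = SaveWithoutBom(doc);

        // With no BOM, the output should begin with '<' (0x3C) rather than 0xEF.
        Console.WriteLine("Starts with '<': " + (bytes[0] == 0x3C));
    }
}
```

To write straight to a file instead, passing the path to `XmlWriter.Create(path, settings)` should behave the same way.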
dalle
  • Huh, I never knew it was discouraged... if it is, then how are programs supposed to detect encodings? – user541686 Jan 06 '11 at 11:33
  • @Lambert: XML either specifies an encoding in the header or (missing that) is UTF-8 by default. – Konrad Rudolph Jan 06 '11 at 11:41
  • @Lambert: *for UTF-8* is the key part of the phrase. If you *know* it is utf-8 then there's no point, no endian-ness trouble. The odds of reading an xml file encoded in utf-16be without a bom are still zilch, even if it is declared in the processing instruction. – Hans Passant Jan 06 '11 at 12:22
  • Thanks for all the answers. That helped. I've updated the question with another solution that I found after your input. – Remy Jan 06 '11 at 12:47
  • This code does not compile. The first argument to `StreamWriter` is a `Stream`, not a path. Also, `sw` will never be closed if `doc.Save(sw);` throws an Exception. Classic case for the `using` statement. – Eric J. Mar 26 '13 at 02:47
  • @EricJ. You are correct, this is a classic case for the `using` statement. It seems MS has updated the documentation for the `StreamWriter(String)` constructor, which now says it implicitly "creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM)". MS has also updated the documentation for .NET 2.0 to say the same thing. So, unless MS made a mistake, the `StreamWriter(String)` constructor should suffice. – dalle Mar 26 '13 at 11:59
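
For anyone who wants to check whether a file actually starts with the UTF-8 BOM, or strip it from data that already has one, here is a small sketch. The class and method names (`BomCheck`, `StartsWithUtf8Bom`, `StripUtf8Bom`) are invented for illustration; only `Encoding.UTF8.GetPreamble()` and the LINQ operators come from the framework.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomCheck
{
    // The UTF-8 preamble as reported by the framework: EF BB BF.
    private static readonly byte[] Utf8Bom = Encoding.UTF8.GetPreamble();

    // True when the data begins with the three BOM bytes.
    public static bool StartsWithUtf8Bom(byte[] data)
    {
        return data.Length >= Utf8Bom.Length
            && data.Take(Utf8Bom.Length).SequenceEqual(Utf8Bom);
    }

    // Returns the data without a leading BOM; data without one is returned as-is.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        return StartsWithUtf8Bom(data)
            ? data.Skip(Utf8Bom.Length).ToArray()
            : data;
    }

    public static void Main()
    {
        // Build a sample buffer: BOM followed by a tiny XML document.
        byte[] withBom = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("<root/>"))
            .ToArray();

        Console.WriteLine(StartsWithUtf8Bom(withBom));
        Console.WriteLine(StartsWithUtf8Bom(StripUtf8Bom(withBom)));
    }
}
```

In practice the input would come from something like `File.ReadAllBytes(path)`; the in-memory buffer here just keeps the sketch self-contained.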

It's a UTF-8 Byte Order Mark (BOM) and is to be expected.

David Heffernan

You can try changing the encoding declared in the XmlDocument. The example below is copied from MSDN:

using System;
using System.IO;
using System.Xml;

public class Sample {

  public static void Main() {

    // Create and load the XML document.
    XmlDocument doc = new XmlDocument();
    string xmlString = "<book><title>Oberon's Legacy</title></book>";
    doc.Load(new StringReader(xmlString));

    // Create an XML declaration. 
    XmlDeclaration xmldecl;
    xmldecl = doc.CreateXmlDeclaration("1.0",null,null);
    xmldecl.Encoding="UTF-16";
    xmldecl.Standalone="yes";     

    // Add the new node to the document.
    XmlElement root = doc.DocumentElement;
    doc.InsertBefore(xmldecl, root);

    // Display the modified XML document 
    Console.WriteLine(doc.OuterXml);

  } 

}

Enes

As everybody else mentioned, it's a Unicode issue.

I advise you to try LINQ to XML. Although not directly related, I mention it because it's much easier than the older APIs and, more importantly, I assume it might resolve issues like this automatically, without extra code from you.
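
A rough sketch of the same edit with LINQ to XML follows. Rather than relying on whatever the defaults do about the BOM, it sets the encoding explicitly through `XmlWriterSettings`. The class name `LinqToXmlSave`, the method `UpdateAndSave`, and the sample markup are hypothetical, and the sketch assumes the `trans-unit` elements are not in an XML namespace, as the XPath in the question suggests.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class LinqToXmlSave
{
    // Replaces the text of the trans-unit with the given id and
    // returns the document serialized as UTF-8 without a BOM.
    public static byte[] UpdateAndSave(string xml, string id, string text)
    {
        XDocument doc = XDocument.Parse(xml);

        // Assumes exactly one matching element in no namespace.
        XElement unit = doc.Descendants("trans-unit")
            .First(e => (string)e.Attribute("id") == id);
        unit.Value = text;

        var settings = new XmlWriterSettings
        {
            // false = do not emit the EF BB BF preamble
            Encoding = new UTF8Encoding(false)
        };

        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        byte[] bytes = UpdateAndSave(
            "<xliff><trans-unit id=\"127\">old</trans-unit></xliff>",
            "127",
            "testing");

        Console.WriteLine(Encoding.UTF8.GetString(bytes));
    }
}
```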

Meligy