103

I'm trying to parse some XML inside a WiX installer. The XML lists all the errors returned from a web server. I'm getting the error in the question title ("Data at the root level is invalid. Line 1, position 1.") with this code:

XmlDocument xml = new XmlDocument();
try
{
    xml.LoadXml(myString);
}
catch (Exception ex)
{
    System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
    throw; // rethrow without resetting the stack trace
}

myString is this (as seen in the output of text.txt):

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>

text.txt comes out looking like this:

<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>

Data at the root level is invalid. Line 1, position 1.

I need this XML to parse so I can see if I had any errors.

TylerH
Chris
  • 1
    @marc_s: can a `string` ever really be UTF-8? What if the processing instruction (first line) is removed before the load? – John Saunders Jul 22 '13 at 19:00
  • 2
    If I take your code and compile and run it, I get no errors. But that may be because I hardcoded myString. How does your myString get set? If it comes from another file or stream, there might be something annoying such as a byte order mark at the top of the file. It is usually not shown by editors (unless they have a hex mode). – Richard Jul 22 '13 at 19:12
  • It appears to parse without that first line. Let me make sure that the errors are able to be handled that way. Sorry it's taking so long. Every time I want to test I have to rebuild the entire WiX installer. – Chris Jul 22 '13 at 19:18
  • @Richard - It's coming from a service call from a remote server. – Chris Jul 22 '13 at 19:19
  • If you have anything in the file above the XML declaration, remove it from the file and try again. – Jo Smo May 08 '14 at 15:15

12 Answers

173

The hidden character is probably a BOM (byte order mark). Credit for the explanation and the solution goes to James Schubert, whose write-up is based on an answer by James Brankin.

Though the previous answer does remove the hidden character, it also removes the whole first line. The more precise version would be:

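// The UTF-8 preamble decodes to the single character U+FEFF, which looks empty in most debuggers.
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
// Use an ordinal comparison: a culture-sensitive StartsWith ignores U+FEFF.
if (xml.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
{
    xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}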

I encountered this problem when fetching an XSLT file from an Azure blob and loading it into an XslCompiledTransform object. On my machine the file looked fine, but after I uploaded it as a blob and fetched it back, a BOM character had been added.
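
A minimal sketch of the same idea as a reusable helper, assuming the XML has already been decoded into a string. The names BomStripper, StripLeadingBom and ParseXml are illustrative and not part of any library. Comparing the first character against '\uFEFF' is an ordinal check, so it sidesteps the pitfalls of culture-sensitive string comparison.

```csharp
using System;
using System.Xml;

public static class BomStripper
{
    // U+FEFF is what the UTF-8 preamble (EF BB BF) becomes once decoded into a .NET string.
    private const char ByteOrderMark = '\uFEFF';

    // Returns the input without a leading BOM character; any other input is returned unchanged.
    public static string StripLeadingBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == ByteOrderMark)
        {
            return text.Substring(1);
        }
        return text;
    }

    // Parses XML text that may or may not start with a BOM character.
    public static XmlDocument ParseXml(string text)
    {
        var document = new XmlDocument();
        document.LoadXml(StripLeadingBom(text));
        return document;
    }
}
```

With this in place, the call in the question would become something like `var doc = BomStripper.ParseXml(myString);`.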

Community
Amit Merin
  • 3
    Not sure and I guess I'll have to keep looking, but when I do this _byteOrderMarkUtf8 = "". so it doesn't catch it. Ideas? – user1040975 May 06 '16 at 17:24
  • 2
    tried it, did not help. xml is coming from db for that matter – John Demetriou Oct 24 '17 at 09:24
  • 1
    Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble()) evaluates to an empty string – Mister Cook Apr 29 '18 at 13:45
  • 8
    Had the same issues as the above commenters. Using `xml.StartsWith(byteOrderMarkUtf8, StringComparison.Ordinal)` did the trick for me. Credit to Hans Passant: https://stackoverflow.com/a/19495964/38425 – Polshgiant Jul 26 '18 at 21:52
  • Thank you a lot for this answer. I had this problem when taking the XML from an Outlook attachment, because for some reason it changes the encoding to UTF-8 with BOM. No method I found for converting the encodings worked. Only removing that first phantom character did. Thanks again – Cubelaster Feb 18 '19 at 07:45
  • If you're loading files, read the file into a string first. Otherwise you get an "XmlDocument does not contain a definition for 'StartsWith'" error. – dval Mar 25 '19 at 09:50
  • It helped me with a '?' (invisible) right before the XML declaration. – lexeme Apr 04 '19 at 06:34
  • 2
    This solved the issue for me, thank you VERY much, I've banged my head on this for awhile now. – mknopf Apr 10 '19 at 17:14
  • I was in the same boat as @mknopf. BOMs should be illegal. – Andrew Keeton Feb 05 '20 at 18:23
  • Function doesn't work with VB .net 3.5 (requires index & count) – Rowan Berry Feb 07 '20 at 06:16
  • I'd also recommend replacing the .Remove method with .Substring for improved performance. `int bomLength = _byteOrderMarkUtf8.Length;` `xml = xml.Substring(bomLength, xml.Length - bomLength);` – user3424480 Apr 14 '20 at 19:11
84

Use the Load() method instead; it will solve the problem.
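
A minimal sketch of what that could look like when the response is available as raw bytes rather than as a string. The class and method names (XmlStreamLoader, LoadFromBytes) are illustrative assumptions. XmlDocument.Load(Stream) works out the encoding from the bytes themselves, so a leading UTF-8 BOM should not need to be stripped first.

```csharp
using System.IO;
using System.Xml;

public static class XmlStreamLoader
{
    // Loads XML from raw bytes. Load(Stream) inspects the byte stream to
    // determine the encoding, so a leading UTF-8 BOM is consumed by the reader.
    public static XmlDocument LoadFromBytes(byte[] xmlBytes)
    {
        var document = new XmlDocument();
        using (var stream = new MemoryStream(xmlBytes))
        {
            document.Load(stream);
        }
        return document;
    }
}
```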

Ringo
16

The issue here was that myString had that header line. Either there was some hidden character at the beginning of the first line or the line itself was causing the error. I sliced off the first line like so:

xml.LoadXml(myString.Substring(myString.IndexOf(Environment.NewLine)));

This solved my problem.
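
The one-liner above throws an ArgumentOutOfRangeException when the string contains no Environment.NewLine, because IndexOf then returns -1. A hedged alternative, which is a different technique from the one above, is to drop only what precedes the first '<'. The helper name TrimBeforeFirstTag is an illustrative assumption.

```csharp
using System;

public static class XmlTextTrimmer
{
    // Drops everything before the first '<'. Unlike slicing at the first newline,
    // this keeps the XML declaration and does not throw when there is no line break.
    public static string TrimBeforeFirstTag(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        int start = text.IndexOf('<');
        // IndexOf returns -1 when no '<' exists and 0 when the text already
        // starts with one; in both cases the input is returned untouched.
        return start > 0 ? text.Substring(start) : text;
    }
}
```

Usage would be `xml.LoadXml(XmlTextTrimmer.TrimBeforeFirstTag(myString));`.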

Chris
  • 4
    Once I was getting this error and it turned out to be a '?' at the beginning. I just replaced it with a blank space and got it running... That might also happen if the file you're reading is in a different encoding than what you're expecting – Ricardo Appleton Oct 15 '13 at 13:28
  • I tried this, but in .NETPrehistoric (1.1), I tried to use "\r\n" in place of the then-unavailable Environment.NewLine. I got, "Specified argument was out of the range of valid values." – B. Clay Shannon-B. Crow Raven Sep 04 '14 at 22:36
  • @Chris: I have tried your solution. I am getting below exception. System.ArgumentOutOfRangeException: StartIndex cannot be less than zero. Parameter – Shesha Mar 09 '16 at 08:34
12

I think the problem is about encoding. That's why removing the first line (which carries the encoding bytes) might solve the problem.

My solution for "Data at the root level is invalid. Line 1, position 1." in XDocument.Parse(xmlString) was to replace that call with XDocument.Load(new MemoryStream(xmlContentInBytes));

I've noticed that my xml string looked ok:

<?xml version="1.0" encoding="utf-8"?>

but in different text editor encoding it looked like this:

?<?xml version="1.0" encoding="utf-8"?>

In the end I did not need the XML as a string, only as a byte[]. If you do need the string, look for "invisible" characters at its start and adjust the encoding used to decode it so that the content parses or loads.
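
One way to spot such invisible characters is to print the code units at the start of the string. This is a small sketch, and the names StringInspector and DescribeLeadingChars are illustrative assumptions.

```csharp
using System;
using System.Linq;

public static class StringInspector
{
    // Formats the first `count` UTF-16 code units as hexadecimal, separated by spaces.
    public static string DescribeLeadingChars(string text, int count = 8)
    {
        return string.Join(" ", text.Take(count).Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For example, `StringInspector.DescribeLeadingChars("\uFEFF<?xml", 3)` returns "U+FEFF U+003C U+003F", which makes the BOM in front of the `<` visible.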

Hope it will help

pawciu
4

Save your file with different encoding:

File > Save file as... > Save as UTF-8 without signature.

In VS 2017, the encoding option is in the dropdown next to the Save button.
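
If the file is produced by code rather than saved from an editor, the equivalent of "UTF-8 without signature" can be sketched as follows. The names XmlFileWriter and WriteWithoutBom are illustrative assumptions.

```csharp
using System.IO;
using System.Text;

public static class XmlFileWriter
{
    // Writes text as UTF-8 without the EF BB BF signature at the start of the file.
    public static void WriteWithoutBom(string path, string xmlText)
    {
        // Passing false for encoderShouldEmitUTF8Identifier suppresses the BOM.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        File.WriteAllText(path, xmlText, utf8NoBom);
    }
}
```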

MikeMajara
4

The main culprit for this error is the logic that determines the encoding when converting a Stream or byte[] to a .NET string.

A StreamReader created with the second constructor parameter, detectEncodingFromByteOrderMarks, set to true will determine the proper encoding and produce a string that does not break the XmlDocument.LoadXml method.

public string GetXmlString(string url)
{
    using var stream = GetResponseStream(url);
    using var reader = new StreamReader(stream, true);
    return reader.ReadToEnd(); // no exception on `LoadXml`
}

A common mistake is to blindly apply UTF-8 decoding to the stream or byte[]. The code below produces a string that looks valid when inspected in the Visual Studio debugger or copy-pasted elsewhere, but it will throw the exception when used with Load or LoadXml if the file is encoded in anything other than UTF-8 without a BOM.

public string GetXmlString(string url)
{
    byte[] bytes = GetResponseByteArray(url);
    return System.Text.Encoding.UTF8.GetString(bytes); // potentially exception on `LoadXml`
}
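
The difference between the two approaches can be checked in isolation with in-memory bytes instead of a web response. This is a sketch under that assumption, and the names DecodeComparison, DecodeWithReader and DecodeBlindly are illustrative.

```csharp
using System.IO;
using System.Text;

public static class DecodeComparison
{
    // StreamReader detects the BOM, picks the matching encoding and skips the BOM bytes.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    // GetString decodes every byte, so a UTF-8 BOM survives as a leading U+FEFF character.
    public static string DecodeBlindly(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For bytes that start with EF BB BF, the first method returns a string beginning with `<`, while the second returns one beginning with the invisible U+FEFF.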
Nenad
3

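I solved this issue by editing the byte array directly: get the UTF-8 preamble and remove it from the start of the data. Afterward you can convert the byte[] to a string with the GetString method, as shown below. I removed the \r and \t characters as well, just as a precaution.

// Requires: using System.Linq; (for Take and SequenceEqual)
XmlDocument configurationXML = new XmlDocument();
List<byte> byteArray = new List<byte>(webRequest.downloadHandler.data);
byte[] preamble = Encoding.UTF8.GetPreamble();

// Remove the preamble only when the data actually starts with it.
// Removing each byte wherever it first occurs could corrupt the payload.
if (byteArray.Count >= preamble.Length && byteArray.Take(preamble.Length).SequenceEqual(preamble))
{
    byteArray.RemoveRange(0, preamble.Length);
}
string xml = System.Text.Encoding.UTF8.GetString(byteArray.ToArray());
xml = xml.Replace("\r", ""); // strip carriage returns
xml = xml.Replace("\t", ""); // strip tabs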
  • It works for me. But in the loop, we need to check whether byteArray.IndexOf(singleByte) != -1 before removing it. – ThanhLD Aug 19 '19 at 07:03
2

If your XML is in a string, use the following. Note that it strips the XML declaration (the <?xml ... ?> part) rather than the byte order mark itself:

xml = new Regex("\\<\\?xml.*\\?>").Replace(xml, "");
Mister Cook
2

At first I had problems escaping the "&" character, then diacritics and special letters were shown as question marks and ended up with the issue OP mentioned.

I looked at the answers and I used @Ringo's suggestion to try Load() method as an alternative. That made me realize that I can deal with my response in other ways not just as a string.

Using System.IO.Stream instead of string solved all the issues for me.

var response = await this.httpClient.GetAsync(url);
var responseStream = await response.Content.ReadAsStreamAsync();
var xmlDocument = new XmlDocument();
xmlDocument.Load(responseStream);

The cool thing about Load() is that this method automatically detects the string format of the input XML (for example, UTF-8, ANSI, and so on).

tibbiustin
0

I found one solution: load the XML from a file with Load() instead of parsing a string. For your code it could look like this:

XmlDocument xml = new XmlDocument();
// Declared outside the try block so the catch block can use it.
// Assumes the file is named loadData.xml and is in the current directory.
string myString = "./loadData.xml";
try
{
    // Load(string) takes a file path or URL, not the XML text itself.
    xml.Load(myString);
}
catch (Exception ex)
{
    System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
    throw; // rethrow without resetting the stack trace
}
  • It is a solution, but a bad one. This is an encoding issue: by writing and reading the file, you actually performed encoding and decoding without being aware of it, since the invoked overload of the Load method has a default value for the Encoding parameter (System.Text.Encoding encoding) – hardyVeles Jul 01 '19 at 18:35
  • Thank you, sir, for pointing it out. Could you please correct me? – Shubhasish Bhunia Jul 03 '19 at 19:52
  • You should decode and encode the string using the methods of the Encoding class; there is no need (and no sense) in using File methods or the file system at all. Please check: https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?view=netframework-4.8 – hardyVeles Jul 11 '19 at 15:15
0

Using an XmlDataDocument object is much better than using an XDocument or XmlDocument object. XmlDataDocument works fine with UTF-8 and has no problems with byte order marks. You can get the child nodes of each element using the ChildNodes property. Use a custom function such as the following:

    public static void ReadXmlDataDocument2(string xmlFilePath)
    {
        if (xmlFilePath != null)
        {
            if (File.Exists(xmlFilePath))
            {
                try
                {
                    System.Xml.XmlDataDocument k_XDoc = new System.Xml.XmlDataDocument();
                    // The using block closes the stream even if Load throws.
                    using (System.IO.FileStream fs = new System.IO.FileStream(xmlFilePath, System.IO.FileMode.Open, System.IO.FileAccess.Read))
                    {
                        k_XDoc.Load(fs);
                    }

                    XmlNodeList ndsRoot = k_XDoc.ChildNodes;
                    foreach (System.Xml.XmlNode xLog in ndsRoot)
                    {
                        foreach (System.Xml.XmlNode xLog2 in xLog.ChildNodes)
                        {
                            if (xLog2.Name == "ERRORs")
                            {
                                foreach (System.Xml.XmlNode xLog3 in xLog2.ChildNodes)
                                {
                                    if (xLog3.Name == "ErrorCode")
                                    {
                                        // Do something
                                    }
                                    if (xLog3.Name == "Description")
                                    {
                                        // Do something
                                    }
                                }
                            }
                        }
                    }

                }
                catch (Exception ex)
                {
                    MessageBox.Show(ex.Message);
                }
            }
        }
    }
Meisam Rasouli
-1

If you are using XDocument.Parse, pass the XML as a verbatim string literal, as in XDocument.Parse(@""). Using @ resolves the issue.

Raj