
I have a situation that I do not quite understand. When reading the following XML:

<?xml version="1.0" encoding="utf-8" ?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Countries>
      <Country>
        <CountryCode>CN</CountryCode>
        <CurrentStatus>Active</CurrentStatus>
      </Country>
    </Countries>

    <Countries>
      <Country>
        <CountryCode>AU</CountryCode>
        <CurrentStatus>Cancelled</CurrentStatus>
      </Country>
      <Country>
        <CountryCode>CN</CountryCode>
        <CurrentStatus>Cancelled</CurrentStatus>
      </Country>
      <Country>
        <CountryCode>US</CountryCode>
        <CurrentStatus>Active</CurrentStatus>
      </Country>
    </Countries>

    <Countries xsi:nil="true" />
</Root>

With the following code:

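using System.IO;
using System.Xml;
using Newtonsoft.Json;

// XmlDocument.LoadXml: no whitespace in the JSON
string xml = File.ReadAllText(fileInfo.FullName);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);
string json1 = JsonConvert.SerializeXmlNode(xmlDoc);

// XmlReader + XmlDocument.ReadNode: whitespace in the JSON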
XmlDocument doc = new XmlDocument();
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;

using (XmlReader reader = XmlReader.Create(fileInfo.FullName, settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            XmlNode node = doc.ReadNode(reader);
            string json2 = JsonConvert.SerializeXmlNode(node);
        }
    }
}

I get json that looks like this:

json1:

{"?xml":{"@version":"1.0","@encoding":"utf-8"},"Root":{"@xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","Countries":[{"Country":{"CountryCode":"CN","CurrentStatus":"Active"}},{"Country":[{"CountryCode":"AU","CurrentStatus":"Cancelled"},{"CountryCode":"CN","CurrentStatus":"Cancelled"},{"CountryCode":"JP","CurrentStatus":"Cancelled"},{"CountryCode":"SG","CurrentStatus":"Cancelled"},{"CountryCode":"US","CurrentStatus":"Active"}]},{"@xsi:nil":"true"}]}}

json2:

{"Root":{"@xmlns:xsi":"http://www.w3.org/2001/XMLSchema-instance","#whitespace":["\n ","\n ","\n ","\n"],"Countries":[{"#whitespace":["\n ","\n "],"Country":{"#whitespace":["\n ","\n ","\n "],"CountryCode":"CN","CurrentStatus":"Active"}},{"#whitespace":["\n ","\n ","\n ","\n ","\n ","\n "],"Country":[{"#whitespace":["\n ","\n ","\n "],"CountryCode":"AU","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"CN","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"JP","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"SG","CurrentStatus":"Cancelled"},{"#whitespace":["\n ","\n ","\n "],"CountryCode":"US","CurrentStatus":"Active"}]},{"@xsi:nil":"true"}]}}

Why does XmlReader produce whitespace nodes when XmlDocument does not? I don't think they should be there, given the XML values.

Ogglas
  • Try `settings.IgnoreWhitespace = true;`. But basically you already have an answer. Do you really need a Reader, i.e. is your data > 100MB? – H H Aug 30 '17 at 13:17
  • I don't know why XmlReader behaves that way by default, but I think you just need to set [XmlReaderSettings.IgnoreWhitespace](https://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.ignorewhitespace(v=vs.110).aspx) to true. – finrod Aug 30 '17 at 13:18
  • @HenkHolterman Thanks. I need the reader since my XML does not have a root element when I read the data and `XmlDocument` will throw an error. Since my question was about whitespace I added the root element to show the difference between `XmlReader` and `XmlDocument`. – Ogglas Aug 30 '17 at 13:24
  • And have you looked at XElement? It is usually easier to use. – H H Aug 30 '17 at 13:35
  • OK, just let us know if that Ignore setting helps at all. – H H Aug 30 '17 at 13:48
  • https://stackoverflow.com/questions/9399850/c-sharp-whitespaces-issue-with-xmlreader – Kaarthik Aug 30 '17 at 13:58
  • @HenkHolterman Haven't tried XElement but will check it out. IgnoreWhitespace worked like a charm, but it would be nice to know why the whitespace was added in the first place as well. :) – Ogglas Aug 30 '17 at 14:05
  • @GeekBoy That is a different question? – Ogglas Aug 30 '17 at 14:05

2 Answers


Solved it with:

settings.IgnoreWhitespace = true;

Thanks to @HenkHolterman and @finrod.
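Why the whitespace appears: `XmlReader` reports the indentation between tags as nodes of type `XmlNodeType.Whitespace` unless `XmlReaderSettings.IgnoreWhitespace` is set (it defaults to `false`), whereas `XmlDocument.LoadXml` drops those nodes while loading because `XmlDocument.PreserveWhitespace` defaults to `false`. The question's own output suggests that `XmlDocument.ReadNode` keeps whatever the reader reports, since `doc` there also had the default `PreserveWhitespace` and the `#whitespace` entries still showed up. Below is a minimal sketch, using only `System.Xml`, that counts the whitespace nodes a reader reports with and without `IgnoreWhitespace`; the class name `WhitespaceDemo`, the method `CountWhitespaceNodes` and the sample input are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Counts the nodes an XmlReader reports as XmlNodeType.Whitespace
    // (the indentation between tags) for the given setting.
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // allow root-less input, as in the question
            IgnoreWhitespace = ignoreWhitespace           // defaults to false
        };

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical indented input, shaped like one <Country> entry from the question.
        string xml = "<Country>\n  <CountryCode>CN</CountryCode>\n  <CurrentStatus>Active</CurrentStatus>\n</Country>";

        Console.WriteLine(CountWhitespaceNodes(xml, ignoreWhitespace: false)); // 3
        Console.WriteLine(CountWhitespaceNodes(xml, ignoreWhitespace: true));  // 0
    }
}
```

Passing `IgnoreWhitespace = true` in the settings given to `XmlReader.Create` in the question is what keeps those nodes out of `ReadNode`, and therefore out of `json2`.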

Ogglas
using System;
using System.Xml;
using Newtonsoft.Json;

XmlDocument doc = new XmlDocument();
doc.PreserveWhitespace = false; // false is already the default
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document;
settings.IgnoreWhitespace = true; // do not report indentation as Whitespace nodes

using (XmlReader reader = XmlReader.Create("XMLFile1.xml", settings))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            XmlNode node = doc.ReadNode(reader);
            string json2 = JsonConvert.SerializeXmlNode(node);
            Console.WriteLine(json2.Trim());
        }
    }
}
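One caveat if this loop is applied to the root-less input mentioned in the comments (`ConformanceLevel.Fragment`): `ReadNode` leaves the reader positioned on the node after the element's end tag, so the extra `reader.Read()` at the top of the loop moves past that node. With `IgnoreWhitespace = true` that node is the next sibling's start tag, so only that sibling's children get read. A single root element, as in this answer, is unaffected. The following is a sketch under those assumptions that calls `Read()` only when it did not just call `ReadNode`; `FragmentReader`, `ReadTopLevelElements` and the sample input are hypothetical, and the JSON conversion is left as a comment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentReader
{
    // Reads every top-level element of a root-less XML fragment into an XmlNode.
    public static List<XmlNode> ReadTopLevelElements(TextReader input)
    {
        var doc = new XmlDocument();
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment, // input has no single root element
            IgnoreWhitespace = true                       // drop indentation nodes
        };

        var nodes = new List<XmlNode>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadNode consumes the whole element and leaves the reader on the
                    // node after its end tag, so Read() must not be called here.
                    nodes.Add(doc.ReadNode(reader));
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return nodes;
    }

    public static void Main()
    {
        // Hypothetical root-less input: two sibling <Country> elements.
        string xml =
            "<Country>\n  <CountryCode>CN</CountryCode>\n</Country>\n" +
            "<Country>\n  <CountryCode>US</CountryCode>\n</Country>";

        foreach (XmlNode node in ReadTopLevelElements(new StringReader(xml)))
        {
            // In the question, each node would be passed to JsonConvert.SerializeXmlNode(node).
            Console.WriteLine(node.OuterXml);
        }
    }
}
```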