As per the title, I am having trouble getting data from an XML file with CDATA elements into an array. Based on my current limited understanding, I came up with the basic working method below. CDATA is odd, so my normal methods didn't work: my usual way of finding the nodes wasn't stopping on them, and then there is the whole CDATA issue.

XmlTextReader xmlReader = new XmlTextReader(FilePath);
while (xmlReader.Read())
{
    // Position the reader on the quoteNumber node
    xmlReader.ReadToFollowing("quoteNumber");
    XmlReader inner = xmlReader.ReadSubtree();
    while (inner.Read())
    {
        switch (xmlReader.NodeType)
        {
            case XmlNodeType.CDATA:
                Globals.COData[0] = inner.Value;
                break;
        }
    }

    xmlReader.ReadToFollowing("orderNumber");
    inner = xmlReader.ReadSubtree();
    while (inner.Read())
    {
        switch (xmlReader.NodeType)
        {
            case XmlNodeType.CDATA:
                Globals.COData[1] = inner.Value;
                break;
        }
    }

But I have many many data elements to fetch and assume there is a better way. File looks like:

Image of XML

And the relevant portion:

<quoteNumber>
<![CDATA[ John Test 123]]>
</quoteNumber>
<orderNumber>
<![CDATA[ 1352738]]> 
</orderNumber>

The containing element does have a closing tag at the end of the file. The entire XML is too large to post.

The XML format is not under my control.

My end goal is to get the OrderNumber and its value into an array. And the Quote number and its value. I am used to seeing <OrderNumber>123</OrderNumber> so CDATA nodes are new to me.

dbc
  • Additional comment by the question's author: My end goal is to get the "OrderNumber" and its value into an array. And the "Quote number" and its value. I wasn't very clear on my question. I am used to seeing `<OrderNumber>123</OrderNumber>` and so this is new to me. – ScopeCreep Mar 22 '21 at 01:42
  • The image of your XML is not useful, other than to indicate your elements are in some root default namespace that is not fully shown. Is there any chance you could edit your question to include the "raw" XML **as text** without any escaping or reformatting? E.g. in the actual XML are the `<![CDATA[...]]>` nodes surrounded by insignificant whitespace or is that an artifact of how you formatted the question? – dbc Mar 22 '21 at 13:54

1 Answer

It's not entirely clear where you are going wrong because you don't share your complete XML, but you are not checking the return value from XmlReader.ReadToFollowing(string) from inside your Read() loop. Thus, once you read past the last <orderNumber>, you will get an exception when another <quoteNumber> is not found.

I would suggest restructuring your code as follows:

var ns = ""; // Replace with @"http://intelliquip.com/integrationS..." can't see the full namespace from the XML image.
var list = new List<Tuple<string, string>>(); // List of (quoteNumber, orderNumber) values.
var xmlReader = XmlReader.Create(FilePath);
while (xmlReader.ReadToFollowing("quoteNumber", ns))
{
    string quoteNumber = null;
    string orderNumber = null;
    using (var inner = xmlReader.ReadSubtree())
    {
        // We need to skip the insignificant whitespace around the CDATA nodes which ReadElementContentAsString() will not do.
        while (inner.Read())
        {
            switch (inner.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    quoteNumber += inner.Value;
                    break;
            }
        }
        // After ReadSubtree() the reader is positioned on the </quoteNumber> element end.
    }
    // If the next orderNumber node is missing, ReadToFollowing() will read all the way past the next quoteNumber node.
    // Use ReadToNextSibling() instead.
    if (xmlReader.ReadToNextSibling("orderNumber", ns))
    {
        using (var inner = xmlReader.ReadSubtree())
        {
            while (inner.Read())
            {
                switch (inner.NodeType)
                {
                    case XmlNodeType.Text:
                    case XmlNodeType.CDATA:
                        orderNumber += inner.Value;
                        break;
                }
            }
        }
    }

    if (quoteNumber != null && orderNumber != null)
        list.Add(Tuple.Create(quoteNumber, orderNumber)); 
    else
    {
        // Add error handling here
    }
}
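The two subtree loops above differ only in the variable they fill, so with many elements to fetch they could be folded into one helper. The following is only a minimal sketch: `CDataReaderSketch`, `ReadCharacterData` and `ReadFields` are hypothetical names, it matches on local name only (so it ignores whichever namespace the elements are really in), it keeps only the first occurrence of each name, and it trims the result on the assumption that leading and trailing spaces are not wanted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class CDataReaderSketch
{
    // Hypothetical helper: concatenates the Text and CDATA nodes of the element
    // the reader is currently positioned on. Whitespace-only nodes are skipped.
    public static string ReadCharacterData(XmlReader reader)
    {
        var sb = new StringBuilder();
        using (var inner = reader.ReadSubtree())
        {
            while (inner.Read())
            {
                if (inner.NodeType == XmlNodeType.Text || inner.NodeType == XmlNodeType.CDATA)
                    sb.Append(inner.Value);
            }
        }
        return sb.ToString();
    }

    // Hypothetical helper: collects the first occurrence of each wanted element,
    // matching on local name only (an assumption; the real namespace is not known).
    public static Dictionary<string, string> ReadFields(TextReader xml, ISet<string> wanted)
    {
        var result = new Dictionary<string, string>();
        using (var reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && wanted.Contains(reader.LocalName)
                    && !result.ContainsKey(reader.LocalName))
                {
                    var name = reader.LocalName;
                    // Trim() is an assumption: it drops the space inside the CDATA section.
                    result[name] = ReadCharacterData(reader).Trim();
                }
            }
        }
        return result;
    }
}
```

It might be called as `CDataReaderSketch.ReadFields(File.OpenText(FilePath), new HashSet<string> { "quoteNumber", "orderNumber" })`, after which the values can be copied into `Globals.COData` in whatever order is needed.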

Notes:

  • CDATA is just an alternate way of encoding an XML character data node, see What does <![CDATA[]]> in XML mean? for details. XmlReader.Value will contain the unescaped value of an XML character data node regardless of whether it is encoded as a regular text node or a CDATA node.

  • It is unclear from your question whether there must be exactly one <quoteNumber> node in the XML file. Because of that I read the quote and order number pairs into a List<Tuple<string, string>>. After reading is complete you can check how many were read and add them to Globals.COData as appropriate.

  • XmlReader.ReadToFollowing() returns

    true if a matching element is found; otherwise false and the XmlReader is in an end of file state.

    Thus its return value needs to be checked to make sure you don't try to read past the end of the file.

  • Your code doesn't attempt to handle situations where an <orderNumber> is missing. If it is, the code may skip all the way past the next <quoteNumber> to read its order number. To avoid this possibility I use XmlReader.ReadToNextSibling() to limit the scope of the search to <orderNumber> nodes belonging to the same parent node.

  • By using XmlReader.ReadToFollowing("orderNumber") you hardcode your code to assume that the orderNumber node(s) have no namespace prefix. Rather than doing that, it would be safer to explicitly indicate the namespace they are in which seems to be something like http://intelliquip.com/integrationS... where the ... portion is not shown.

    I recommend using XmlReader.ReadToFollowing("orderNumber", ns) where ns is the namespace the order and quote nodes are actually in.

  • Since .NET Framework 2.0, Microsoft has recommended creating readers with XmlReader.Create() rather than using XmlTextReader directly. Use XmlReader.Create() instead.

  • The XmlReader API is rather fussy to use. If your XML files are not large you might consider loading them into an XDocument and using LINQ to XML to query it.

    For instance, your XmlReader code could be rewritten as follows:

     var doc = XDocument.Load(FilePath);
     XNamespace ns = ""; // Replace with @"http://intelliquip.com/integrationS..." can't see the full namespace from the XML image.
     var query = from quote in doc.Descendants(ns + "quoteNumber")
         let order = quote.ElementsAfterSelf(ns + "orderNumber").FirstOrDefault()
         where order != null
         select Tuple.Create(quote.Value, order.Value);
    
     var list = query.ToList();
    

    Which looks much simpler.

  • You might also consider replacing the Tuple<string, string> with a proper data model such as

    public class Order
    {
        public string QuoteNumber { get; set; }
        public string OrderNumber { get; set; }
    }
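
As a rough illustration of how such a data model could be combined with the LINQ to XML query, here is a minimal sketch. `OrderLoaderSketch` and `Load` are hypothetical names, the namespace is passed in by the caller because the real one is not fully visible, and the `Trim()` calls assume the leading space inside the CDATA sections is not wanted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public class Order
{
    public string QuoteNumber { get; set; }
    public string OrderNumber { get; set; }
}

public static class OrderLoaderSketch
{
    // Hypothetical helper: pairs each quoteNumber with the first orderNumber
    // sibling that follows it, and skips quotes that have no such sibling.
    public static List<Order> Load(TextReader xml, XNamespace ns)
    {
        var doc = XDocument.Load(xml);
        return (from quote in doc.Descendants(ns + "quoteNumber")
                let order = quote.ElementsAfterSelf(ns + "orderNumber").FirstOrDefault()
                where order != null
                select new Order
                {
                    // XElement.Value returns the character data whether it was
                    // written as plain text or as a CDATA section.
                    QuoteNumber = quote.Value.Trim(),
                    OrderNumber = order.Value.Trim(),
                }).ToList();
    }
}
```

For XML without a namespace, `XNamespace.None` can be passed as `ns`. Adding further fields would mean adding a property and one more `ElementsAfterSelf` lookup per field.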
    

Demo fiddle #1 here for XmlReader and #2 here for LINQ to XML.

dbc
  • Thanks so much, I will dig through the advice and see what I can come up with. My code is horrible and would never be released to production - was looking for direction. I will have around 20 elements to get out, and then have to iterate through line items as well. – ScopeCreep Mar 22 '21 at 16:35
  • `ReadSubtree()` can be really helpful in streaming through complex XML; it guarantees you cannot read too little or too much. For instance, if you want to search for elements of a certain name and then stream through their children looking for something, you can use `XmlReaderExtensions.ReadAllSubtrees(this XmlReader reader, string localName, string namespaceURI)` from [this answer](https://stackoverflow.com/a/38425483/3744182) to enumerate through them. – dbc Mar 22 '21 at 16:43