-2

What's the efficient way to iterate through this XML document using C#? This document is the result of relationship between two apps. Hence there is the info for the user, and the info for the request itself.

Thank you in advance.

<?xml version="1.0" encoding="utf-16"?>
<Records count="2">
  <Metadata>
    <FieldDefinitions>
      <FieldDefinition id="1001" name="Request ID" alias="Q_ID" />
      <FieldDefinition id="1002" name="User" alias="Target" />
      <FieldDefinition id="1003" name="Which item(s) is the access requested for?" alias="Which_items_is_requested" />
      <FieldDefinition id="1004" name="Select specific(s) for item1" alias="item1" />
      <FieldDefinition id="1005" name="Select specific(s) for item2" alias="item2" />
      <FieldDefinition id="1006" name="Select specific(s) for item3" alias="item3" />
      <FieldDefinition id="101" name="First Name" alias="First_Name" />
      <FieldDefinition id="102" name="Last Name" alias="Last_Name" />
      <FieldDefinition id="100" name="Email" alias="Email" />
    </FieldDefinitions>
  </Metadata>
  <LevelCounts>
    <LevelCount id="989" count="2" />
    <LevelCount id="85" count="2" />
  </LevelCounts>
  <Record contentId="1092725" levelId="989" moduleId="564">
    <Record contentId="736205" levelId="85" moduleId="84">
      <Field id="100" type="1">john.smith@abc.com</Field>
      <Field id="101" type="1">John</Field>
      <Field id="102" type="1">Smith</Field>
    </Record>
    <Field id="1003" type="4">
      <ListValues>
        <ListValue id="11" displayName="Issues Management">item1</ListValue>
        <ListValue id="13" displayName="Master Control Procedures">item3</ListValue>
      </ListValues>
    </Field>      
    <Field id="1001" type="6">123123</Field>
    <Field id="1002" type="9">
      <Reference id="736205">John Smith</Reference>
    </Field>
    <Field id="1005" type="9">
      <Reference id="3">item11</Reference>
      <Reference id="3">item12</Reference>
    </Field>
    <Field id="1006" type="9" />
    <Field id="1004" type="9">
      <Reference id="7">item31</Reference>
      <Reference id="8">item32</Reference>
    </Field>
  </Record>
  <Record contentId="1092759" levelId="989" moduleId="564">
    <Record contentId="775678" levelId="85" moduleId="84">
      <Field id="100" type="1">Peter.Smith@abc.com</Field>
      <Field id="101" type="1">Peter</Field>
      <Field id="102" type="1">Smith</Field>
    </Record>
    <Field id="1003" type="4">
      <ListValues>
        <ListValue id="11" displayName="Issues Management">item1</ListValue>
        <ListValue id="12" displayName="Master Control Procedures">item2</ListValue>
        <ListValue id="13" displayName="Control Procedure">item3</ListValue>
      </ListValues>
    </Field>      
    <Field id="1001" type="6">123124</Field>
    <Field id="1002" type="9">
      <Reference id="775678">Peter Smith</Reference>
    </Field>
    <Field id="1005" type="9">
      <Reference id="3">item11</Reference>
      <Reference id="4">item12</Reference>
    </Field>
    <Field id="1006" type="9">
      <Reference id="5">item21</Reference>
      <Reference id="6">item22</Reference>
    </Field>
    <Field id="1004" type="9">
      <Reference id="7">item31</Reference>
      <Reference id="8">item32</Reference>
    </Field>
  </Record>
</Records>

What's the efficient way to iterate through this XML document using C#? This document is the result of relationship between two apps. Hence there is the info for the user, and the info for the request itself.

Thank you in advance.

canton7
  • 37,633
  • 3
  • 64
  • 77
  • Generate a C# class (there are online generators) and then deserialize it. – McNets Apr 07 '22 at 15:51
  • What do you mean by 'iterate' in this context? There are many, many questions out there showing how to load this into `XDocument` and query it. I'd suggest you research that and try to write the code you need and ask a question if you get stuck. – Charles Mager Apr 07 '22 at 15:53
  • 1
    It's XML, not JSON, though. @McNets – Fildor Apr 07 '22 at 15:53
  • Well yes sorry, but you can do it too. https://json2csharp.com/xml-to-csharp – McNets Apr 07 '22 at 15:59
  • @McNets Yep, was just referring to the "use Newtonsoft.Json or Text.Json" part. Having C# Classes constructed for this automatically, will probably be much more comfortable than doing it manually :D – Fildor Apr 07 '22 at 16:01
  • If speed is important do not use Serialization. I usually use xml linq. If you are putting into a table or database I recommend flatten results – jdweng Apr 07 '22 at 16:06
  • As @jdweng already pointed out, LINQ to XML API is a way to go. – Yitzhak Khabinsky Apr 07 '22 at 16:09
  • 3
    Note that deserializing into a class with XmlSerializer is probably *quicker* than using XDocument/XmlDocument. The deserialization is done using runtime code generation, and since the deserializer knows exactly what it's expecting, it can go at quite a pace. If you need to work at max speed and can't hold the whole document in memory, use XmlReader, although it's painful. If you want to deserialize large chunks of data, use XmlSerializer with C# classes. If you just want to pick a couple of things out of a large document and don't want to write lots of classes, use XDocument/XmlDocument – canton7 Apr 07 '22 at 16:12
  • 1
    For your case -- quite simple data -- I'd definitely use XmlSerializer: it'll be significantly nicer to work with than XDocument/XmlDocument – canton7 Apr 07 '22 at 16:14
  • @canton7 Maybe I wasn't so clear in my question, I'm able to load this document and loop through the nodes to read the values, and then save it to the database. What I could not do right is reading the different structure since these is 2 records types, one inside the other. – user616912 Apr 07 '22 at 16:27
  • @user616912 What's the problem? A Record can have 0 or 1 embedded Records and 0 or more Fields. A Field can have 0 or 1 ListValues, and 0 or more References. Correct? – canton7 Apr 07 '22 at 16:30
  • 1
    While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Yitzhak Khabinsky Apr 07 '22 at 17:40
  • 1
    *I'm able to load this document and loop through the nodes to read the values, and then save it to the database. What I could not do right is reading the different structure since these is 2 records types, one inside the other.* -- then please [edit] your question to share a [mcve] showing where you are stuck and need help. As it is it sounds like you are asking an opinion-based question about which XML parsing approach in c# is most "efficient". See [ask]. – dbc Apr 07 '22 at 17:58
  • Absent more detail about where you are stuck your question is likely to be closed as "Unclear" or as a duplicate of [How does one parse XML files?](https://stackoverflow.com/q/55828/3744182). – dbc Apr 07 '22 at 18:04

2 Answers2

1

From the comments, your question is how you deserialize the nested records.

A class structure like this seems to work:

public class Records
{
    [XmlElement("Record")]
    public List<Record> Items { get; } = new();
}

public class Record
{
    [XmlAttribute("contentId")]
    public int ContentId { get; set; }
    
    [XmlElement("Record")]
    public Record ChildRecord { get; set; }
    
    [XmlElement("Field")]
    public List<Field> Fields { get; } = new();
}

public class Field
{
    [XmlAttribute("id")]
    public int Id { get; set; }
    
    [XmlElement("Reference")]
    public List<Reference> References { get; } = new();
    
    [XmlArray("ListValues")]
    public List<ListValue> ListValues { get; } = new();
    
    [XmlText]
    public string Content { get; set; }
}

public class Reference
{
    [XmlAttribute("id")]
    public int Id { get; set; }
}

public class ListValue
{
    [XmlAttribute("id")]
    public int Id { get; set; }
    
    [XmlText]
    public string Content { get; set; }
}

See this doc on how to use the various attributes.

I haven't bothered to deserialize all of the attributes, or the Metadata or LevelCounts sections -- those are an exercise for the reader!

Use an XmlSerializer, for example:

using var reader = new StringReader(input);
var records = (Records)new XmlSerializer(typeof(Records)).Deserialize(reader);

See it on dotnetfiddle.net.

canton7
  • 37,633
  • 3
  • 64
  • 77
0

To flatten data into one table is messy but can be done. See below

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Data;

namespace ConsoleApplication21
{
    class Program
    {
        const string FILENAME = @"c:\temp\test.xml";

        static void Main(string[] args)
        {
            StreamReader sReader = new StreamReader(FILENAME);
            //skip 1st line containing UTF-16
            sReader.ReadLine();
            XDocument doc = XDocument.Load(sReader);

            DataTable dt = new DataTable();
            dt.Columns.Add("Field Id", typeof(int));
            dt.Columns.Add("Field Name", typeof(string));
            dt.Columns.Add("Alias", typeof(string));
            dt.Columns.Add("Main Content ID", typeof(int));
            dt.Columns.Add("Main Level ID", typeof(int));
            dt.Columns.Add("Main Module ID", typeof(int));
            dt.Columns.Add("Sub Content ID", typeof(int));
            dt.Columns.Add("Sub Level ID", typeof(int));
            dt.Columns.Add("Sub Module ID", typeof(int));
            dt.Columns.Add("Type", typeof(int));
            dt.Columns.Add("Field Value Name", typeof(string));
            dt.Columns.Add("Field Text", typeof(string));
            dt.Columns.Add("List Value Id", typeof(int));
            dt.Columns.Add("Display Name", typeof(string));
            dt.Columns.Add("List Value Text", typeof(string));
            dt.Columns.Add("Reference Id", typeof(int));
            dt.Columns.Add("Reference Text", typeof(string));

            foreach (XElement fieldDefinition in doc.Descendants("FieldDefinition"))
            {
                DataRow newRow = dt.Rows.Add();
                newRow["Field Id"] = (int)fieldDefinition.Attribute("id");
                newRow["Field Name"] = (string)fieldDefinition.Attribute("name");
                newRow["Alias"] = (string)fieldDefinition.Attribute("alias");
            }

            foreach (XElement record in doc.Root.Elements("Record"))
            {
                int mainContentId = (int)record.Attribute("contentId");
                int mainLevelId = (int)record.Attribute("levelId");
                int mainModuleId = (int)record.Attribute("moduleId");

                foreach(XElement subRecord in record.Elements("Record"))
                {
                    int subContentId = (int)record.Attribute("contentId");
                    int subLevelId = (int)record.Attribute("levelId");
                    int subModuleId = (int)record.Attribute("moduleId");

                    foreach (XElement field in subRecord.Elements("Field"))
                    {
                        int fieldId = (int)field.Attribute("id");
                        int fieldType = (int)field.Attribute("type");
                        string fieldName = (string)field;

                        DataRow fieldRow = dt.AsEnumerable().Where(x => x.Field<int>("Field Id") == fieldId).First();
                        fieldRow["Main Content ID"] = mainContentId;
                        fieldRow["Main Level ID"] = mainLevelId;
                        fieldRow["Main Module ID"] = mainModuleId;
                        fieldRow["Sub Content ID"] = subContentId;
                        fieldRow["Sub Level ID"] = subLevelId;
                        fieldRow["Sub Module ID"] = subModuleId;
                        fieldRow["Type"] = fieldType;
                        fieldRow["Field Value Name"] = fieldName;

                    }
                    foreach (XElement field in record.Elements("Field"))
                    {
                        int fieldId = (int)field.Attribute("id");
                        int fieldType = (int)field.Attribute("type");
                        string fieldText = (string)field;
                        int count = 0;
                        DataRow fieldRow = dt.AsEnumerable().Where(x => x.Field<int>("Field Id") == fieldId).First();
                        List<XElement> listValues = field.Descendants("ListValue").ToList();
                        List<XElement> references = field.Elements("Reference").ToList();
                        if (listValues.Count > 0)
                        {
                            foreach (XElement listValue in listValues)
                            {
                                count++;
                                int listValueId = (int)listValue.Attribute("id");
                                string displayName = (string)listValue.Attribute("displayName");
                                string listValueText = (string)listValue;

                                if (count > 1)
                                {
                                    string fieldName = fieldRow.Field<string>("Field Name");
                                    string alias = fieldRow.Field<string>("Alias");
                                    fieldRow = dt.Rows.Add();
                                    fieldRow["Field Id"] = fieldId;
                                    fieldRow["Field Name"] = fieldName;
                                    fieldRow["Alias"] = alias;
                                }
                                fieldRow["Main Content ID"] = mainContentId;
                                fieldRow["Main Level ID"] = mainLevelId;
                                fieldRow["Main Module ID"] = mainModuleId;
                                fieldRow["Type"] = fieldType;
                                if(fieldText.Length > 0) fieldRow["Field Text"] = fieldText;
                                fieldRow["List Value Id"] = listValueId;
                                fieldRow["Display Name"] = displayName;
                                fieldRow["List Value Text"] = listValueText;
                            }
                        }
                        count = 0;
                        if (references.Count > 0)
                        {
                            foreach (XElement reference in references)
                            {
                                count++;
                                int referenceId = (int)reference.Attribute("id");
                                string referenceText = (string)reference;
                                if (count > 1)
                                {
                                    string fieldName = fieldRow.Field<string>("Field Name");
                                    string alias = fieldRow.Field<string>("Alias");
                                    fieldRow = dt.Rows.Add();
                                    fieldRow["Field Id"] = fieldId;
                                    fieldRow["Field Name"] = fieldName;
                                    fieldRow["Alias"] = alias;
                                }
                                fieldRow["Main Content ID"] = mainContentId;
                                fieldRow["Main Level ID"] = mainLevelId;
                                fieldRow["Main Module ID"] = mainModuleId;
                                fieldRow["Type"] = fieldType;
                                if (fieldText.Length > 0) fieldRow["Field Text"] = fieldText;
                                fieldRow["Reference Id"] = referenceId;
                                fieldRow["Reference Text"] = referenceText;
                            }
                        }
                        if((listValues.Count == 0) && (references.Count == 0))
                        {
                            fieldRow["Type"] = fieldType;
                            if (fieldText.Length > 0) fieldRow["Field Text"] = fieldText;
                        }
                    }
                }
            }
            dt = dt.AsEnumerable().OrderBy(x => x.Field<int>("Field Id")).CopyToDataTable();

        }
 
    }
 
}
jdweng
  • 33,250
  • 2
  • 15
  • 20