I have a set of large XML files with various sections in them.

As an example, I mocked up the file below. It serializes and deserializes just fine when the element names do not carry the appended counter (1, 2, 3, etc.).

<?xml version="1.0" encoding="utf-16"?>
<employees xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <category name="TestCategory" baseicon="0" decalicon="0">
        <EmployeeList1>
            <name type="string">Peter Parker</name>
            <age type="number">25</age>
        </EmployeeList1>
        <EmployeeList2>
            <name type="string">J.J. Jameson</name>
            <age type="number">57</age>
        </EmployeeList2>
    </category>
</employees>

However, when the element names carry the incrementing suffix, I get back the overall Employees object but no objects in its EmployeeList collection.

All the objects within the EmployeeList elements have the same overall structure, although some will have optional fields such as, say, "homeowner".

[XmlRoot(ElementName = "category")]
public class Category
{
    [XmlAttribute(AttributeName = "name")]
    public string Name;

    [XmlAttribute(AttributeName = "baseicon")]
    public int Baseicon;

    [XmlAttribute(AttributeName = "decalicon")]
    public int Decalicon;

    [XmlElement]
    public List<Employee> EmployeeList { get; set; }

    public Category()
    {

    }
}

And the Employees and Employee classes look like this:

[XmlRoot(ElementName = "employees")]
public class Employees
{
    [XmlElement(ElementName = "category")]
    public Category Category;

    public Employees()
    {

    }
}

[XmlRoot(ElementName = "employee")]
public class Employee
{
    [XmlElement(ElementName = "name")]
    public Name Name { get; set; }

    [XmlElement(ElementName = "age")]
    public Age Age { get; set; }
}

Then, of course, the follow-up is: how do I serialize a collection of elements with an incremented ID?

dbc
Slagmoth
  • Honestly this is a pain to deal with using `XmlSerializer`. And there's no good way to make an XSD schema for such XML either. You could start with [How to serialize an array to XML with dynamic tag names](https://stackoverflow.com/q/50415653/3744182) and [Deserialize XML with XmlSerializer where XmlElement names differ but have same content](https://stackoverflow.com/q/45766597/3744182) or [How do you deserialize XML with dynamic element names?](https://stackoverflow.com/q/37255149/3744182). – dbc Apr 09 '21 at 00:06
  • @dbc Thanks, I was getting to the point that I thought that I would have to do more string manipulation than I really wanted to do with this and maybe do them individually and crunch them all together. I will check those out. – Slagmoth Apr 09 '21 at 00:08
  • Or if you think of your employee list as a `Dictionary` you basically need to be able to XML-serialize a dictionary. You could start with [How to XML-serialize a dictionary](https://stackoverflow.com/a/3671371). – dbc Apr 09 '21 at 00:11
  • @dbc The data presented here is a mock example. My actual data is much more complex. I used this to test that I was doing the deserialization correctly and could get data back from the files if all the element names were the same. I had looked into XDocument as well and didn't go too far with it because I wasn't building the entire document in memory. I have a bunch of XML whose pieces I am trying to consolidate programmatically; later I will try to parse other pieces to add additional elements. – Slagmoth Apr 09 '21 at 00:15
  • It is easy with LINQ to XML. You get the elements under category. The tag names are then available as `x.Name.LocalName`. – jdweng Apr 09 '21 at 09:29
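
The LINQ to XML route mentioned in the last comment could look roughly like the sketch below. It also covers the follow-up about writing the numbered names back out. `EmployeeRow` and `NumberedElements` are hypothetical names invented for illustration, the row is a flattened stand-in for the question's `Employee` class (whose `Name` and `Age` types are not shown), and the sketch assumes the elements carry no XML namespace, as in the mock file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical flat model, not the question's Employee class.
public sealed class EmployeeRow
{
    public string Tag { get; set; }   // original element name, e.g. "EmployeeList2"
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class NumberedElements
{
    // Reads every child of <category> whose local name starts with the prefix.
    public static List<EmployeeRow> Read(XDocument doc, string prefix = "EmployeeList")
    {
        return doc.Root
            .Elements("category")
            .Elements()
            .Where(e => e.Name.LocalName.StartsWith(prefix, StringComparison.Ordinal))
            .Select(e => new EmployeeRow
            {
                Tag = e.Name.LocalName,
                Name = (string)e.Element("name"),
                Age = (int)e.Element("age")
            })
            .ToList();
    }

    // Builds a <category> element whose children are named prefix1, prefix2, ...
    // in list order. The counter is regenerated from the position, not taken from Tag.
    public static XElement Write(IEnumerable<EmployeeRow> rows, string prefix = "EmployeeList")
    {
        return new XElement("category",
            rows.Select((r, i) => new XElement(prefix + (i + 1),
                new XElement("name", new XAttribute("type", "string"), r.Name),
                new XElement("age", new XAttribute("type", "number"), r.Age))));
    }
}
```

If the real identifiers (something like "id-00004") must survive a round trip, the stored `Tag` could be used as the element name in `Write` instead of the regenerated counter.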

2 Answers

Using serial numbers as part of the element name is really bad XML design, and the best way of dealing with badly designed XML is often to put it through an XSLT transformation that turns it into something more manageable, before doing any further processing. In this case you might simply strip off the numeric suffixes, because they're completely redundant.

In XSLT 3.0 that would be:

<xsl:transform version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:template match="*[starts-with(local-name(), 'EmployeeList')]">
    <EmployeeList>
      <xsl:apply-templates/>
    </EmployeeList>
  </xsl:template>
</xsl:transform>
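
If no XSLT 3.0 processor is at hand, note that .NET's built-in `XslCompiledTransform` only supports XSLT 1.0. A minimal sketch of the same idea in XSLT 1.0, driven from C#, might look like this. The identity template stands in for `xsl:mode on-no-match="shallow-copy"`, and `SuffixStripper` is a hypothetical helper name, not part of any library.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class SuffixStripper
{
    // XSLT 1.0 version: an identity template plus one template that renames
    // every element whose local name starts with "EmployeeList".
    private const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""*[starts-with(local-name(), 'EmployeeList')]"">
    <EmployeeList>
      <xsl:apply-templates select=""@*|node()""/>
    </EmployeeList>
  </xsl:template>
</xsl:stylesheet>";

    // Returns the input document with the numeric suffixes stripped.
    public static string Strip(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

The string returned by `SuffixStripper.Strip` can then be handed to `XmlSerializer` through a `StringReader`. Writing the numbered names back out would need a second stylesheet going the other way.
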
Michael Kay
  • Thanks, that is certainly one option, but the application that accesses this data is likely using some sort of XPath lookup on an ID to find the specific item in the array. The actual array tags are like "id-00004" or something similar. Not sure why they didn't just look for the unique name inside the XML, but it probably had to do with the additional supplements that could provide other overrides to a named tag or something to that effect. Of course I would then have to reserialize them back into those IDs, which IIRC would just be another XSLT. – Slagmoth Apr 09 '21 at 13:57
  • One of the nice things about answering questions on StackOverflow is that you can tell people what is the right thing to do, without worrying about all the messy complications of existing applications that are locked into doing it wrong. – Michael Kay Apr 11 '21 at 08:07
  • Ain't that the truth. – Slagmoth Apr 12 '21 at 01:35

I can offer a custom XmlReader that will replace the specified element names on the fly.

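using System;
using System.Xml;

public class ReplacingXmlReader : XmlTextReader
{
    private readonly string _nameToReplace;

    public ReplacingXmlReader(string url, string nameToReplace)
        : base(url)
    {
        _nameToReplace = nameToReplace;
    }

    // Define the remaining constructors here.

    public override string LocalName
    {
        get
        {
            // Report every name that starts with the prefix (e.g. "EmployeeList2")
            // as the bare prefix. Ordinal comparison avoids culture-dependent matching.
            if (base.LocalName.StartsWith(_nameToReplace, StringComparison.Ordinal))
                return _nameToReplace;
            return base.LocalName;
        }
    }
}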

Usage is easy:

var xs = new XmlSerializer(typeof(Employees));

using var reader = new ReplacingXmlReader("test.xml", "EmployeeList");

var employees = (Employees)xs.Deserialize(reader);

If necessary, you can make a list of element names to replace.
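
A sketch of that variant, under a few assumptions: `MultiReplacingXmlReader` is a hypothetical name, it reads from a `TextReader` rather than a URL, and the first matching prefix wins. The prefixes are run through the reader's `NameTable` because, as far as I know, the readers generated by `XmlSerializer` can compare names by reference against atomized strings, so returning the atomized instance seems the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;

public class MultiReplacingXmlReader : XmlTextReader
{
    private readonly string[] _prefixes;

    public MultiReplacingXmlReader(TextReader input, IEnumerable<string> prefixes)
        : base(input)
    {
        // Atomize each prefix in this reader's name table and keep the returned instance.
        _prefixes = prefixes.Select(p => NameTable.Add(p)).ToArray();
    }

    public override string LocalName
    {
        get
        {
            string name = base.LocalName;
            foreach (string prefix in _prefixes)
            {
                // The first prefix that matches is reported instead of the real name.
                if (name.StartsWith(prefix, StringComparison.Ordinal))
                    return prefix;
            }
            return name;
        }
    }
}
```

If one prefix is itself a prefix of another, list the longer one first so that it is matched before the shorter one.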

Alexander Petrov
  • So this led me to [this](https://stackoverflow.com/questions/14545349/change-the-name-of-an-xelement-in-linq-to-xml) which has me working to an extent. Now however I have an issue with a collection within one of the elements that has an html table. Is there a way to just take the literal information within a tag? I don't need the table to be broken into a ton of tr classes to be honest. @MichaelKay thanks – Slagmoth Apr 21 '21 at 23:07