
Say I have an export (serialize) function that does the following:

public void ExportToXML()
{
    var DCS = new DataContractSerializer(typeof(Entry));
    var XWriter = XmlWriter.Create(@"C:\Temp\Export.xml");
    XWriter.WriteStartDocument();
    XWriter.WriteStartElement("Entries");
    Entries.ForEach(e =>
    {
        DCS.WriteStartObject(XWriter, e);
        DCS.WriteObjectContent(XWriter, e);
        DCS.WriteEndObject(XWriter);
    });
    XWriter.WriteEndElement();
    XWriter.WriteEndDocument();
    XWriter.Close();
}

which exports an XML file that looks like this:

<Entries>
  <Entry>{Some Data}</Entry>
  <Entry>{Some Data}</Entry>
  <Entry>{Some Data}</Entry>
  <Entry>{Some Data}</Entry>
</Entries>

For the import method, I want to deserialize each <Entry>{Some Data}</Entry> element one at a time so that I can apply a Func<Entry,Entry> transform if a Func<Entry,bool> predicate is true.

This is what I came up with

public void ImportFromXML(string FileName, Func<Entry,Entry> Transform, Func<Entry,bool> DoTransform)
{
    var DCS = new DataContractSerializer(typeof(Entry));
    var ImportedEntries = new List<Entry>();
    foreach (var EntryElement in XDocument.Load(FileName).Root.Elements().Where(xe => xe.Name.LocalName == "Entry")) 
    {
        var XMLEntry = (Entry)DCS.ReadObject(EntryElement.CreateReader());
        ImportedEntries.Add(DoTransform(XMLEntry) ? Transform(XMLEntry) : XMLEntry);
    }
    entries = ImportedEntries.ToDictionary(e => e.KeyName + "\\" + e.ValueName);
}

This works, but I'm wondering whether there is a way to do it in one pass with a single XmlReader instead of creating a separate XmlReader for each XElement.

I tried to reverse the logic of the Export method

public void ImportFromXML(string FileName, Func<Entry,Entry> Transform, Func<Entry,bool> DoTransform)
{        
    var DCS = new DataContractSerializer(typeof(Entry));
    var ImportedEntries = new List<Entry>();
    var XReader = XmlReader.Create(FileName);
    XReader.ReadStartElement("Entries");
    while (!{WHAT Exit Condition?})
    {
        var XMLEntry = (Entry)DCS.ReadObject(XReader);
        ImportedEntries.Add(DoTransform(XMLEntry) ? Transform(XMLEntry) : XMLEntry);
    }
    XReader.Close();
    entries = ImportedEntries.ToDictionary(e => e.KeyName + "\\" + e.ValueName);
}

However, I'm not sure what to put in for {WHAT Exit Condition?}. Obviously I can't use !XReader.EOF, because reading to the end of the file would make it try to deserialize the closing </Entries> tag as an Entry.

The class these methods belong to will be consumed by our SCCM OS deployment task sequences, which means it could be used by multiple concurrently running task sequences that query the source XML files over the network, so I'm a little concerned about performance.

Am I chasing my tail trying to do this with a single XmlReader, or is combining LINQ to XML with separate XmlReaders the best option?

halfer
TofuBug
  • Handy tip: if you wish to refer to HTML/XML elements inside paragraph text on Stack Overflow, just wrap them in backticks, ``. It saves a lot of faff with `<` and `>` entities! – halfer Oct 08 '17 at 21:12

2 Answers


The XML file can be written with or without indentation. In the former case, the following code works fine:

var settings = new XmlWriterSettings { Indent = true };
var DCS = new DataContractSerializer(typeof(Entry));

using (var writer = XmlWriter.Create(fileName, settings)) // with indentation
{
    writer.WriteStartDocument();
    writer.WriteStartElement("Entries");

    foreach (var entry in Entries)
    {
        DCS.WriteObject(writer, entry);
    }

    writer.WriteEndElement(); // closes </Entries>
    writer.WriteEndDocument();
}

using (var reader = XmlReader.Create(fileName))
{
    while (reader.ReadToFollowing("Entry"))
    {
        var xmlEntry = (Entry)DCS.ReadObject(reader);
        // ...
    }
}

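In this case, the ReadToFollowing method first reads the whitespace and then advances to the next Entry node. Without indentation, however, ReadObject leaves the reader positioned on the next Entry element itself, and ReadToFollowing moves past it, so every other Entry node is skipped.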
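To see the skipping in isolation, here is a small self-contained sketch. It assumes a hypothetical Entry data contract with only KeyName and ValueName members (the real Entry class isn't shown in the question) and writes to an in-memory string instead of a file. It counts how many entries a ReadToFollowing-only loop deserializes compared with a loop that first checks whether the reader is already sitting on an Entry element.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the asker's Entry class, which the question does not show.
[DataContract(Namespace = "")]
public class Entry
{
    [DataMember] public string KeyName { get; set; }
    [DataMember] public string ValueName { get; set; }
}

public static class SkipDemo
{
    static readonly DataContractSerializer Dcs = new DataContractSerializer(typeof(Entry));

    // Serializes `count` entries under an <Entries> root, optionally indented.
    public static string Write(int count, bool indent)
    {
        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb, new XmlWriterSettings { Indent = indent }))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Entries");
            for (int i = 0; i < count; i++)
                Dcs.WriteObject(writer, new Entry { KeyName = "K" + i, ValueName = "V" + i });
            writer.WriteEndElement();
            writer.WriteEndDocument();
        }
        return sb.ToString();
    }

    // ReadToFollowing only: it calls Read() before checking the node,
    // so it moves past an <Entry> the reader is already positioned on.
    public static int CountNaive(string xml)
    {
        int n = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.ReadToFollowing("Entry"))
            {
                Dcs.ReadObject(reader);
                n++;
            }
        }
        return n;
    }

    // Checks the current node first and only falls back to ReadToFollowing otherwise.
    public static int CountGuarded(string xml)
    {
        int n = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.LocalName == "Entry" || reader.ReadToFollowing("Entry"))
            {
                Dcs.ReadObject(reader);
                n++;
            }
        }
        return n;
    }
}
```

With four entries written without indentation, the ReadToFollowing-only loop should read only two of them, while the guarded loop reads all four; with indentation, both loops should read all four.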

In the latter case, we can use the following code:

using (var writer = XmlWriter.Create(fileName)) // without indentation
// ...


using (var reader = XmlReader.Create(fileName))
{
    while (reader.LocalName == "Entry" || reader.ReadToFollowing("Entry"))
    {
        var xmlEntry = (Entry)DCS.ReadObject(reader);
        // ...
    }
}

Moreover, this code works correctly in both cases.
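Another possible exit condition for the single-reader loop in the question is XmlReader.IsStartElement("Entry"). It calls MoveToContent first, so whitespace between entries is skipped, and it returns false once the reader reaches </Entries>. This is a different technique from the loops above, shown as a sketch only. It assumes the same hypothetical Entry data contract (KeyName and ValueName only), and the EntryImporter.Import name is made up for illustration. It reads from a TextReader so it can be exercised without a file, and it builds the dictionary with the same key format as the question's ToDictionary call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical stand-in for the asker's Entry class.
[DataContract(Namespace = "")]
public class Entry
{
    [DataMember] public string KeyName { get; set; }
    [DataMember] public string ValueName { get; set; }
}

public static class EntryImporter
{
    public static Dictionary<string, Entry> Import(
        TextReader source, Func<Entry, Entry> transform, Func<Entry, bool> doTransform)
    {
        var dcs = new DataContractSerializer(typeof(Entry));
        var imported = new Dictionary<string, Entry>();
        using (var reader = XmlReader.Create(source))
        {
            // ReadStartElement calls MoveToContent (skipping the XML declaration),
            // checks for <Entries>, and steps inside it.
            reader.ReadStartElement("Entries");

            // IsStartElement also calls MoveToContent, so it works with or without
            // indentation, and it returns false once the reader reaches </Entries>.
            while (reader.IsStartElement("Entry"))
            {
                var entry = (Entry)dcs.ReadObject(reader); // leaves the reader after </Entry>
                var result = doTransform(entry) ? transform(entry) : entry;
                // Same key as the question; Add throws on duplicates, like ToDictionary.
                imported.Add(result.KeyName + "\\" + result.ValueName, result);
            }
        }
        return imported;
    }
}
```

For a file on disk, File.OpenText(fileName) could be passed as the source, and the returned dictionary assigned to entries as in the question's ImportFromXML.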

Alexander Petrov
  • I just tried this and the problem is it skips every other <Entry>! It seems that ReadToFollowing() is not written to check whether it is already AT what it needs to read to. So the first ReadToFollowing("Entry") reads to the first child of <Entries>, then DCS.ReadObject reads that entry and moves the stream position to the start of the next <Entry>; then, when ReadToFollowing("Entry") is called, it skips over that <Entry> until it sees the <Entry> after it, and so on and so forth. – TofuBug Apr 25 '16 at 00:20
  • That check for the LocalName along with the ReadToFollowing() did the trick!! Awesome thanks a bunch! – TofuBug Apr 25 '16 at 01:29

If your transforms are sufficiently regular, writing an XSLT transform and applying it to the source document would be the approach I'd take. You can produce XML or non-XML output. It's kinda weird if you've never used it, but you write xsl:template elements that match pieces of the input and describe what to output for them, using nested xsl:apply-templates elements to process the selected child nodes. It sounds custom-made for your problem.

A decent example in a related question can be found here, and a good 50,000 ft overview of using XSLT here.
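As a rough illustration of the XSLT route from C#, the sketch below loads a stylesheet with XslCompiledTransform (which implements XSLT 1.0). The stylesheet is an identity transform plus one override that replaces any text node that is exactly {Computer} with a computer name passed in as a stylesheet parameter. The {Computer} placeholder comes from the asker's comment on this answer. The stylesheet itself, the parameter name computer, and the XsltSketch.Apply helper are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Identity transform plus one override: a text node that is exactly "{Computer}"
    // is replaced with the value of the "computer" parameter.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:param name=""computer""/>
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""text()[. = '{Computer}']"">
    <xsl:value-of select=""$computer""/>
  </xsl:template>
</xsl:stylesheet>";

    public static string Apply(string inputXml, string computerName)
    {
        var xslt = new XslCompiledTransform();
        using (var xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslReader);
        }

        // Supplies the top-level xsl:param named "computer" (no namespace).
        var args = new XsltArgumentList();
        args.AddParam("computer", "", computerName);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, args, writer);
        }
        return output.ToString();
    }
}
```

Because XSLT 1.0 has no replace() function, this only handles text nodes that consist entirely of the placeholder; replacing a placeholder embedded in a longer string would need a recursive named template.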

Clay
  • I had considered XSLT transforms (I primarily use them for XML data and structure integrity validation of XML files other users modify). However, I might not be the only one writing code against this class, and our collective strength in XSLT is admittedly lacking. Additionally, in my experience, with anything more than simple XML structures the XSLT gets massively unwieldy quickly. Finally, we use LINQ a lot, so the other programmers who would be writing code against this class are already used to and proficient at using lambda expressions, and I'd rather not throw a curve ball their way. – TofuBug Apr 24 '16 at 22:59
  • Also, this class is for reading and modifying Registry.pol files (Machine and User local GPO). Right now it's for a VERY specific case where I need to identify entries (from the "template" XML) that have a {Computer} placeholder, which gets replaced with the computer name when imported into memory, prior to being written to the imaged computer. Whenever I find a need, I try to write code that not only satisfies the current need but can hopefully be used for other needs not thought of yet. Hence the ability to use lambda expressions to quickly and easily write conditions and transforms for any situation. – TofuBug Apr 25 '16 at 00:42