0

I need a function to return preorder and postorder of all elements in C# which returns me a list of (XElement, Preorder , Postorder) for all Element. How can I do this?

for example, with this XML :

<?xml version='1.0' encoding='ISO-8859-1' ?>

<!--
  XML-Page generated by PENELOPE.
  Copyright (c) 1999 Araneus Group and University 'Roma Tre', Rome, ITALY
  All rights reserved.
-->
<SigmodRecord>
    <issue>
        <volume>11</volume>
        <number>1</number>
        <articles>
            <article>
                <title>Architecture of Future Data Base Systems</title>
                <authors>
                    <author position="00">Lawrence A. Rowe</author>
                    <author position="01">Michael Stonebraker</author>
                </authors>
                <initPage>30</initPage>
                <endPage>44</endPage>

            </article>
            <article>
                <title>Multisafe - A Data Security Architecture</title>
                <authors>
                    <author position="00">Robert P. Trueblood</author>
                    <author position="01">ii. Rex Hartson</author>
                </authors>
                <initPage>45</initPage>
                <endPage>63</endPage>
            </article>
        </articles>
    </issue>
    <issue>
        <volume>12</volume>
        <number>2</number>
        <articles>
            <article>
                <title>Comparison and Mapping of the Relational and CODASYL Data Models</title>
                <authors>
                    <author position="00">Gary H. Sockut</author>
                </authors>
                <initPage>55</initPage>
                <endPage>68</endPage>
            </article>
        </articles>
    </issue>
</SigmodRecord>

I need this answer :

Element = <SigmodRecord>,preorder = 1, postorder = 29
Element = <SigmodRecord/issue>,preorder = 2, postorder = 18
.
.
.

I have written this class but it works slow in large XML files because for each element it proceed all of nodes!

class CTraversal
{
    private int preorderID = 0;
    private int postorderID = 0;

    public int PreOrderTraversal(XElement RootNode, XElement CurrentNode)
    {
        if (CurrentNode == RootNode)
        {
            return ++preorderID;
        }
        if (RootNode.DescendantNodes().Count() > 1)
        {
            foreach (var child in RootNode.Elements())
            {
                if (PreOrderTraversal(child, CurrentNode) != -1) return preorderID;
            }
        }
        return -1;
    }

    public int PostOrderTraversal(XElement RootNode, XElement CurrentNode)
    {
        if (RootNode.DescendantNodes().Count() > 1)
        {
            foreach (var child in RootNode.Elements())
            {
                if (PostOrderTraversal(child, CurrentNode) != -1) return postorderID;
            }
        }
        if (CurrentNode == RootNode)
        {
            return ++postorderID;
        }
        ++postorderID;
        return -1;
    }
}
Bsflasher
  • 138
  • 2
  • 17
  • Generating results (even with proper memoization of partial results) would be slow for large XML... Are you sure you need complete list? – Alexei Levenkov May 21 '15 at 17:09
  • yes, I need complete list because it is a part of a big algorithm. Thanks – Bsflasher May 21 '15 at 17:11
  • 1
    Why check for Descendants Count? You may be doing twice the number of loops since it is just an IEnumerable. Try just checking if`FirstOrDefault()` is null. Also, why check DescendantNodes, but not actually loop through them? – TyCobb May 21 '15 at 17:14
  • Well, do you need the complete list or not? With the comment, this is kind of "how to get the complete list without getting the complete list". – H H May 21 '15 at 17:16
  • 1
    `XElement.DescendantNodesAndSelf` is already a preorder traversal. Why not just use it? – dbc May 21 '15 at 17:17
  • Sorry TyCobb, I can't understand what you say? Where you mention in code? – Bsflasher May 21 '15 at 17:17
  • @bsflasher I am saying `RootNode.DescendantNodes().Count()` is perhaps doing an entire loop just to get you a count. Try `RootNode.DescendantNodes().Any()` instead. My other question is, why are you even checking `RootNode.DescendantNodes().Count()` when you aren't looping through `RootNode.DescendantNodes()`? Your actual loop is using `RootNode.Elements()`. – TyCobb May 21 '15 at 17:19
  • @dbc , I need the integer order! – Bsflasher May 21 '15 at 17:23
  • @Henk Holterman , I need complete list. List – Bsflasher May 21 '15 at 17:24
  • @TyCobb, Can you post here your suggestion? I need a list of for all nodes – Bsflasher May 21 '15 at 17:25
  • I was merely giving you suggestions for speeding your methods up since you were mentioning it was slow. – TyCobb May 21 '15 at 17:31
  • But what exactly is that Preorder/PostOrder result? You're only retuning `int`. Are you counting the Tree, or getting the Height? – H H May 21 '15 at 17:33
  • @ Henk Holterman, This functions return the preorder/postorder id for one element. I will complete my question with a sample :) – Bsflasher May 21 '15 at 17:34
  • @bsflasher - once you have the enumeration, use [How to get the index of an element in an IEnumerable?](https://stackoverflow.com/questions/1290603/how-to-get-the-index-of-an-element-in-an-ienumerable) to get the integer order. – dbc May 21 '15 at 18:28
  • @ dbc, Thanks , Suervy's post works ... :) – Bsflasher May 21 '15 at 18:39
  • @HenkHolterman Looks like the number is basically the index of the elements in question in a sequence of the appropriate ordering. So if you ordered the elements by that number they would be in the appropriate pre/post ordering. – Servy May 21 '15 at 18:42

1 Answers1

1

So handling a pre-order tree traversal is the easier of the two, so we'll start there.

The method will accept a sequence of items, and a function that accepts an item and translates it into a sequence of items, this lets us get all children for any given node.

Then in our method we'll have a stack of iterators, pushing the starting sequence's iterator into the stack to begin.

From there we loop as long as there are any iterators to process. We get the current item to process, if it has children we yield its current item, push the iterator back onto the stack for later processing, fetch the sequence of children, push them onto the stack for processing.

public static IEnumerable<T> PreOrder<T>(
    IEnumerable<T> source,
    Func<T, IEnumerable<T>> childSelector)
{
    var stack = new Stack<IEnumerator<T>>();
    stack.Push(source.GetEnumerator());
    try
    {
        while (stack.Any())
        {
            var current = stack.Pop();
            if (current.MoveNext())
            {
                stack.Push(current);
                var children = childSelector(current.Current);
                stack.Push(children.GetEnumerator());
                yield return current.Current;
            }
            else
            {
                current.Dispose();
            }
        }
    }
    finally
    {
        foreach (var iterator in stack)
            iterator.Dispose();
    }
}

This covers our pre-order algorithm of, for each item to process, yield it, then process all child items.

Here's a test case to demonstrate it. It's an XML document in which each node has a value representing the order that it should be yielded from a pre order traversal:

string preOrderExample =
@"<node value='1'>
  <node value='2'>
    <node value='3'/>
    <node value='4'/>
    <node value='5'/>
  </node>
  <node value='6'>
    <node value='7'>
      <node value='8'/>
      <node value='9'>
        <node value='10'/>
      </node>
    </node>
    <node value='11'/>
  </node>
</node>";

var preOrderDoc = XDocument.Parse(preOrderExample);
var query = PreOrder(new[] { preOrderDoc.Root }, node => node.Elements());
foreach (var node in query)
    Console.WriteLine(node.Attribute("value").Value);

This will print out 1, 2, 3, ..., 11, indicating they're being processed in the right order.

Post order is very similar, but needs a bit of tweaking. Basically what we want to do is have exactly the same algorithm, but to yield each item after processing all of its children. The first thing to do is copy-paste the same algorithm and just remove the `yield. Now, where, in our algorithm, do we know where all of an item's children have been processed?

Well, just after we pop an iterator off of the stack, the Current item of that iterator has just had all of its items processed. Well, that is, unless we haven't yet called MoveNext on that iterator at all, in which case there is no current item. So, if there is an item, we should yield it.

Sadly, there's no way, given an IEnumerator, to know if it's had MoveNext called on it yet. We could create a new type that wraps an IEnumerator and keeps track of whether or not it's been started. This is probably actually a good idea, but since this is only used in this one method I cheated a bit and just used a Tuple<IEnumerator<T>, bool> to hold onto a sequence that may or may not have been started, replacing all usage of IEnumerator<T> in the method with that Tuple, and setting the boolean based on whether or not we know that we've asked that iterator for a value.

public static IEnumerable<T> PostOrder<T>(
    IEnumerable<T> source,
    Func<T, IEnumerable<T>> childSelector)
{
    var stack = new Stack<Tuple<IEnumerator<T>, bool>>();
    stack.Push(Tuple.Create(source.GetEnumerator(), false));
    try
    {
        while (stack.Any())
        {
            var current = stack.Pop();
            if (current.Item2)
                yield return current.Item1.Current;
            if (current.Item1.MoveNext())
            {
                stack.Push(Tuple.Create(current.Item1, true));
                var children = childSelector(current.Item1.Current);
                stack.Push(Tuple.Create(children.GetEnumerator(), false));
            }
            else
            {
                current.Item1.Dispose();
            }
        }
    }
    finally
    {
        foreach (var iterator in stack)
            iterator.Item1.Dispose();
    }
}

Finally we have a test case that has the nodes' value ordered such that they'll print out 1, 2, 3, ... if they're written in post order:

string postOrderExample =
@"<node value='11'>
  <node value='4'>
    <node value='1'/>
    <node value='2'/>
    <node value='3'/>
  </node>
  <node value='10'>
    <node value='8'>
      <node value='5'/>
      <node value='7'>
        <node value='6'/>
      </node>
    </node>
    <node value='9'/>
  </node>
</node>";

var postOrderDoc = XDocument.Parse(postOrderExample);
query = PostOrder(new[] { postOrderDoc.Root }, node => node.Elements());
foreach (var node in query)
    Console.WriteLine(node.Attribute("value").Value);
Servy
  • 202,030
  • 26
  • 332
  • 449