I have an XML file from which I would like to retrieve all unique paths. In the following example:

<?xml version="1.0" encoding="utf-8"?>
<views>
    <invoice>
        <newRa elem="0">
            <createD>20150514</createD>
            <modD>1234</modD>
            <sample>text</sample>
        </newRa>
        <total>1.99</total>
    </invoice>
</views>

I want to retrieve:

views/invoice/newRa/createD
views/invoice/newRa/modD
views/invoice/newRa/sample

and so on.

I have some experience with XPath, but I'm not sure how to begin setting up a Sub in VB that will do this for me. Mind you, I'm working with .NET 2.0, so LINQ is not possible.

EDIT 1:

Dim xOne As New XmlDocument
xOne.Load("d/input/oneTest.xml")

For Each rNode As XmlNode In xOne.SelectSingleNode("/")
    If rNode.HasChildNodes Then
        subHasChild(rNode)
    End If
Next



Private Sub subHasChild(ByVal cNode As XmlNode)
    Dim sNode = cNode.Name

    If cNode.HasChildNodes Then
        sNode = sNode + "/" + cNode.FirstChild.Name
        cNode = cNode.FirstChild
        subHasChild(cNode)
    End If

    Dim sw As New StreamWriter("d:\input\paths.txt")
    sw.WriteLine(sNode)
    sw.Flush() : sw.Close() : sw.Dispose()
End Sub
Gmac
  • I'll make this a comment as it would take longer than I want to spend right now to actually write the code. What you need is a recursive subroutine. You set a variable to the root node and pass it to a subroutine that loops through all of that node's children. Inside that loop, you set a variable to the child node, and if it has children, pass it right back into the subroutine. If it doesn't have children, you can get its path and store it in an array or write it to a file, whatever you're doing with it. – Tony Hinkle May 20 '15 at 00:56
  • [In VB.](https://msdn.microsoft.com/en-us/library/bb387045.aspx?cs-save-lang=1&cs-lang=vb#code-snippet-1) | [In C#.](https://msdn.microsoft.com/en-us/library/bb387045.aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1) | [In XSLT](http://stackoverflow.com/a/9065408/290085). | [In Java/SAX.](http://stackoverflow.com/a/4783172/290085) – kjhughes May 20 '15 at 01:23
  • @Tony Hinkle, Just updated my block of code, am I on the right path? – Gmac May 20 '15 at 17:26
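
For readers following along, the recursion described in the comments could look roughly like the sketch below. It is only an illustration under a few assumptions: it sticks to `XmlDocument` and generics, which are available in .NET 2.0, it treats an element with no child elements as a leaf, and the names `UniquePathsSketch` and `CollectPaths` are hypothetical, not from the thread. The sample XML is inlined so the module runs on its own.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module UniquePathsSketch

    ' Recursively walks the element tree. "prefix" is the path of the parent element.
    Private Sub CollectPaths(ByVal node As XmlNode, ByVal prefix As String, _
                             ByVal seen As Dictionary(Of String, Boolean), _
                             ByVal paths As List(Of String))
        Dim path As String = node.Name
        If prefix.Length > 0 Then path = prefix & "/" & node.Name

        Dim hasChildElement As Boolean = False
        For Each child As XmlNode In node.ChildNodes
            ' Skip text, comment and other non-element nodes.
            If child.NodeType = XmlNodeType.Element Then
                hasChildElement = True
                CollectPaths(child, path, seen, paths)
            End If
        Next

        ' A leaf element: record its path once, however often it repeats.
        If Not hasChildElement AndAlso Not seen.ContainsKey(path) Then
            seen.Add(path, True)
            paths.Add(path)
        End If
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<views><invoice><newRa elem=""0""><createD>20150514</createD>" & _
                    "<modD>1234</modD><sample>text</sample></newRa>" & _
                    "<total>1.99</total></invoice></views>")

        Dim paths As New List(Of String)
        CollectPaths(doc.DocumentElement, "", New Dictionary(Of String, Boolean), paths)

        For Each p As String In paths
            Console.WriteLine(p)
        Next
    End Sub

End Module
```

The `Dictionary` stands in for a set here because `HashSet(Of T)` only arrived in .NET 3.5. Passing the parent's path down as `prefix` avoids walking back up through `ParentNode` for every leaf.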

3 Answers

Try this:

    Dim xd = <?xml version="1.0" encoding="utf-8"?>
<views>
    <invoice>
        <newRa elem="0">
            <createD>20150514</createD>
            <modD>1234</modD>
            <sample>text</sample>
        </newRa>
        <total>1.99</total>
    </invoice>
</views>

    Dim getPaths As Func(Of XElement, IEnumerable(Of String)) = Nothing
    getPaths = Function(xe) _
        If(xe.Elements().Any(), _
            xe.Elements() _
                .SelectMany( _
                    Function(x) getPaths(x), _
                    Function(x, p) xe.Name.ToString() + "/" + p) _
                .Distinct(), _
            { xe.Name.ToString() })

    Dim paths = getPaths(xd.Root)

It gives me:

views/invoice/newRa/createD 
views/invoice/newRa/modD 
views/invoice/newRa/sample 
views/invoice/total 

It correctly gets rid of duplicate paths.

Enigmativity

Thank you to EVERYONE who chimed in with responses. After researching all sorts of ways to do this, I ended up using a dictionary to get all unique paths. For anyone who may come across a similar scenario, here is what I used:

' Requires: Imports System.Collections.Generic, System.IO, System.Xml
Dim xdDoc As New XmlDocument
Dim diElements As New Dictionary(Of String, Integer)

xdDoc.Load("File Path")

' Visit every element and build its path by walking up through its ancestors.
For Each node As XmlNode In xdDoc.SelectNodes("//*")
    Dim current As XmlNode = node
    Dim sNode As String = current.Name

    ' Stop climbing at the "invoice" element or at the document node.
    ' Names are compared with <>; "Is" tests reference identity, not string equality.
    While current.ParentNode IsNot Nothing _
    AndAlso current.ParentNode.Name <> "invoice" _
    AndAlso current.ParentNode.Name <> "#document"
        current = current.ParentNode
        sNode = current.Name + "/" + sNode
    End While

    ' Count how many times each path occurs.
    If Not diElements.ContainsKey(sNode) Then
        diElements.Add(sNode, 1)
    Else
        diElements(sNode) += 1
    End If
Next

' Write each unique path with its occurrence count; Using closes the file.
Using sw As New StreamWriter("Output File Path")
    For Each pair As KeyValuePair(Of String, Integer) In diElements
        sw.WriteLine("{0} --- {1}", pair.Value, pair.Key)
    Next
End Using
Gmac

This turned out a lot uglier than I expected. I'm not really a good programmer, but I can usually figure out how to get things done. My code is typically for small, limited-use utilities, so it just needs to work.

Note: Now updated to output only unique paths.

Private PathArray As New ArrayList

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

    Dim xDoc As New XmlDocument
    Dim Output As String = ""

    xDoc.Load("C:\inetpub\wwwroot\SqlMonitor\MonitorConfig.xml")
    NodeRecurser(xDoc.SelectSingleNode("/"))

    For Each item In PathArray
        Output += item & vbCrLf
    Next

    MsgBox(Output)

    Me.Close()

End Sub

Sub NodeRecurser(xNode As XmlNode)

    If xNode.HasChildNodes Then

        For Each cNode As XmlNode In xNode.ChildNodes

            NodeRecurser(cNode)

        Next

    Else : GetPath(xNode)

    End If

End Sub

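Sub GetPath(n As XmlNode)

    ' A leaf is either a text node or an empty element. Start with the
    ' element's own name so that empty elements are not dropped.
    Dim xPath As String = ""

    If n.NodeType = XmlNodeType.Element Then xPath = n.Name

    ' Walk up through the ancestors, prepending each name, and stop
    ' once the parent is the document node.
    Do While n.ParentNode IsNot Nothing AndAlso n.ParentNode.NodeType <> XmlNodeType.Document

        n = n.ParentNode

        If xPath.Length = 0 Then xPath = n.Name Else xPath = n.Name & "/" & xPath

    Loop

    ' Store each path only once.
    If xPath.Length > 0 AndAlso Not PathArray.Contains(xPath) Then PathArray.Add(xPath)

End Sub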
Tony Hinkle
  • @Gmac if this works for you please mark it as the answer, or let me know what's happening with it. I am using the latest .NET, so I'm not sure if this will work for you or not. – Tony Hinkle May 21 '15 at 13:41