2

first and foremost, if I make mistakes in nomenclature of the XML file - I am sorry! Let's say I have the following syntax in a XML File:

<book id="bk101">
  <author>Gambardella, Matthew</author>
  <title>XML Developer's Guide</title>
  <genre>Computer</genre>
  <price>44.95</price>
  <publish_date>2000-10-01</publish_date>
  <description>An in-depth look at creating applications 
  with XML.</description>
</book>
<book id="bk102">
  <author>Ralls, Kim</author>
  <title>Midnight Rain</title>
  <genre>Fantasy</genre>
  <price>5.95</price>
  <publish_date>2000-12-16</publish_date>
  <description>A former architect battles corporate zombies, 
  an evil sorceress, and her own childhood to become queen 
  of the world.</description>
</book>

etc..

Some of the books however have extra nodes, such as <author_birth>,<authors_favorite_tvshow> etc..

I would like to take all books in my XML file and transpose them into columns, one book per row. I have tried to get all node values of the books, however with some of the <author_birth> nodes missing, I am unable to use simple For loop, as there is different number of "price" nodes and different number of "<author_birth>".

I'd say it would be best to take all the books and loop through them,then take the values of the respective nodes. However I do not know what the right function for this might be..

Thanks!

heikeke
  • 21
  • 1
  • 2
  • 2
    My suggestion is to loop through all the top level XML nodes (`book` nodes) first and create a dictionary of all possible child nodes - basically creating a super-set of nodes that will become the columns of your table. Then begin your second pass through the list and add the data for each of the book nodes to the applicable columns. You could do this in one pass by adding any newly encountered child nodes as new columns to the right of the existing data. – PeterT May 12 '16 at 21:35
  • Hi Peter, yes this is exactly, what I have in mind. The main problem is, that I do not know how to loop over book nodes, as I do not know the correct syntax of the propriate selectors, how to handle the XML objects using VBA etc.. And I failed to find such information online. – heikeke May 12 '16 at 21:46
  • Take a look at [this answer](http://stackoverflow.com/a/20022152/4717755). It does exactly what you want it to do. – PeterT May 12 '16 at 21:49
  • What MS Office application are you using? If Excel or Access, both have XML import modules. – Parfait May 14 '16 at 03:28
  • Thanks PeterT! Managed to get it working, still a bit confused with the whole XML DOM thing in VBA, but now it's doing what I need it to do! ad Parfait: I was using Excel – heikeke May 14 '16 at 18:25

1 Answers1

2

I selectively added publisher, preorder and cover properties to the typical XML code, so the code for test is as follows:

<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>44.95</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <preorder>2.49</preorder>
        <publish_date>2000-12-16</publish_date>
        <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
    </book>
    <book id="bk103">
        <author>Corets, Eva</author>
        <title>Maeve Ascendant</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <preorder>1.99</preorder>
        <publish_date>2000-11-17</publish_date>
        <cover>case binding</cover>
        <description>After the collapse of a nanotechnology society in England, the young survivors lay the foundation for a new society.</description>
    </book>
    <book id="bk104">
        <publisher>Pearson</publisher>
        <author>Corets, Eva</author>
        <title>Oberon's Legacy</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
        <publish_date>2001-03-10</publish_date>
        <description>In post-apocalypse England, the mysterious agent known only as Oberon helps to create a new life for the inhabitants of London. Sequel to Maeve Ascendant.</description>
    </book>
</catalog>

Here is an example showing one of the possible solutions that allows to process a table-like data stored as XML and retrieve a 2d array, that represents the table with header. It processes items according to provided XPath selector, considers item childnodes as properties, extracts property names and values, locates the properties at the right columns.

Option Explicit

Sub Test()

    Dim strBooksXML As String
    Dim arrBooks() As Variant

    ' get certain XML code
    strBooksXML = MyXMLData
    ' pass XML code and XPath selector to retrieve table-form array
    arrBooks = ConvertXMLToArray(strBooksXML, "//catalog/book")
    ' resulting array output
    Output Sheets(1), arrBooks

End Sub

Function ConvertXMLToArray(strXML As String, strItemSelector As String) As Variant()

    Dim objDOMDocument As Object
    Dim objPrpIdx As Object
    Dim objPrpVal As Object
    Dim lngItemNumber As Long
    Dim colItems As Object
    Dim objItem As Variant
    Dim objItemProperty As Variant
    Dim strPrev As String
    Dim strName As String
    Dim lngIndex As Long
    Dim arrItems() As Variant
    Dim varPrpName As Variant
    Dim varItemIndex As Variant

    Set objDOMDocument = CreateObject("MSXML2.DOMDocument")
    If Not objDOMDocument.LoadXML(strXML) Then
        Err.Raise objDOMDocument.parseError.ErrorCode, , objDOMDocument.parseError.reason
    End If
    Set objPrpIdx = CreateObject("Scripting.Dictionary") ' dictionary of property order indexes
    Set objPrpVal = CreateObject("Scripting.Dictionary") ' dictionary of property values
    lngItemNumber = 1
    Set colItems = objDOMDocument.SelectNodes(strItemSelector)
    For Each objItem In colItems
        strPrev = "" ' previous processed property name
        For Each objItemProperty In objItem.ChildNodes
            strName = objItemProperty.BaseName ' name of the property being processed
            If Not objPrpIdx.Exists(strName) Then ' no such property yet
                If strPrev = "" Then ' the property is the first
                    lngIndex = 0
                Else ' the property placed after another
                    lngIndex = objPrpIdx(strPrev) + 1
                End If
                ' increase all indexes that are greater or equal to processing property assigned index
                ' i. e. shift existing properties to insert new one
                For Each varPrpName In objPrpIdx
                    If objPrpIdx(varPrpName) >= lngIndex Then objPrpIdx(varPrpName) = objPrpIdx(varPrpName) + 1
                Next
                ' add new property name to dictionary of property order indexes with assigned index
                objPrpIdx(strName) = lngIndex
                ' add new property name to dictionary of property values, instantiate subdictionary of values
                Set objPrpVal(strName) = CreateObject("Scripting.Dictionary")
            End If
            objPrpVal(strName)(lngItemNumber) = objItemProperty.Text ' put property value with item index to the subdictionary
            strPrev = strName ' reassign previous property name
        Next
        lngItemNumber = lngItemNumber + 1
    Next
    ' rebuild dictionaries into 2d array for further output to worksheet
    ReDim arrItems(lngItemNumber - 1, objPrpIdx.Count - 1)
    For Each varPrpName In objPrpIdx ' process each
        arrItems(0, objPrpIdx(varPrpName)) = varPrpName ' put property name to header
        For Each varItemIndex In objPrpVal(varPrpName) ' process each item having the property
            arrItems(varItemIndex, objPrpIdx(varPrpName)) = objPrpVal(varPrpName)(varItemIndex)
        Next
    Next
    ConvertXMLToArray = arrItems

End Function

Sub Output(objSheet As Worksheet, arrCells() As Variant)

    With objSheet
        .Select
        .Cells.Delete
        With .Range(.Cells(1, 1), Cells(UBound(arrCells, 1) + 1, UBound(arrCells, 2) + 1))
            .NumberFormat = "@"
            .Value = arrCells
        End With
        .Columns.AutoFit
    End With
    With ActiveWindow
        .SplitColumn = 0
        .SplitRow = 1
        .FreezePanes = True
    End With

End Sub

Function MyXMLData()

    Dim strXML

    strXML = _
    "<catalog>"
        strXML = strXML & _
        "<book id=""bk101"">" & _
            "<author>Gambardella, Matthew</author>" & _
            "<title>XML Developer's Guide</title>" & _
            "<genre>Computer</genre>" & _
            "<price>44.95</price>" & _
            "<publish_date>2000-10-01</publish_date>" & _
            "<description>An in-depth look at creating applications " & _
            "with XML.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk102"">" & _
            "<author>Ralls, Kim</author>" & _
            "<title>Midnight Rain</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<preorder>2.49</preorder>" & _
            "<publish_date>2000-12-16</publish_date>" & _
            "<description>A former architect battles corporate zombies, " & _
            "an evil sorceress, and her own childhood to become queen " & _
            "of the world.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk103"">" & _
            "<author>Corets, Eva</author>" & _
            "<title>Maeve Ascendant</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<preorder>1.99</preorder>" & _
            "<publish_date>2000-11-17</publish_date>" & _
            "<cover>case binding</cover>" & _
            "<description>After the collapse of a nanotechnology " & _
            "society in England, the young survivors lay the " & _
            "foundation for a new society.</description>" & _
        "</book>"
        strXML = strXML & _
        "<book id=""bk104"">" & _
            "<publisher>Pearson</publisher>" & _
            "<author>Corets, Eva</author>" & _
            "<title>Oberon's Legacy</title>" & _
            "<genre>Fantasy</genre>" & _
            "<price>5.95</price>" & _
            "<publish_date>2001-03-10</publish_date>" & _
            "<description>In post-apocalypse England, the mysterious " & _
            "agent known only as Oberon helps to create a new life " & _
            "for the inhabitants of London. Sequel to Maeve " & _
            "Ascendant.</description>" & _
        "</book>"
        strXML = strXML & _
    "</catalog>"
    MyXMLData = strXML

End Function

The resulting output is as follows for me:

output

This approach uses dictionaries that may be quite slow for large XML data, for that case better to rework the code to use arrays instead of dictionaries, like here with JSON processing.

Community
  • 1
  • 1
omegastripes
  • 12,351
  • 4
  • 45
  • 96