1

I'm working on taking some input data in excel, parsing it to xml and using that to run a SQL stored procedure, but I'm running into performance issue on the xml parsing. The input sheet looks something like this:

Dates_|_Name1_Name2_Name3_..._NameX
Date1 |
Date2 |
. . . |
Date1Y|

I've got some code to loop though each cell and parse out the data into an xml string but even for about a 300 by 300 grid the execution takes something on the order of five minutes and I'm looking to use data sets that could be several thousand columns long. I've tries a couple things to help speed it up like reading the data into a Variant then iterating though that or excluding DoEvents but I haven't been able to get the speed up. Here's the bit of code that's the issue:

Dim lastRow As Long
lRows = (oWorkSheet.Cells(Rows.Count, 1).End(xlUp).Row)
Dim lastColumn As Long
lCols = (oWorkSheet.Cells(1, Columns.Count).End(xlToLeft).Column)
Dim sheet As Variant
With Sheets(sName)
  sheet = .Range(.Cells(1, 1), .Cells(lRows, lCols))
End With
ReDim nameCols(lCols) As String

...

resultxml = "<DataSet>"
For i = 2 To rows
    resultxml = resultxml & "<DateRow>"

    For j = 1 To cols
        If Trim(sheet(i, j)) <> "" Then
            lResult = "<" & nameCols(j) & ">"
            rResult = "</" & nameCols(j) & ">"
            tmpValue = Trim(sheet(i, j))
            If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then
                If Len(tmpValue) >= 8 Then
                    tmpValue = Format(tmpValue, "yyyy-mm-dd")
                End If
            End If
            resultxml = resultxml & lResult & tmpValue & rResult
            DoEvents
        End If
    Next j
    resultxml = resultxml & "</DateRow>"
Next i

resultxml = resultxml & "</DataSet>"

Any advice for getting the run time down would be greatly appreciated.

user1267983
  • 93
  • 1
  • 9
  • Why do you have `DoEvents` in your `j` loop ? Can you try taking that out ? – Robin Mackenzie Dec 06 '16 at 02:16
  • It's just to keep excel from hanging while the method is running, I did try taking it out, but I didn't see a difference. – user1267983 Dec 06 '16 at 02:51
  • Are *some input data in excel* have exactly same or similar data types? – Susilo Dec 06 '16 at 03:02
  • 1
    Building large strings by concatenating pieces together in a loop can be slow. Try (eg) http://codereview.stackexchange.com/questions/67596/a-lightning-fast-stringbuilder to build your string. – Tim Williams Dec 06 '16 at 03:15
  • are your data always start with date at first column then followed by other type of data? it is an like an student attendance sheet? – Susilo Dec 06 '16 at 03:31
  • Yes, the first column is always a list if dates followed by other numeric data in the table itself for each column name. – user1267983 Dec 06 '16 at 13:58

2 Answers2

5

Consider using MSXML, a comprehensive W3C compliant library of XML APIs which you can use to build your XML with DOM methods (createElement, appendChild, setAttribute) instead of concatenating text strings. XML is not quite a text file but a markup file with encoding and tree structure. Excel comes equipped with the MSXML COM object by reference or late-binding, and can iteratively build a tree from Excel data as shown below.

With 300 rows by 12 cols of random dates, below didn't even take a minute (literally seconds after clicking macro) AND it even pretty printed raw output with line breaks and indentation using an embedded XSLT stylesheet (if you do not pretty print, the MSXML outputs document as one long, continuous line).

Input

Name Date Spreadsheet

VBA (of course align to actual data)

Sub xmlExport()
On Error GoTo ErrHandle
    ' VBA REFERENCE MSXML, v6.0 '
    Dim doc As New MSXML2.DOMDocument60, xslDoc As New MSXML2.DOMDocument60, newDoc As New MSXML2.DOMDocument60
    Dim root As IXMLDOMElement, dataNode As IXMLDOMElement, datesNode As IXMLDOMElement, namesNode As IXMLDOMElement
    Dim i As Long, j As Long
    Dim tmpValue As Variant

    ' DECLARE XML DOC OBJECT '
    Set root = doc.createElement("DataSet")
    doc.appendChild root

    ' ITERATE THROUGH ROWS '
    For i = 2 To Sheets(1).UsedRange.Rows.Count

        ' DATA ROW NODE '
        Set dataNode = doc.createElement("DataRow")
        root.appendChild dataNode

        ' DATES NODE '
        Set datesNode = doc.createElement("Dates")
        datesNode.Text = Sheets(1).Range("A" & i)
        dataNode.appendChild datesNode

        ' NAMES NODE '
        For j = 1 To 12
            tmpValue = Sheets(1).Cells(i, j + 1)
            If IsDate(tmpValue) And Not IsNumeric(tmpValue) Then
                Set namesNode = doc.createElement("Name" & j)
                namesNode.Text = Format(tmpValue, "yyyy-mm-dd")
                dataNode.appendChild namesNode
            End If
        Next j

    Next i

    ' PRETTY PRINT RAW OUTPUT '
    xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
            & "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
            & "                xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
            & "<xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
            & "<xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
            & "            encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
            & " <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
            & "  <xsl:copy>" _
            & "   <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
            & "  </xsl:copy>" _
            & " </xsl:template>" _
            & "</xsl:stylesheet>"

    xslDoc.async = False
    doc.transformNodeToObject xslDoc, newDoc
    newDoc.Save ActiveWorkbook.Path & "\Output.xml"

    MsgBox "Successfully exported Excel data to XML!", vbInformation
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Exit Sub

End Sub

Output

<?xml version="1.0" encoding="UTF-8"?>
<DataSet>
    <DataRow>
        <Dates>Date1</Dates>
        <Name1>2016-04-23</Name1>
        <Name2>2016-09-22</Name2>
        <Name3>2016-09-23</Name3>
        <Name4>2016-09-24</Name4>
        <Name5>2016-10-31</Name5>
        <Name6>2016-09-26</Name6>
        <Name7>2016-09-27</Name7>
        <Name8>2016-09-28</Name8>
        <Name9>2016-09-29</Name9>
        <Name10>2016-09-30</Name10>
        <Name11>2016-10-01</Name11>
        <Name12>2016-10-02</Name12>
    </DataRow>
    <DataRow>
        <Dates>Date2</Dates>
        <Name1>2016-06-27</Name1>
        <Name2>2016-08-14</Name2>
        <Name3>2016-07-08</Name3>
        <Name4>2016-08-22</Name4>
        <Name5>2016-11-03</Name5>
        <Name6>2016-07-28</Name6>
        <Name7>2016-08-23</Name7>
        <Name8>2016-11-01</Name8>
        <Name9>2016-11-01</Name9>
        <Name10>2016-08-11</Name10>
        <Name11>2016-08-18</Name11>
        <Name12>2016-09-23</Name12>
    </DataRow>
    ...
Parfait
  • 104,375
  • 17
  • 94
  • 125
0

I wanted to compare the Psuedo-String Builder that I used for Turn Excel range into VBA string against Parfait's implemetaion of MSXML to ouput the range to xml. I modified Parfait's code adding a timer and allowing non-date values.

The Data had a header row and 300 rows by 300 Columns (90,000 cells). Although the String Builder was roughly 400% faster I would still use Parfait's MSXML approach. Being an industry standard, it is already well documented.

enter image description here

Sub XMLFromRange()
    Dim Start: Start = Timer
    Const AVGCELLLENGTH As Long = 100
    Dim LG As Long, index As Long, x As Long, y As Long
    Dim data As Variant, Headers As Variant
    Dim result As String, s As String
    data = getDataArray
    Headers = getHeaderArray(data)

    result = Space(UBound(data, 1) * UBound(data, 2) * AVGCELLLENGTH)
    index = 1
    Mid(result, index, 11) = "<DataSet>" & vbCrLf
    index = index + 11

    For x = 2 To UBound(data, 1)

        Mid(result, index, 11) = "<DataRow>" & vbCrLf
        index = index + 11
        For y = 1 To UBound(data, 2)

            LG = Len(Headers(1, y))
            Mid(result, index, LG) = Headers(1, y)
            index = index + LG

            s = RTrim(data(x, y))
            LG = Len(s)
            Mid(result, index, LG) = s
            index = index + LG

            LG = Len(Headers(2, y))
            Mid(result, index, LG) = Headers(2, y)
            index = index + LG

        Next
        Mid(result, index, 12) = "</DataRow>" & vbCrLf
        index = index + 12
    Next
    Mid(result, index, 12) = "</DataSet>" & vbCrLf
    index = index + 12

    result = Left(result, index)

    MsgBox (Timer - Start) & " Second(s)" & vbCrLf & _
    (UBound(data, 1) - 1) * UBound(data, 2) & " Data Cells", vbInformation, "Execution Time"

    Dim myFile As String
    myFile = ThisWorkbook.Path & "\demo.txt"

    Open myFile For Output As #1
    Print #1, result
    Close #1

    Shell "Notepad.exe " & myFile, vbNormalFocus
End Sub

Function getDataArray()
    With Worksheets("Sheet1")
        getDataArray = .Range(.Range("A" & .Rows.Count).End(xlUp), .Cells(1, .Columns.Count).End(xlToLeft))
    End With
End Function

Function getHeaderArray(DataArray As Variant)
    Dim y As Long
    Dim Headers() As String
    ReDim Headers(1 To 2, 1 To UBound(DataArray, 2))
    For y = 1 To UBound(DataArray, 2)
        Headers(1, y) = "<" & DataArray(1, y) & ">"
        Headers(2, y) = "</" & DataArray(1, y) & ">" & vbCrLf
    Next
    getHeaderArray = Headers
End Function
Community
  • 1
  • 1
  • Be careful with this approach as Notepad's default encoding is ANSI not XML's default of UTF-8. Special symbols, entities, characters can be lost. Again, because of its markup rules, XMLs should not be treated as strictly text files. Also, your example has spaces in element names, an invalid rule, rendering a not well-formed XML. – Parfait Dec 06 '16 at 14:32
  • @Parfait Notepad was just used to open the text file not to write it. Besides my example was just to demonstrate the String Builder Pattern. –  Dec 06 '16 at 14:35
  • @Parfait you could optimize your code by using arrays instead of accessing each cell's value. It doesn't make much difference with the small datasets that we use but the OP final dataset will be `"several thousand columns long"`. On the other hand I know that you are just demonstrating the effectiveness of `MSXML2` and it is not really your responsibility to optimize it. –  Dec 06 '16 at 14:41
  • Thanks for the advice, I'll look into packaging MSXML onto the users machines so I can transition to that, but the String Builder Pattern can serve for the time being. – user1267983 Dec 06 '16 at 21:06