1

new user and infrequent/inexperienced coder here. I found a solution awhile back on this site for a VBA macro that creates an XML file for each line of a spreadsheet. I work at an archives and our digital repository system requires XML metadata files that have the same filename (with an added .metadata extension) as the file they describe; this is so the system will recognize it as metadata and not a discrete file. To achieve this, we record metadata in a spreadsheet with column headings that match our metadata schema elements and run a VBA macro to create an XML file for each row of data.

The macro actually works perfectly for creating individual XML files from each row of a spreadsheet. The problem occurred after we updated our metadata schema to support repeated elements. When I run the VBA macro on a spreadsheet with repeated column headings/elements the resulting XML file only has data from the last instance of the repeated element. This same data value from the last repeated element is also applied to the previous instances.

Here's what I'm talking about. As you can see, the repeated "RecordContributorIndividual" elements in the XML file have only the data from the final instance of the element (Row 1, Column 7) in the spreadsheet:

<?xml version="1.0" encoding="UTF-8"?>
  <vtcore xmlns="http://www.sec.state.vt.us/vtcore">
    <RecordCreatorIndividual>Peter Shumlin</RecordCreatorIndividual>
    <RecordContributorIndividual>Stuck</RecordContributorIndividual>
    <RecordContributorIndividual>Stuck</RecordContributorIndividual>
    <RecordContributorIndividual>Stuck</RecordContributorIndividual>
    <RecordContributorIndividual>Stuck</RecordContributorIndividual>
    <RecordContributorIndividual>Stuck</RecordContributorIndividual>
    <RecordTitle>President Ronald Reagan Day proclamation</RecordTitle>
    <RecordDesc></RecordDesc>

Spreadsheet Repeated Elements

What I want to achieve is a VBA code that will not apply the last cell value of a repeated element to all of the previous instances of that element but will instead pull whatever is actually in the spreadsheet cell under each element. I've pasted the VBA code below. I have a feeling that the problem lies somewhere down in the "doc.getElementsByTagName" region but I'm not positive. I feel like I'm close but I'm completely stuck. Any help is greatly appreciated!

Sub testXLSMtovtcoreXML()
 sTemplateXML = _
    "<?xml version='1.0' encoding='UTF-8'?>" + vbNewLine + _
    "<vtcore xmlns='http://www.sec.state.vt.us/vtcore'>" + vbNewLine + _
    "   <RecordCreatorIndividual>" + "   </RecordCreatorIndividual>" + "   
    <RecordContributorIndividual>" + "   </RecordContributorIndividual>" + 
    vbNewLine + _
    "   <RecordContributorIndividual>" + "   </RecordContributorIndividual>" 
    + "   <RecordContributorIndividual>" + "   
    </RecordContributorIndividual>" + vbNewLine + _
    "   <RecordContributorIndividual>" + "   </RecordContributorIndividual>" 
    + "   <RecordContributorIndividual>" + "   
    </RecordContributorIndividual>" + vbNewLine + _
    "   <RecordTitle>" + "  </RecordTitle>" + "   <RecordDesc>" + " 
    </RecordDesc>" + "  <RecordDate>" + "   </RecordDate>" + "  
    <RecordDate>" + "   </RecordDate>" + vbNewLine + _
    "   <RecordDate>" + "   </RecordDate>" + "   <RecordDate>" + "   
    </RecordDate>" + "   <RecordDate>" + "   </RecordDate>" + vbNewLine + _
    "   <Agency>" + "   </Agency>" + "   <Domain>" + "   </Domain>" + "   
    <Activity>" + "   </Activity>" + "   <RecordType>" + "   </RecordType>" 
    + vbNewLine + _
    "   <ClassificationCode>" + "   </ClassificationCode>" + "   
    <RelatedRecords>" + "   </RelatedRecords>" + "   <RelatedRecords>" + "   
    </RelatedRecords>" + vbNewLine + _
    "   <RelatedRecords>" + "   </RelatedRecords>" + "   <RelatedRecords>" + 
    "   </RelatedRecords>" + "   <RelatedRecords>" + "   </RelatedRecords>" 
    + vbNewLine + _
    "   <RecordIdentifier>" + "   </RecordIdentifier>" + "   <PublicAccess>" 
    + "   </PublicAccess>" + "   <PublicAccessCitation>" + "   
    </PublicAccessCitation>" + vbNewLine + _
    "   <PublicAccessCitation>" + "   </PublicAccessCitation>" + "   
    <PublicAccessCitation>" + "   </PublicAccessCitation>" + vbNewLine + _
    "   <PublicAccessCitation>" + "   </PublicAccessCitation>" + "   
    <PublicAccessCitation>" + "   </PublicAccessCitation>" + vbNewLine + _
    "   <Subject>" + "   </Subject>" + "   <Subject>" + "   </Subject>" + "   
    <Subject>" + "   </Subject>" + "   <Subject>" + "   </Subject>" + 
    vbNewLine + _
    "   <Subject>" + "   </Subject>" + vbNewLine + _
    "</vtcore>" + vbNewLine

 Set doc = CreateObject("MSXML2.DOMDocument")
 doc.async = False
 doc.validateOnParse = False
 doc.resolveExternals = False

 With ActiveWorkbook.Worksheets(1)
 lLastRow = .UsedRange.Rows.Count

 For lRow = 2 To lLastRow
  sFileName = .Cells(lRow, 1).Value
  sRecordCreatorIndividual = .Cells(lRow, 2).Value
  sRecordContributorIndividual = .Cells(lRow, 3).Value
  sRecordContributorIndividual = .Cells(lRow, 4).Value
  sRecordContributorIndividual = .Cells(lRow, 5).Value
  sRecordContributorIndividual = .Cells(lRow, 6).Value
  sRecordContributorIndividual = .Cells(lRow, 7).Value
  sRecordTitle = .Cells(lRow, 8).Value
  sRecordDesc = .Cells(lRow, 9).Value
  sRecordDate = .Cells(lRow, 10).Value
  sRecordDate = .Cells(lRow, 11).Value
  sRecordDate = .Cells(lRow, 12).Value
  sRecordDate = .Cells(lRow, 13).Value
  sRecordDate = .Cells(lRow, 14).Value
  sAgency = .Cells(lRow, 15).Value
  sDomain = .Cells(lRow, 16).Value
  sActivity = .Cells(lRow, 17).Value
  sRecordType = .Cells(lRow, 18).Value
  sClassificationCode = .Cells(lRow, 19).Value
  sRelatedRecords = .Cells(lRow, 20).Value
  sRelatedRecords = .Cells(lRow, 21).Value
  sRelatedRecords = .Cells(lRow, 22).Value
  sRelatedRecords = .Cells(lRow, 23).Value
  sRelatedRecords = .Cells(lRow, 24).Value
  sRecordIdentifier = .Cells(lRow, 25).Value
  sPublicAccess = .Cells(lRow, 26).Value
  sPublicAccessCitation = .Cells(lRow, 27).Value
  sPublicAccessCitation = .Cells(lRow, 28).Value
  sPublicAccessCitation = .Cells(lRow, 29).Value
  sPublicAccessCitation = .Cells(lRow, 30).Value
  sPublicAccessCitation = .Cells(lRow, 31).Value
  sSubject = .Cells(lRow, 32).Value
  sSubject = .Cells(lRow, 33).Value
  sSubject = .Cells(lRow, 34).Value
  sSubject = .Cells(lRow, 35).Value
  sSubject = .Cells(lRow, 36).Value

 doc.LoadXML sTemplateXML
 doc.getElementsByTagName("RecordCreatorIndividual")(0).appendChild 
 doc.createTextNode(sRecordCreatorIndividual)
 doc.getElementsByTagName("RecordContributorIndividual")(0).appendChild 
 doc.createTextNode(sRecordContributorIndividual)
 doc.getElementsByTagName("RecordContributorIndividual")(1).appendChild 
 doc.createTextNode(sRecordContributorIndividual)
 doc.getElementsByTagName("RecordContributorIndividual")(2).appendChild 
 doc.createTextNode(sRecordContributorIndividual)
 doc.getElementsByTagName("RecordContributorIndividual")(3).appendChild 
 doc.createTextNode(sRecordContributorIndividual)
 doc.getElementsByTagName("RecordContributorIndividual")(4).appendChild 
 doc.createTextNode(sRecordContributorIndividual)
 doc.getElementsByTagName("RecordTitle")(0).appendChild 
 doc.createTextNode(sRecordTitle)
 doc.getElementsByTagName("RecordDesc")(0).appendChild 
 doc.createTextNode(sRecordDesc)
 doc.getElementsByTagName("RecordDate")(0).appendChild 
 doc.createTextNode(sRecordDate)
 doc.getElementsByTagName("RecordDate")(1).appendChild 
 doc.createTextNode(sRecordDate)
 doc.getElementsByTagName("RecordDate")(2).appendChild 
 doc.createTextNode(sRecordDate)
 doc.getElementsByTagName("RecordDate")(3).appendChild 
 doc.createTextNode(sRecordDate)
 doc.getElementsByTagName("RecordDate")(4).appendChild 
 doc.createTextNode(sRecordDate)
 doc.getElementsByTagName("Agency")(0).appendChild 
 doc.createTextNode(sAgency)
 doc.getElementsByTagName("Domain")(0).appendChild 
 doc.createTextNode(sDomain)
 doc.getElementsByTagName("Activity")(0).appendChild 
 doc.createTextNode(sActivity)
 doc.getElementsByTagName("RecordType")(0).appendChild 
 doc.createTextNode(sRecordType)
 doc.getElementsByTagName("ClassificationCode")(0).appendChild 
 doc.createTextNode(sClassificationCode)
 doc.getElementsByTagName("RelatedRecords")(0).appendChild 
 doc.createTextNode(sRelatedRecords)
 doc.getElementsByTagName("RelatedRecords")(1).appendChild 
 doc.createTextNode(sRelatedRecords)
 doc.getElementsByTagName("RelatedRecords")(2).appendChild 
 doc.createTextNode(sRelatedRecords)
 doc.getElementsByTagName("RelatedRecords")(3).appendChild 
 doc.createTextNode(sRelatedRecords)
 doc.getElementsByTagName("RelatedRecords")(4).appendChild 
 doc.createTextNode(sRelatedRecords)
 doc.getElementsByTagName("RecordIdentifier")(0).appendChild 
 doc.createTextNode(sRecordIdentifier)
 doc.getElementsByTagName("PublicAccess")(0).appendChild 
 doc.createTextNode(sPublicAccess)
 doc.getElementsByTagName("PublicAccessCitation")(0).appendChild 
 doc.createTextNode(sPublicAccessCitation)
 doc.getElementsByTagName("PublicAccessCitation")(1).appendChild 
 doc.createTextNode(sPublicAccessCitation)
 doc.getElementsByTagName("PublicAccessCitation")(2).appendChild 
 doc.createTextNode(sPublicAccessCitation)
 doc.getElementsByTagName("PublicAccessCitation")(3).appendChild 
 doc.createTextNode(sPublicAccessCitation)
 doc.getElementsByTagName("PublicAccessCitation")(4).appendChild 
 doc.createTextNode(sPublicAccessCitation)
 doc.getElementsByTagName("Subject")(0).appendChild 
 doc.createTextNode(sSubject)
 doc.getElementsByTagName("Subject")(1).appendChild 
 doc.createTextNode(sSubject)
 doc.getElementsByTagName("Subject")(2).appendChild 
 doc.createTextNode(sSubject)
 doc.getElementsByTagName("Subject")(3).appendChild 
 doc.createTextNode(sSubject)
 doc.getElementsByTagName("Subject")(4).appendChild 
 doc.createTextNode(sSubject)
 doc.Save sFileName + ".metadata"
Next

End With
End Sub
zwhitaker
  • 13
  • 4
  • 1
    Please provide textual information inside the question and not as external link and not as a picture of text. – Yunnosch Dec 19 '17 at 20:36
  • @Yunnosch I apologize for my inexperience on this site. I have edited the first link to show the XML instead of a link to a screenshot. Apparently I'm not allowed inline screenshots yet. The second link is to a screenshot of a spreadsheet and I cannot really think of another way to display that data as text and it still be intelligible. I'm open to any alternative suggestions for that, as I want to make this question clear as possible. Again, my apologies. – zwhitaker Dec 19 '17 at 21:14

1 Answers1

0

Consider building XML dynamically using the MSXML library with its createElement, createNode, appendChild methods that does not hard code node names or text values but pulls them from cells. And then use the Identity Transform XSLT to pretty print the output. There is no need to build a text template to adjust in code. Specifically, createNode is used since you require a default namespace in document xmlns="http://www.sec.state.vt.us/vtcore":

Excel Input Data

Screenshot of Data

VBA (using early binding with MSXML reference object)

Option Explicit

Sub XMLExport()
On Error GoTo ErrHandle
    Dim lastCol As Long, lastRow As Long
    Dim xlrow As Long

    ' WRITE TO XML
    With ThisWorkbook.Sheets(1)
        lastCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

        For xlrow = 2 To lastRow
            Call BuildXML(xlrow)
        Next xlrow
    End With

    MsgBox "Successfully migrated Excel data into XML files!", vbInformation

ExitHandle:
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Resume ExitHandle

End Sub

Function BuildXML(i As Long)
On Error GoTo ErrHandle
    ' REFERENCE Microsoft XML, v6.0 UNDER TOOLS\REFERENCES
    Dim doc As New MSXML2.DOMDocument, xslDoc As New MSXML2.DOMDocument, newDoc As New MSXML2.DOMDocument
    Dim root As IXMLDOMNode, colNode As IXMLDOMNode

    Dim xslFile As String, xml_filename As String
    Dim lastCol As Long, lastRow As Long
    Dim j As Long

    ' DECLARE XML DOC OBJECT
    Set root = doc.createNode(1, "vtcore", "http://www.sec.state.vt.us/vtcore")
    doc.appendChild root

    ' WRITE TO XML
    With ThisWorkbook.Sheets(1)
        lastCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row

        xml_filename = Mid(.Cells(i, 1), 1, InStr(.Cells(i, 1), ".") - 1) & ".metadata"

        For j = 2 To lastCol

            Set colNode = doc.createNode(1, .Cells(1, j), "http://www.sec.state.vt.us/vtcore")
            colNode.Text = .Cells(i, j)
            root.appendChild colNode

        Next j
    End With

    ' PRETTY PRINT OUTPUT WITH INDENTATION AND LINE BREAKS
    xslDoc.LoadXML "<?xml version=" & Chr(34) & "1.0" & Chr(34) & "?>" _
            & "<xsl:stylesheet version=" & Chr(34) & "1.0" & Chr(34) _
            & "                xmlns:xsl=" & Chr(34) & "http://www.w3.org/1999/XSL/Transform" & Chr(34) & ">" _
            & "  <xsl:strip-space elements=" & Chr(34) & "*" & Chr(34) & " />" _
            & "  <xsl:output method=" & Chr(34) & "xml" & Chr(34) & " indent=" & Chr(34) & "yes" & Chr(34) & "" _
            & "            encoding=" & Chr(34) & "UTF-8" & Chr(34) & "/>" _
            & "  <xsl:template match=" & Chr(34) & "node() | @*" & Chr(34) & ">" _
            & "    <xsl:copy>" _
            & "       <xsl:apply-templates select=" & Chr(34) & "node() | @*" & Chr(34) & " />" _
            & "    </xsl:copy>" _
            & "  </xsl:template>" _
            & "</xsl:stylesheet>"

    xslDoc.async = False
    doc.transformNodeToObject xslDoc, newDoc
    newDoc.Save Application.ActiveWorkbook.Path & "\" & xml_filename
    Debug.Print xml_filename

ExitHandle:
    Set doc = Nothing: Set xslDoc = Nothing: Set newDoc = Nothing
    Exit Function

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Resume ExitHandle

End Function

Output

<?xml version="1.0" encoding="UTF-8"?>
<vtcore xmlns="http://www.sec.state.vt.us/vtcore">
    <FileName>16-001 President Ronald Reagan Day.pdf</FileName>
    <RecordCreatorIndividual>Peter Shumulin</RecordCreatorIndividual>
    <RecordCreatorIndividual>Help </RecordCreatorIndividual>
    <RecordCreatorIndividual>I </RecordCreatorIndividual>
    <RecordCreatorIndividual>Am</RecordCreatorIndividual>
    <RecordCreatorIndividual>Realy</RecordCreatorIndividual>
    <RecordCreatorIndividual>Stuck</RecordCreatorIndividual>
    <RecordCreatorIndividual>President Ronald Reagan Day proclamation</RecordCreatorIndividual>
</vtcore>
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • Thank you for responding to my problem. When I try to run your solution I'm getting "Compile error: User-defined type not identified" and the "doc As New MSXML2.DOMDocument" statement is highlighted. I made sure that Microsoft XML v6.0 was added under Tools/References. I have no idea if this piece of information is relevant or not but "http://www.sec.state.vt.us/vtcore" is not an address that actually exists. it's just hard-coded in there, so it may not work as a reference. – zwhitaker Dec 20 '17 at 14:46
  • As commented directly in code, did you select Microsoft XML , v6.0 under Tools \ References (on menu) in the VBE editor? – Parfait Dec 20 '17 at 14:52
  • I did do that and it still comes back with the same error at the same location. Right now, the references I have selected are: Visual Basic for Applications; Microsoft Excel 15.0 Object Library; OLE Automation; Microsoft Office 15.0 Object Library; and, Microsoft XML, v6.0. – zwhitaker Dec 20 '17 at 14:58
  • Very strange! I have those exact five references selected and works great in my MS Excel 2013. I even copied and pasted exact code posted here and all works smoothly. Even v, 3.0 works! Check if you did not modify code in some way. Try in brand new workbook or different machine. – Parfait Dec 20 '17 at 15:17
  • I tried a brand new workbook on a different machine and it didn't work - until I changed the reference to v3.0 and it ran perfectly! Thank you for sticking with me. I did notice a couple of issues that are due to a lack of communication on my part. The spreadsheet will ultimately have multiple rows of data; is there a way to create a separate XML file for each row and name the file after whatever value is in the cell under the "Filename" column for each row (perhaps something similar to "doc.Save sFileName + ".metadata"" in the original code)? Sorry to be a pain; I really appreciate the help. – zwhitaker Dec 20 '17 at 15:57
  • To add some context to the above comment. Each row in the spreadsheet will represent a discrete file. Our digital repository requires each file to have its own corresponding XML metadata file that needs to have the same filename as the file it describes. And each XML file has to have a .METADATA extension added to it so the digital repository's file ingest application recognizes it as metadata. – zwhitaker Dec 20 '17 at 15:58
  • See updated using a user-defined function to build the XML that iteratively passes the row number. XML filename will take use first column as its name removing extension and adding `.metadata`. Try adjusting code to meet needs as it is part of the learning process. – Parfait Dec 20 '17 at 17:33
  • 1
    Thank you so much for your help! You're a lifesaver. The code works perfectly. This has actually been very informative and I'm looking forward to getting in there and figuring out the nuts and bolts. Thanks again. Cheers. – zwhitaker Dec 20 '17 at 18:33