3

long time reader, but first post on here. Any and all help is highly appreciated!

I am working on a data scraper to obtain our clients' market results from the grid operator, and require a series of xml queries to do so. I am able to connect, authenticate and receive a response to my query using SOAP schema (required by grid operator). However, when I try to load the xml string I receive from the server into a DOMDocument60 object using the LoadXML function, it strips all of the nodes from the response and results in an xml document with only the header in one node and a concatenation of all the values stored in all nodes of the string in a second node.

Hence, I cannot extract the values I need. They have varying lengths and I have no way of knowing when one value finishes and the next one starts. Additionally, because the node names have been stripped, when I try and obtain a node list of all nodes called "DSRSRREGAwardHourly" I get an empty list.

XML received from server after querying (I've left only one of the hourly nodes to simplify)

<?xml version='1.0' encoding='UTF-8'?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <QueryResponse xmlns="http://emkt.pjm.com/emkt/xml">
      <DSRSRREGAwardSet>
        <DSRSRREGAward day="2018-12-01" location="1960147390">
          <DSRSRREGAwardHourly hour="1">
            <SynchOfferMW>0.5</SynchOfferMW>
            <RegOfferMW>0</RegOfferMW>
            <SelfScheduledMW>0</SelfScheduledMW>
            <SynchAwardedMW>0</SynchAwardedMW>
            <RegAwardedMW>0</RegAwardedMW>
            <RegOfferPriceUsed>0</RegOfferPriceUsed>
            <RegdOfferMW>0</RegdOfferMW>
            <RegdMW>0</RegdMW>
          </DSRSRREGAwardHourly>
        </DSRSRREGAward>
      </DSRSRREGAwardSet>
    </QueryResponse>
  </Body>
</Envelope>

Relevant VBA code I'm using

'Variable definition
Dim oWinhttp As WinHttpRequest
Dim ReturnXML As MSXML2.DOMDocument60
Dim ItemList As IXMLDOMNodeList
Dim XMLItem As IXMLDOMNode

'Variable initialization
Set oWinhttp = New WinHttpRequest
Set ReturnXML = New MSXML2.DOMDocument60

'Run the query against the server
With oWinhttp
  Call .Open("POST", WebURL)
  Call .SetRequestHeader("Cookie", "pjmauth=" & tokenStr)
  Call .SetRequestHeader("Content-Type", "text/xml")
  Call .SetRequestHeader("Content-Length", Len(xmlSubmittal))
  Call .Send(xmlSubmittal)
  Call .WaitForResponse

  'Store the return XML into DOM Document
  ReturnXML.async = False
  ReturnXML.validateOnParse = False
  ReturnXML.LoadXML (.ResponseText)

  'Terminate connection
  Call oWinhttp.abort
End With

'Extract nodes we are going to need
Set ItemList = ReturnXML.SelectNodes("//DSRSRREGAwardHourly")
Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
Angel
  • 33
  • 3
  • 1
    I posted before going to bed, and today has been crazy, but it worked like a charm, thanks @T.M.! – Angel Feb 22 '19 at 20:02

1 Answers1

0

Simple XML parsing using name spaces

Try the following code lines

  • [1] setting a name space definition via

    ReturnXML.setProperty "SelectionNamespaces", xmlNameSpaces

where xmlNameSpaces is just a string variable replicating a prior Namespace (xmlns) attribute within the <QueryResponse> node.

  • [2] Executing very basic loops through the node structure. As apparently there are only subnodes with textual entries in each node, this code example doesn't check for empty nodes (Consider using more elaborated XPath parsing and/or the use of the special - platform independant - query language XSLT).

Relevant code snippet

' ...
'[1] Store the return XML into DOM Document using name spaces defining a leading "d:" prefix
ReturnXML.async = False
ReturnXML.validateOnParse = False

Dim xmlNameSpaces$
xmlNameSpaces = "xmlns:d=""http://emkt.pjm.com/emkt/xml"""      ' define namespaces
ReturnXML.setProperty "SelectionNamespaces", xmlNameSpaces      ' set namespace prop

ReturnXML.LoadXML (.ResponseText)                   ' get content

'[2] Loop through node lists (attention to the chosen namespace prefix)
Dim ItemList As MSXML2.IXMLDOMNodeList, ItemList2 As MSXML2.IXMLDOMNodeList
Dim item As MSXML2.IXMLDOMNode, item2 As MSXML2.IXMLDOMNode
Set ItemList = ReturnXML.SelectNodes("//d:DSRSRREGAwardHourly") ' <~~ leading "d:" :-)

For Each item In ItemList
     Debug.Print "Day:  " & item.SelectSingleNode("../@day").Text, _
                 "Hour: " & item.SelectSingleNode("./@hour").Text
     Set ItemList2 = item.SelectNodes("d:*")
     For Each item2 In ItemList2
        Debug.Print vbTab & item2.nodeName, item2.Text
     Next item2
Next item

Related link

C.f. Obtain attribute names from XML using VBA

Addendum due to comment

"I'm curious now, was the problem that the string had 2 different namespaces defined and hence was ignoring the second one?"

Basically yes; the latter declaration in hierarchy reigns over the prior one in your xml data example (with exception of explicitly prefixed element names).

  • a) The whole XML structure in the concrete example only shows "normal" = unprefixed element names such as <SynchOfferMW> (instead of a possible <extra:SynchOfferMW> which would be based on an additional explicit prefix assignment within a prior namespace declaration xmlns:extra="http://..."). With unprefixed namespaces you are declaring two default namespaces, one for the <Envelope> element, the second for the <QueryResponse> element.
  • b) Due to the w3.org definition of namespaces/6.2 "the scope of a ► default namespace declaration extends from the beginning of the start-tag in which it appears to the end of the corresponding end-tag, excluding the scope of any inner default namespace declarations. ... A default namespace declaration applies to all unprefixed element names within its scope." So the <QueryResponse> element and all its (unprefixed) descendants of this example belong to the second default namespace.
T.M.
  • 9,436
  • 3
  • 33
  • 57
  • 1
    That worked like a charm, thanks for your insight! I'm curious now, was the problem that the string had 2 different namespaces defined and hence was ignoring the second one? – Angel Feb 22 '19 at 19:59
  • 1
    Added additional explanations answering your question in comment - @Angel – T.M. Mar 03 '19 at 17:52