6

Situation:

I am attempting to inspect a variable a, showing as a DispStaticNodeList object in the locals window; Excel crashes every time I try to do this.

Here is the variable a, apparently of type DispStaticNodeList, in the locals window:

Locals window

Reproducing the Excel crash:

  1. Attempting to expand the item in the Locals Window causes Excel to crash.
  2. Attempting to loop over it with a For Each also causes a crash.*TestFail

Research highlights:

  1. I did some digging around, searching for combinations such as Excel + Crash + DispStaticNodeList yielded zero results; At least with the Google search terms I used. Pretty sure my Google-Fu was weak.
  2. If I believe this article I am dealing with a COM object that is supported by MSHTML.

  3. And according to this:

If the name is DispStaticNodeList, we can be pretty sure it's an array..(or at least has array semantics).

Based on point 3 I wrote the code, TestPass, below, which does loop over it successfully, but I don't fully understand why. I have set an object and then looped its len!

  1. I have just found this which states:

NodeList objects are collections of nodes such as those returned by properties such as Node.childNodes and the document.querySelectorAll() method.

So it seems the object may be a NodeList, which given the description in the immediate window seems about right, and as a list I can loop over its length, but not sure why For Each won't work and why Excel crashes. Colleague suggests it may crash due to hierarchical nature of data. I further note that there are classes called IDOMNodeIterator and NodeIterator, but I am not sure if I can use those in line with the descriptions for NodeList methods here.

Question:

What is a and why does it cause Excel to crash when attempting to inspect or loop with a For Each?

Code that loops successfully:

Option Explicit

Public Sub TestPass()
    Dim html As HTMLDocument
    Set html = GetTestHTML
    Dim a As Object, b As Object

    Set a = html.querySelectorAll("div.intro p")

    Dim i As Long

    For i = 0 To Len(a) -1
        On Error Resume Next
        Debug.Print a(i).innerText    '<== HTMLParaElement
        On Error GoTo 0
    Next i
End Sub

Public Function GetTestHTML(Optional ByVal url As String = "https://www.w3schools.com/cssref/trysel.asp") As HTMLDocument
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    With http                                    'Set http = CreateObject("MSXML2.XMLHttp60")
        .Open "GET", url, False
        .send
        html.body.innerHTML = .responseText
        Set GetTestHTML = html
    End With
End Function

*TestFail Code that causes crash:

Public Sub TestFail()
    Dim html As HTMLDocument
    Set html = GetTestHTML
    Dim a As Object, b As Object

    Set a = html.querySelectorAll("div.intro p")

    For Each b In a

    Next b
End Sub

Notes:

I sent a test workbook to a colleague who was also able to reproduce this behaviour with the example given.

Project References:

Project references

Sample of HTML (link was also provided)

<div class="noSel">
<h1 style=""><span class="markup">&lt;h1&gt;</span>Welcome to My Homepage<span class="markup">&lt;/h1&gt;</span></h1>

<div id="helpIntro" style="">
<span class="markup">&lt;div class="intro"&gt;</span>
<div class="intro">
<p style="margin-top: 4px; border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);"><span class="markup">&lt;p&gt;</span>My name is Donald <span id="Lastname" style=""><span class="markup">&lt;span id="Lastname"&gt;</span>Duck.<span class="markup">&lt;/span&gt;</span></span><span class="markup">&lt;/p&gt;</span></p>
<p id="my-Address" style="border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);"><span class="markup">&lt;p id="my-Address"&gt;</span>I live in Duckburg<span class="markup">&lt;/p&gt;</span></p>
<p style="margin-bottom: 4px; border-color: rgb(255, 102, 102); background-color: rgb(255, 255, 153);"><span class="markup">&lt;p&gt;</span>I have many friends:<span class="markup">&lt;/p&gt;</span></p>
</div>
<span class="markup">&lt;/div&gt;</span>
</div>

<br>
<div class="helpUl">
<span class="markup">&lt;ul id="Listfriends&gt;</span>
<ul id="Listfriends" style="margin-top:0px;margin-bottom:0px;">

<li><span class="markup">&lt;li&gt;</span>Goofy<span class="markup">&lt;/li&gt;</span></li>
<li><span class="markup">&lt;li&gt;</span>Mickey<span class="markup">&lt;/li&gt;</span></li>
<li><span class="markup">&lt;li&gt;</span>Daisy<span class="markup">&lt;/li&gt;</span></li>
<li><span class="markup">&lt;li&gt;</span>Pluto<span class="markup">&lt;/li&gt;</span></li>
</ul>       
<span class="markup">&lt;/ul&gt;</span>
</div>

<ul style="display:none;"></ul>
<p style=""><span class="markup">&lt;p&gt;</span>All my friends are great!<span class="markup">&lt;br&gt;</span><br>But I really like Daisy!!<span class="markup">&lt;/p&gt;</span></p>

<p lang="it" title="Hello beautiful" style=""><span class="markup">&lt;p lang="it" title="Hello beautiful"&gt;</span>Ciao bella<span class="markup">&lt;/p&gt;</span></p>

Edit: I have also been able to loop in the following fashion:

Public Sub Test()
    Dim html As MSHTML.HTMLDocument, i As Long
    Set html = GetTestHTML

    For i = 0 To html.querySelectorAll("div.intro p").Length - 1
       Debug.Print html.querySelectorAll("div.intro p")(i).innerText
    Next i

End Sub
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • @MathieuGuindon Thanks for responding. Which library are you referring to please? Colleague wonders if it is do to with hierarchical data as somehow an issue. – QHarr Apr 25 '18 at 12:11
  • 1
    I'm guessing it would be the HTML library. Whichever defines this `DispStaticNodeList` type ;-) – Mathieu Guindon Apr 25 '18 at 12:13
  • Stupid me. Sorry. Gotcha. – QHarr Apr 25 '18 at 12:14
  • 1
    This is not related to the question but an (unsolicited) advice -- always use two-part naming for your types. `Dim html As MSHTML.HTMLDocument` makes it much clearer where you are taking the types from and avoid ambiguity/uncertainty over which object you are taking from. Similarly, it _generally_ is the case that using strong-typed variable may avoid certain bugs w/ dispatch calls. (there are also cases where you avoid other bugs by using dispatch but IME, earlybinding is less likely to have bugs than latebinding) – this Apr 25 '18 at 12:58
  • @this Thank you. I have got lazy it seems :-( Does putting Dim html As MSHTML.HTMLDocument make it strongly typed? – QHarr Apr 25 '18 at 12:59
  • No more so than just `Dim html As HTMLDocument`. The primary benefit is the clarity of the code and disambiguation. The strong typing issue is specific to the `a` and `b` variables which were declared as just `Object`, rather than the actual data type they ought to be. – this Apr 25 '18 at 13:03
  • Ah... so yes was my laziness. I declare as object normally first time round as I don't always know what object to expect back. I know.... lot to learn.....Then I switch to the appropriate type after inspecting in the locals or TypeName call. I will endeavour to be more mindful of this in future. – QHarr Apr 25 '18 at 13:05
  • 1
    @QHarr - Donald indeed has many friends, at least 96 that [we know of](https://en.wikipedia.org/wiki/List_of_Donald_Duck_universe_characters), however I believe you're closest to Donald's Italian residence of Quack Town ([source](https://en.wikipedia.org/wiki/Donald_Duck#Italy)) although he does claim dual citizenship at Duckberg, Calisota, USA. ([map](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d7/Map_of_Duckburg.svg/612px-Map_of_Duckburg.svg.png)). Oh sorry - maybe that wasn't the **primary** topic of your question? – ashleedawg May 14 '18 at 06:47

1 Answers1

3

If the name is DispStaticNodeList, we can be pretty sure it's an array..(or at least has array semantics).

Arrays can normally be iterated with a For Each loop, however it's more efficient to iterate them using a For loop. Looks like what you're getting isn't exactly an array, and while it appears to support indexing, it apparently doesn't support enumeration, which would explain the blowing up when you attempt to enumerate it with a For Each loop.

Looks like the locals toolwindow might be using For Each semantics to list the items in the collection.

I'm not familiar with that particular library so this is a bit of (educated) guesswork, but it's pretty easy to make a custom COM collection type that can't be iterated with a For Each loop in VBA - normally the error is caught on the VBA side though... Seems there might be a bug in the library's enumerator implementation (assuming there's an enumerator for it) causing it to throw an exception that ends up unhandled and somehow takes everything down with it... thing is, you can't fix & recompile that library... so the only thing you can do is to avoid iterating that type with a For Each loop, and avoid expanding it in the locals toolwindow (and so, ...save your work often!).

This article gives a good idea from a C#/.NET perspective, of how COM enumeration works. Of course that library isn't managed code (.NET), but the COM concepts at play are the same.

TL;DR: it's not because you can For...Next that you can For Each; the COM type involved must explicitly support enumeration. If the VBA code compiles with a For Each loop, then it does, so it must be a bug in the type's enumerator.

Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
  • 2
    I'd add that I know from experience that `MSHTML` is sort of dodgy in its implementation - it's made more for JS than for VBA in a manner of speaking. – this Apr 25 '18 at 12:55
  • There was also this article by mr meaden http://exceldevelopmentplatform.blogspot.co.uk/2018/01/vba-webscraping-jquery-selectors.html – QHarr Apr 25 '18 at 13:16