I have written this sub that recursively parses a tree of HTML pages. My problem is that when I get to the bottom of the tree and the sub exits to the previous level, all objects are empty. Here is the full code:

Public Row As Integer
Public objIE As Object

Sub GetBranches()
    Set objIE = CreateObject("InternetExplorer.Application") ' create new browser

    objIE.Visible = True ' make IE visible for viewing login

    Row = 2 ' start at line 2 (first line is headers)
    GetBranchesRecursively ("241653")
End Sub

Sub GetBranchesRecursively(BranchID As String)

    Dim CurrentBranch As Object
    Dim subBranches As Object, subBranch As Object ' each variable needs its own As clause, otherwise subBranches is a Variant
    Dim SubBranchID As String

    ' navigate IE to Branch ID
    objIE.navigate "https://casadasereia.net/vbatests/viewtree" & BranchID & ".html"

    ' wait for browser
    Do While objIE.Busy = True Or objIE.readyState <> 4
        DoEvents
    Loop

    Set CurrentBranch = objIE.document.getElementById(BranchID)

    ' store name and ID of current Branch
    Worksheets("Data").Range("A" + CStr(Row)) = Trim(CurrentBranch.innerText) ' Branch name is in the HTML A tag
    Worksheets("Data").Range("B" + CStr(Row)) = BranchID
    Row = Row + 1

    ' list of subBranches is the list of LI elements of the UL element adjacent to the A tag
    Set subBranches = CurrentBranch.NextSibling.NextSibling.getElementsByTagName("li")

    For Each subBranch In subBranches
        SubBranchID = subBranch.getElementsByTagName("A")(0).ID ' BranchID is in the id property of the HTML A tag
        If InStr(subBranch.className, "node") <> 0 Then
            ' This is a node, store data and move to next in the list
            Worksheets("Data").Range("A" + CStr(Row)) = Trim(subBranch.innerText) ' name is in the HTML A tag
            Worksheets("Data").Range("B" + CStr(Row)) = SubBranchID
            Row = Row + 1
        Else
            ' This is a branch, move down to the branch
            GetBranchesRecursively (SubBranchID)
        End If
    Next
End Sub

At the last level of recursion (BranchID 99816) there are 4 nodes and no branches. Their data is correctly stored and, at the end of the For loop, control is passed back to the previous level but when that happens all objects are empty. This is what the debugger shows me:

[Image: debugger screenshot]

Does anyone know why these variables show as empty when they were properly filled before the recursive call?

Fabricio
  • @QHarr I did that already. The Watches panel shows you the current instance of the object. Just before hitting F8 on the lowest level "End Sub" the object has 4 items (for the 4 nodes I mentioned) and right after hitting F8 control returns to the "End If" line just after the previous call to GetBranchesRecursively and all objects being watched show the "". Unless there is a really nasty bug or I'm doing something wrong, they should return to the state they were when the sub was called recursively. – Fabricio Jul 10 '18 at 09:02
  • @QHarr, Sorry, this is an internal application so no easy way to share a live scenario. :-/ – Fabricio Jul 10 '18 at 09:04
  • @QHarr, I was trying to create a public test environment and hit another issue. Care to take a look? https://stackoverflow.com/questions/51263106/vba-internet-explorer-automation-error – Fabricio Jul 10 '18 at 10:39
  • @QHarr, yes, it did help. As I said there, I have no idea why but it helped. Meanwhile I changed this question to include a public URL and also posted the full code of my test. – Fabricio Jul 10 '18 at 12:11
  • @QHarr, I edited the whole question; the code there is now the full code of my test and it contains a public URL. The only difference is that in the real application the next branch is an HTTP parameter, while in my example I created two files which mimic the expanding of the tree and the branch ID is part of the filename. But it replicates the issue. – Fabricio Jul 10 '18 at 12:19
  • Right, the root of the test tree is built by the recursive sub with the passed parameter... :-) But here it is: https://casadasereia.net/vbatests/viewtree241653.html and the next level is also built using the first level HTML tag ID of the branch line. – Fabricio Jul 10 '18 at 12:23
  • Sorry to make you explain so much! But thank you for answering. – QHarr Jul 10 '18 at 12:24
  • No, no. My apologies. Recursion is already complicated enough so I understand making it as simple as possible is a must. Having that in mind: the second level URL corresponding to the first expandable line is: https://casadasereia.net/vbatests/viewtree99816.html – Fabricio Jul 10 '18 at 12:27
  • I will put some time aside tonight to test if not already solved. – QHarr Jul 10 '18 at 12:29
  • When you use the same IE Object to navigate around, then of course all object references belonging to a document that IE is no longer displaying will be gone when you return from a recursive step. – Tomalak Jul 10 '18 at 13:17
  • Thanks Tomalak. I had never stopped to think but indeed, as per the MS VBA reference (https://msdn.microsoft.com/en-us/vba/language-reference-vba/articles/set-statement) the Set statement "Assigns an object reference to a variable or property." It does not make a copy of the object. Time to rethink my strategy but I anticipate performance issues because creating an IE instance at every level will take some time... :-/ Thanks again. – Fabricio Jul 10 '18 at 13:36
  • You don't need to create a new IE at every level, that would indeed be wasteful. In fact, I'm not sure if you need an IE object *at all*. Have you already tried using a `WinHttp.WinHttpRequest` object (part of the "Microsoft WinHTTP Services" type library) to do your requests? You can combine that with `MSHTML.HTMLDocument` from the "Microsoft HTML Object Library" to get all the features of a DOM document without the weight of IE. You can add both through the VBA "References" dialog. – Tomalak Jul 10 '18 at 14:03
  • @Tomalak, thank you for the hint. I don't know the WinHttpRequest object. The example in the question was a bare bones example just to address the problem you solved for me, but the real situation is far more complex, with over 3K branches to browse. Can WinHttpRequest be used to access a site protected with a dedicated Single Sign On server? That is, when I access a given URL on server A I will get redirected to an SSO auth server B and there I get the auth form. After login I then get redirected back to the original URL on server A, which meanwhile has talked to the SSO server to clear my access. – Fabricio Jul 10 '18 at 14:58
  • WinHttpRequest can do anything, but is not smart. All it can do is send an HTTP request and give you the server's response. Remembering logon cookies, setting authentication headers, doing HTTP redirects you'd have to do yourself. Depending on how complex the SSO scheme is it can be tedious to implement it from scratch, but it's certainly doable. But you can work with a hybrid approach, too: Use the IE object for browsing, but immediately copy the current page body into a new `MSHTML.HTMLDocument`, which will then survive recursion, like `doc.body.innerHTML = objIE.document.body.innerHTML`. – Tomalak Jul 10 '18 at 15:06
  • The mixed approach would work if the full tree was loaded from the start which it isn't... :-) So, for now I will stick to the IE version with an application started and then quit at every recursion. Slow but will work. If you want to answer the question with your first comment I will accept it. – Fabricio Jul 10 '18 at 15:46
  • Thinking about it, you could try working with the "back" button. Every time you step out of the recursion, you go back in the browser history once. You would have to select the elements again, but at least they would be there. – Tomalak Jul 12 '18 at 11:53
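
Following the explanation in the comments (object references into a page die once the shared IE instance navigates away), one possible workaround is to read everything needed from the current page before recursing, and to carry only plain strings across the recursive call. The sketch below is untested and rests on assumptions: `GetBranchesTwoPass` and `BranchIDs` are hypothetical names, and it relies on the public `Row` and `objIE` variables and the "Data" worksheet from the question.

```vba
' Hypothetical variant of GetBranchesRecursively: a sketch, not a verified fix.
' Assumes the Public Row and objIE declarations from the question.
Sub GetBranchesTwoPass(BranchID As String)

    Dim CurrentBranch As Object
    Dim subBranches As Object, subBranch As Object
    Dim BranchIDs As New Collection ' plain strings, independent of the IE document
    Dim SubBranchID As Variant      ' For Each over a Collection needs a Variant

    ' navigate IE to Branch ID and wait for the page to load
    objIE.navigate "https://casadasereia.net/vbatests/viewtree" & BranchID & ".html"
    Do While objIE.Busy Or objIE.readyState <> 4
        DoEvents
    Loop

    Set CurrentBranch = objIE.document.getElementById(BranchID)

    ' store name and ID of current Branch
    Worksheets("Data").Range("A" & Row) = Trim(CurrentBranch.innerText)
    Worksheets("Data").Range("B" & Row) = BranchID
    Row = Row + 1

    Set subBranches = CurrentBranch.NextSibling.NextSibling.getElementsByTagName("li")

    ' Pass 1: use the DOM objects only while their page is still loaded
    For Each subBranch In subBranches
        If InStr(subBranch.className, "node") <> 0 Then
            ' node: store its data right away
            Worksheets("Data").Range("A" & Row) = Trim(subBranch.innerText)
            Worksheets("Data").Range("B" & Row) = subBranch.getElementsByTagName("A")(0).ID
            Row = Row + 1
        Else
            ' branch: remember only its ID as a string
            BranchIDs.Add CStr(subBranch.getElementsByTagName("A")(0).ID)
        End If
    Next

    ' Pass 2: recurse; no DOM reference from this level is used after this point
    For Each SubBranchID In BranchIDs
        GetBranchesTwoPass CStr(SubBranchID)
    Next
End Sub
```

Note that this changes the output order: all nodes of a level are written before the contents of its sub-branches, whereas the original loop interleaves them. If the original order matters, the node data would also have to be buffered as strings and written out in the second pass.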

0 Answers