0

I'm trying to automate a task in excel that requires opening a webpage, navigating to a link on that page, and then clicking on a button on the second page to download an .xlsx file.

I've written a script that should do this. However, the response I get from the webpage is not always the same. In particular, sometimes this will return a download from the first page and sometimes it will navigate to the second page and not download anything, once or twice it has done both.

My sense is that this has to do with how long it takes for InternetExplorer.application to complete a request. I can't figure out how to troubleshoot this though, given that I tell the script to wait for IE.application to complete its request.

Sub DoBrowse2()

    'For Each lnk In Sheets("Sheet4").Hyperlinks
        'Range(lnk).Hy.Follow
        'Next

    Dim i As Long
    Dim URL As String
    Dim BaseURL As String
    Dim ToURL As String
    Dim IE As Object
    Dim objElement As Object
    Dim objCollection As Object
    Dim HWNDSrc As Long
    Dim html As IHTMLDocument

    Set IE = CreateObject("InternetExplorer.Application")

    URL = Range("B2").Hyperlinks(1).Address

    IE.Navigate URL

    IE.Visible = True

    Application.StatusBar = URL & " is loading. Please wait..."

    Do While IE.ReadyState = 4: DoEvents: Loop
    Do Until IE.ReadyState = 4: DoEvents: Loop

    Application.StatusBar = URL & " Loaded"

    'Set html = IE.Document
    'Dim elements As IHTMLElementCollection
    'Set elements = html.all

    For Each itm In IE.Document.all
        If itm.className = "datagrid" Then
            For Each el In itm.Document.all
                Debug.Print "hello"
                If el.className = "ujump" And Right(el.innerText, 12) = "Constituents" Then
                    'Debug.Print el.innerText
                    ToURL = el.getAttribute("data-subset")
                    BaseURL = "http://datastream.thomsonreuters.com/navigator/search.aspx?dsid=ZUCH002&AppGroup=DSAddin&host=Metadata&prev=scmTELCMBR&s=D&subset="
                    ToURL = BaseURL & ToURL
                    'Debug.Print ToURL

                    IE.Navigate ToURL
                    IE.Visible = True

                    Do While IE.Busy
                        Debug.Print "in busy loop"
                        Application.Wait DateAdd("s", 1, Now)
                    Loop

                    GoTo end_of_for
                End If
            Next
        End If
    Next

end_of_for:

    Debug.Print ("STOP STOP STOP STOP STOP")

    Dim Script As String

    For Each itm In IE.Document.all
        If itm.className = "lgc excel" Then
            Debug.Print "hello world"
            Debug.Print itm.getAttribute("onclick")
            itm.Click

            Do While IE.Busy
                Debug.Print "app busy"
                Application.Wait DateAdd("s", 1, Now)
            Loop

            Exit For

        End If
    Next

End Sub

Thanks in advance for your help.

Community
  • 1
  • 1
wmcass
  • 1
  • 3
  • Why not to examine the request from the second page with browser developer tools, and make the same XHR to load the content and save it to file? – omegastripes Sep 30 '16 at 19:15
  • Thanks for your reply. I am not very familiar with web scraping - or vba for that matter. Could you be a little more explicit? – wmcass Sep 30 '16 at 19:32
  • [Here is the example](http://stackoverflow.com/a/32429348/2165759) showing how to download a file via XHR, could you please provide the first page URL for more specific example? – omegastripes Oct 01 '16 at 12:40
  • You will not have access to the url if you do not have a subscription. Thanks for the example. I'll take a look and see if I can figure out from there. – wmcass Oct 02 '16 at 18:28
  • [Another one example](http://stackoverflow.com/a/33484763/2165759). – omegastripes Oct 03 '16 at 19:46

1 Answers1

0

Use this to determine whether IE page has been fully loaded, it always must be both of these conditions:

Do Until ie.ReadyState = 4 And ie.Busy = False
    DoEvents
Loop

Even with code above if there are scripts on the page, some content may be loaded after ie.ReadyState = 4 And ie.Busy = False condition is met and either easy way, but inefficient and unreliable Application.Wait can be used or you can try finding elements on the website which inform about loading state and determine the state by their visible attributes etc.

Part of your code is wrong and causes an endless loop:

Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop

It makes DoEvents fire while readystate is complete and also until it reaches complete status.

Narrow down a collection of all elements:

For Each itm In IE.Document.all

to a specific collection for better performance when possible, for example:

For Each itm In IE.Document.GetElementsByTagName("div")
Ryszard Jędraszyk
  • 2,296
  • 4
  • 23
  • 52
  • The solution may not be reliable enough for some cases, and should be extended like [this](http://stackoverflow.com/a/23232573/2165759) – omegastripes Oct 03 '16 at 21:51