First web scraping project!

I've been copying various web scraping code from here but can't get around a

run time error 13: Type Mismatch

on the .document.getElementById("") line I'm using to set a variable for a hyperlink I want to click. I figured it could be handled the same way as the log-in button, which I coded successfully. I'm not sure whether I'm missing a library, since pretty much every other post describes different issues and solutions from what I'm running into. What am I doing wrong here?

I'm using IE11 and Excel 2010. I started adding libraries I thought might provide a solution. The libraries I've activated are as follows:

  • Visual Basic For Applications
  • Microsoft Excel 14.0 Object Library
  • OLE Automation
  • Microsoft Office 14.0 Object Library
  • Microsoft HTML Object Library
  • Microsoft Internet Controls
  • Microsoft XML, v6.0
  • Microsoft Shell Controls And Automation

Here is the code and HTML DOM snippet:

Sub IEScrape()
  'we define the essential variables
 Dim ie As Object
 Dim pwd, username
 Dim button
 Dim MemAss

'requires the "Microsoft Internet Controls" reference (Tools > References) in your VBA project
 Set ie = New InternetExplorerMedium
 With ie
     .Visible = True
     .navigate ("internalwebsite.com")
     While ie.readyState <> 4
         DoEvents
     Wend

     Set username = .document.getElementById("userid") 'id of the username control (HTML Control)
     Set pwd = .document.getElementById("password") 'id of the password control (HTML Control)
     Set button = .document.getElementById("loginbtn") 'id of the button control (HTML Control)
     username.Value = "username"
     pwd.Value = "password"
     button.Click
     While ie.readyState <> 4
         DoEvents
     Wend

    'Run time error 13: Type mismatch on next line!!!
    Set MemAss = .document.getElementById("Menu:membershipassociation") 'id of the link (HTML Control)
    MemAss.Click
    While ie.readyState <> 4
         DoEvents
    Wend

 End With



 Set ie = Nothing
 End Sub

td element info

plankton

2 Answers

As you mentioned, pausing the code for 5 seconds allows it to work, so I would assume that something is happening asynchronously after the HTML loads: either an AJAX request or JavaScript editing the DOM.

This would mean that once the HTML has loaded (readyState = 4), the JavaScript could still be running, or the page could still be waiting for the AJAX response.

Pausing the code as you have gives Internet Explorer time to finish all of its tasks before VBA picks up the references. The drawback is that you are waiting for an arbitrary amount of time, and there is a chance the page will not have finished loading within that interval.
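As a rough illustration of a wait with an upper bound, here is a minimal sketch. WaitForPage, IE_READY_COMPLETE and timeoutSeconds are hypothetical names of mine, not from the original code. It also checks .Busy, because right after a click readyState can still read 4 from the previous page. Note that it only covers the browser's own loading state, so content injected later by script can still be missing when it returns.

```vba
Option Explicit

' Hypothetical helper (not from the original post): loops until IE reports
' it is idle, or until timeoutSeconds has elapsed.
' Returns True if the page reported ready, False on timeout.
Public Function WaitForPage(ByVal ie As Object, _
                            Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Const IE_READY_COMPLETE As Long = 4         ' same value as readyState = 4 above
    Dim deadline As Date

    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds in a day

    Do While ie.Busy Or ie.readyState <> IE_READY_COMPLETE
        If Now > deadline Then Exit Function    ' gives up; result stays False
        DoEvents                                ' keep Excel responsive
    Loop

    WaitForPage = True
End Function
```

Something like `If Not WaitForPage(ie, 20) Then Exit Sub` could then stand in for each While ... Wend loop.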

To build a more robust check (if needed), I would suggest loading the web page outside of VBA and using the browser's debugger to add breakpoints on DOM changes, then watching for the point at which "Menu:membershipassociation" is defined. I would then pay attention to which process created it and see how you can tie your script in. The ideal outcome would be that this data is stored in the page when it loads, or in another location that your VBA can reach directly.

That said, when I have hit this roadblock in the past, I have used an iterator that retries at regular intervals, which may speed up this section of your code. I also like to use these iterators on any part of the DOM that I am not 100% sure is available immediately. Essentially, just try to get the element every second or half second until it has loaded.
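A minimal sketch of that kind of iterator, assuming the late-bound ie object from the question. WaitForElement and its parameters are hypothetical names of mine. It retries roughly once per second and gives up after a timeout instead of looping forever.

```vba
Option Explicit

' Hypothetical helper: polls the document for an element id until it
' appears or the timeout expires. Returns Nothing on timeout.
Public Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                               Optional ByVal timeoutSeconds As Double = 30) As Object
    Dim deadline As Date
    Dim el As Object

    deadline = Now + timeoutSeconds / 86400#    ' 86400 seconds in a day

    Do
        Set el = Nothing
        On Error Resume Next                    ' the document may be mid-navigation
        Set el = ie.document.getElementById(elementId)
        On Error GoTo 0

        If Not el Is Nothing Then Exit Do       ' found it
        If Now > deadline Then Exit Do          ' give up; el is still Nothing

        DoEvents
        Application.Wait Now + TimeValue("0:00:01")   ' retry about once per second
    Loop

    Set WaitForElement = el
End Function
```

In the question's code, this would replace the failing line with `Set MemAss = WaitForElement(ie, "Menu:membershipassociation", 20)`, followed by a check such as `If Not MemAss Is Nothing Then MemAss.Click`.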

My other suggestion: if debugging the web page in a browser shows that the data is available as soon as the page is loaded, then the issue could be that you are calling the Click method too early. You could try using the IE_DocumentComplete event to signal when the document is available. An example has been posted here which may be of help.
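For reference, a sketch of how the DocumentComplete event can be caught in VBA. This is not the linked example. It assumes the "Microsoft Internet Controls" reference and a class module that I have hypothetically named IEWatcher. The member names are mine too. I have not verified how reliably the event reaches VBA with InternetExplorerMedium on every setup, so treat it as a starting point.

```vba
' Class module, hypothetically named IEWatcher.
' Requires the "Microsoft Internet Controls" reference.
Option Explicit

Private WithEvents ie As InternetExplorer
Private mTopLevelDone As Boolean

' Hook the watcher up to an existing browser instance.
Public Sub Attach(ByVal browser As InternetExplorer)
    Set ie = browser
End Sub

' True once the top-level document has reported completion.
Public Property Get TopLevelDone() As Boolean
    TopLevelDone = mTopLevelDone
End Property

' Call before each action that triggers a new page load.
Public Sub ClearFlag()
    mTopLevelDone = False
End Sub

' Fires once per frame; the top-level document is the one whose
' pDisp is the browser object itself.
Private Sub ie_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    If pDisp Is ie Then mTopLevelDone = True
End Sub
```

The calling Sub would create the watcher, call Attach with the browser, call ClearFlag just before button.Click, and then loop on DoEvents until TopLevelDone is True. The event is only delivered while VBA is yielding, so the DoEvents loop is still needed. Like readyState, this says nothing about elements that scripts add after the document completes.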

If you can update us with what you find from debugging the page as it loads, we can point you in a better direction to solve the issue.

user2502611

I have no idea why this works, but I paused the process for 5 seconds and all of a sudden it recognizes .Document.getElementById("Menu:membershipassociation").Click. If anyone has critiques of my process, please post an answer with better code and I'll mark it correct.

Code Below:

Option Explicit

Sub IEScrape()
  'we define the essential variables
 Dim ie As Object
 Dim pwd, username
 Dim button
 Dim MemAss

'requires the "Microsoft Internet Controls" reference (Tools > References) in your VBA project
 Set ie = New InternetExplorerMedium
 With ie
     .Visible = True
     .Navigate ("internalwebsite.com")
     While ie.ReadyState <> 4
         DoEvents
     Wend

     Set username = .Document.getElementById("userid") 'id of the username control (HTML Control)
     Set pwd = .Document.getElementById("password") 'id of the password control (HTML Control)
     Set button = .Document.getElementById("loginbtn") 'id of the button control (HTML Control)
     username.Value = "username"
     pwd.Value = "password"
     button.Click
     While ie.ReadyState <> 4
         DoEvents
     Wend

    Application.Wait (Now + TimeValue("0:00:05"))
    .Document.getElementById("Menu:membershipassociation").Click

    While ie.ReadyState <> 4
         DoEvents
    Wend

 End With



 Set ie = Nothing
 End Sub
plankton