0

Someone answered my question for Java and I'm basically building on a previous answer provided by @enderland here.

I'm running a webscraper and generally it works well but I often run into "runtime errors". I want to avoid this by skipping a specific assignment (in my case loading a patent page on Google.) if the time it takes to load the website takes too long.

I reckon a simple If Then is what I need but I don't know which function to use to control the elapsing of time.

Any suggestions?

At the moment I run the following:

Function citecount(patent_number As String, patent As String, ccount As Integer, info As String)

patent = ""
ccount = 0
If patent_number = "" Then Exit Function

the_start:


Set ie = CreateObject("InternetExplorer.Application")

    ie.Top = 0
    ie.Left = 0
    ie.Width = 800
    ie.Height = 600
    ie.Visible = False 'If False we won't see the window navigation

On Error Resume Next
     ie.Navigate ("http://www.google.com/patents/US" & patent_number & "?")
        Sleep (600)
      Do
        DoEvents
            If Err.Number <> 0 Then
                ie.Quit
                Set ie = Nothing
                GoTo the_start:
            End If
           Sleep (1250)

        Loop Until ie.ReadyState = 4

With Sleep() being defined as:

Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

The main error I face is described here. It's a Run-time error '2147467259 (80004005) automation error, unspecified error. Additionally, before I added the Sleep() commands I also got Microsoft Excel is waiting for another Application to complete an on OLE action but that has not come back since adding the Sleep() command.

Finally I get an IE warning:

Stop running this script? A Script on thispage is causing your web browser to run slowly. If it continues to run your computer might become unresponsive

These are I think all caused by the fact that the webpage is taking loads of time downloading images that I do not need. I read some posts about loading a webpage directly in html without images on SO but could not find one that I could implement (novice at work).

Hope this provides clarification

Community
  • 1
  • 1
SJDS
  • 1,239
  • 1
  • 16
  • 31
  • What runtime error(s)? Also, can you post more of your code? Your current `If Err.Number...` block (if that's all there is) is never going to trap an error unless you have an `On Error Resume Next` somewhere preceding it. – David Zemens May 29 '14 at 14:15

1 Answers1

2

I think you might want to rethink how you're using IE/your code to solve your problem in general. If you really just want to use a timeout to skip IE calls you can do the following.

Use this function to get the time elapsed between 2 dates:

Function ElapsedTimeInSeconds(endTime As Date, startTime As Date) As Long
    ' Calculate the time interval in seconds
    If endTime > startTime Then
        ElapsedTimeInSeconds = DateDiff("s", startTime, endTime)
    Else
        ElapsedTimeInSeconds = 0 ' cannot have negative elapsed time...
    End If
End Function

Change your do Loop to take that into account

Dim startTime As Date: startTime = Now
Dim timeout As Long: timeout = 5  'seconds
Do
    DoEvents
    If Err.Number <> 0 Then
        ie.Quit
        Set ie = Nothing
        GoTo the_start:
    End If
    Sleep (1250)
Loop Until ie.ReadyState = 4 Or ElapsedTimeInSeconds(Now, startTime) > timeout

OR you might want to play around with ElapsTimeInSeconds() function to go back to "the _start", depending on how the rest of your code is structured.

rex
  • 3,133
  • 6
  • 35
  • 62
  • Thanks Armen, I implemented your solution which worked quite well, especially once I realized I could block images to show up in IE via some advanced settings. However, I still get the runtime error I described before (http://stackoverflow.com/questions/23818004/error-with-vba-scraping-ie-google-patents-computer-restart-temporary-sol) which basically disables me to run code overnight because it fails every 10 - 15 minutes. You mentioned a different approach to solving my problem. Can you give any leads on how it could work more efficiently and effectively ? – SJDS Jun 04 '14 at 11:46