2

In order to fix the following code, I tried to split it up into a smaller part. So, I have the following code that drives me crazy for hours in Sheet1:

    Sub Scrapping_Data()
    Dim IE As Object, EURUSD1 As String, EURUSD2 As String
    Application.ScreenUpdating = False
    Range("A:B").Clear

    Set IE = CreateObject("internetexplorer.application")

    With IE
       .Navigate "http://uk.investing.com/currencies/streaming-forex-rates-majors"
       .Visible = False
    End With    

    Do
        DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE

    Set FOREX = IE.document.getElementById("pair_1")
    EURUSD1 = FOREX.Cells(1).innerHTML
    EURUSD2 = FOREX.Cells(2).innerHTML
    IE.Quit
    Set IE = Nothing

    Range("A1").Value = EURUSD1
    Range("B1").Value = EURUSD2
    End Sub

I run it for the first time and it worked fine. But when I run it for the second time, the error the run-time error '91' occurred. So I clicked F8, but nothing happened the code worked fine and I checked Sheet1 there were values in Cells(1,1) and Cells(1,2). I then run it again and the error the run-time error '13' occurred this time. Again I clicked F8, but nothing happened the code worked fine. When I kept running the code, the errors still occurred and clicking F8 didn't help to find the problem. What is wrong with my code? How to fix it?

What I don't get it here too is my laptop is getting slow every time I run the code and I have to manually restart it many times.

Community
  • 1
  • 1
  • Why is this tagged [javascript]? –  Jul 11 '16 at 05:47
  • @Jeeped Hhmmm, because of `getElementById`? – Anastasiya-Romanova 秀 Jul 11 '16 at 06:06
  • In VBA this is called the DOM or Document Object Model. See [getElementById method](https://msdn.microsoft.com/en-us/library/ms536437.aspx). While there is a new(er) venture in [office-js](http://stackoverflow.com/questions/tagged/office-js), Microsoft and Javascript usually don't see eye-to-eye and that is certainly the case here. –  Jul 11 '16 at 06:11

2 Answers2

2

The following requires that you go into the VBE's Tools ► References and place checkmarks beside Microsoft HTML Object library and Microsoft XML v6.0.

This is an equivalent of an Internet Explorer object web scrape to the same URL.

Option Explicit

Sub tournamentFixtures()
    'declare the objects with early binding
    Dim htmlBDY As New HTMLDocument, xmlHTTP As New MSXML2.XMLHTTP60
    'declare the regular variables
    Dim sURL As String, ws As Worksheet

    'set a var object to the destination worksheet
    Set ws = Worksheets("Sheet1")

    'assign the URL to a string var
    sURL = "http://uk.investing.com/currencies/streaming-forex-rates-majors"

    'isolate all commands to the MSXML2.XMLHTTP60 object
    With xmlHTTP
        'initiate the URL
        .Open "GET", sURL, False
        'set hidden header information
        .setRequestHeader "User-Agent", "XMLHTTP/1.0"
        'get the page data
        .send

        'safety check to make sure we got the web page's data
        If .Status <> 200 Then GoTo bm_safe_Exit

        'if here you got the page data - copy it to the local var
        htmlBDY.body.innerHTML = .responseText
    End With

    'localize all commands to the page data
    With htmlBDY
        'check if the element ID exists
        If Not .getElementById("pair_1") Is Nothing Then
            'it exists - get the data directly to the worksheet
            With .getElementById("pair_1")
                ws.Range("A1") = .Cells(1).innerText
                ws.Range("B1") = .Cells(2).innerText
            End With
        Else
            'it doesn't exist - bad page data
            MsgBox "there is no 'pair_1' on this page"
        End If

    End With

bm_safe_Exit:
    'clean up all of the objects that were instantiated
    Set htmlBDY = Nothing: Set xmlHTTP = Nothing: Set ws = Nothing
End Sub

I have commented virtually every line so you can follow what is happening. This may need some tweaking. I ran it ~40 times and it failed once but that could have been my own Internet connection. Consider this a starting point where you can do your own research to accomplish your goals. If you continue to have problems with this new code, please do not paste this into another question and ask why it doesn't work without doing some research and attempting a solution yourself. StackOverflow is a site for professional and enthusiast programmers.


I gave up trying to offer solutions to web scraping problems because page technology changes too fast to keep up on a peripheral level. You have to be involved in the immediate changes to be able to respond to them quickly and my own interests lie elsewhere. I responded to this request because you actually supplied the URL to test against (something few people asking questions actually think is important - go figure) and I thought the static dimming of the var would help.

Community
  • 1
  • 1
1

The construction and destruction of an InternetExplorer object takes time; up to a few seconds even on the fastest sytems. You can wait an appropriate amount of time for it to relinquish all of the .DLLs et al it has loaded or you can declare your IE as a static object that will be reused on subsequent reruns of the sub procedure.

Option Explicit

Sub Scrapping_Data()
    Static IE As Object
    Dim EURUSD1 As String, EURUSD2 As String

    Application.ScreenUpdating = False
    With Worksheets("Sheet1")   'KNOW what worksheet you are on!!!!!
        .Range("A:B").Clear
    End With

    If IE Is Nothing Then
        Set IE = CreateObject("internetexplorer.application")
        With IE
            .Visible = True
            '.Visible = False
            .Silent = True
        End With
    End If

    With IE
       .Navigate "http://uk.investing.com/currencies/streaming-forex-rates-majors"
       Do While .ReadyState <> 4: DoEvents: Loop
       With .document.getElementById("pair_1")
            EURUSD1 = .Cells(1).innerHTML
            EURUSD2 = .Cells(2).innerHTML
       End With
    End With

    With Worksheets("Sheet1")   'KNOW what worksheet you are on!!!!!
        .Range("A1") = EURUSD1
        .Range("B1") = EURUSD2
    End With

    IE.Navigate "about:blank"

End Sub

The caveat here is that you will have to destruct the InternetExplorer object yourself at some point in the future. Closing the workbook will close the VBA project but leave the IE object 'orphaned'.

Given all of the HTML5 debris that comes along with that web page, have you considered moving to ? And if you are wondering then yes, that would be a new question under a different set of [tags].

  • I have a few questions. Must I write `With Worksheets("Sheet1")` given that my code lies on Sheet1? – Anastasiya-Romanova 秀 Jul 11 '16 at 06:09
  • Your code lies in a VBA project, not on Sheet1. While *not* supplying a parent worksheet will default to the [ActiveSheet property](https://msdn.microsoft.com/en-us/library/office/ff822753.aspx), it is generally **not** considered a 'best practise' to default to the ActiveSheet. Best to start good coding now rather than try to repair bad habits later. –  Jul 11 '16 at 06:15
  • Does the same problem also occur in [this code](http://stackoverflow.com/q/38172005/3397819)? Could you also answer that question too if you don't mind? Thanks – Anastasiya-Romanova 秀 Jul 11 '16 at 06:21
  • 2
    @Anastasiya-Romanova秀 - when your code is in a worksheet code module then the default worksheet for Range/Cells etc is that worksheet. Or you can use `Me.Range()` etc if you want to be specific. https://stackoverflow.com/questions/28439376/what-is-the-default-scope-of-worksheets-and-cells-and-range/28439984#28439984 – Tim Williams Jul 11 '16 at 06:24
  • Please do not ask me if 'the same problem occur in this code'. That just tells me that you are asking questions without properly testing the solution I've suggested. IE is bloated at the best of of times. It is best to reuse it whenever possible. You might try converting this to a 'helper' sub that takes a parameter and destructs the IE object once you are done. However, I still recommend shifting to an xlmhttp object. –  Jul 11 '16 at 06:25
  • Good catch @TimWilliams - I hadn't considered the code would be in a worksheet code sheet rather than a module code sheet. –  Jul 11 '16 at 06:26
  • @Jeeped I did try your code and it's still running after I run it for 10 minutes. Seriously, what happen here? FYI, I'm still new in this thing so I don't understand what does xlmhttp object mean – Anastasiya-Romanova 秀 Jul 11 '16 at 06:29
  • OK, we seem to have some [ESL](http://k6educators.about.com/od/educationglossary/g/esl.htm) concerns that need to be considered. tbh, I do not know whether your latest report is a good thing or a bad thing. Is it running for 10 minutes on its first try to get the web page or has it been running for 10 minutes getting multiple pages successfully? –  Jul 11 '16 at 06:34
  • @Jeeped it's running for 10 minutes on its first try. It's still running btw – Anastasiya-Romanova 秀 Jul 11 '16 at 06:37
  • An [xmlhttp] object is like an invisible web browser that does not collect all of the garbage that a web browser collects. There are no popups, no redirects, just raw data that you can parse and put into your worksheet. That is a simplistic explanation but a valid one. The web page you are going to has HTML5 over-screens and javascript junk that are between you and the data you are trying to retrieve. An [xmlhttp] object 'ignores' all of this. –  Jul 11 '16 at 06:40
  • That is probably due to the HTML5 overscreen that is stopping the page from completing until it is acknowledged. Go to the URL manually to see what I mean. –  Jul 11 '16 at 06:43
  • btw, if you have been testing your routine and mine repeatedly on the same computer, go to the Task manager and start killing the iexplore.exe processes listed there. –  Jul 11 '16 at 06:46
  • @Jeeped I follow your suggestion. I go to the URL manually, I get the following message: `You are using an unsupported version of Internet Explorer: We no longer support Internet Explorer 7, 8 & 9. Some features may stop working. To enjoy all our site's features, please upgrade your browser.` – Anastasiya-Romanova 秀 Jul 11 '16 at 06:47
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116960/discussion-between-jeeped-and-anastasiya-romanova-). –  Jul 11 '16 at 06:51