I hope this question hasn't already been answered in another post; I searched and couldn't find one. I'm quite new to programming, and especially to web scraping. If you know of a good, complete tutorial, I'd appreciate it if you could point me to it. I work with VBA and Python.

I began working on this after reading this post: Scraping data from website using vba

Very helpful, by the way. I understood the old method better so I chose that one.

The site I want to search in is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp

The code I've written so far:

Sub Test()

    Dim ie As Object
    Dim form As Variant, button As Variant
    Set ie = CreateObject("InternetExplorer.Application")
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim xx As Object, x As Object
    

    With ie
    .Visible = True '< Show browser window
    .navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
     
    Do While ie.Busy
        DoEvents
    Loop '< Wait for page to have loaded
     
     
    End With
       
    Set TR_col = ie.Document.getElementsByTagName("TR")
     
    For Each TR In TR_col
        Set xx = ie.Document.getElementsByTagName("a")
        If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
            Cells(1, 1) = "Ok"
        End If
        
    Next TR
End Sub

Lastly, this is what the Inspector looks like:

[![enter image description here][1]][1]

[1]: https://i.stack.imgur.com/YoG4H.png

I also highlighted the piece of information I'm using for testing purposes.

So, my approach is to search for all the "tr" tags and then check whether the first column of the table (I guess this would be the first "td" tag) is equal to a text I'll have in a cell (in this case I hard-coded the text for testing purposes). The result should be to copy the number next to the date into a cell in the worksheet. For now I write "Ok" just to see whether the If statement is working, but it isn't.

I guess I'm not sure how to tell VBA to search for all "tr" tags, search for all the "td" tags within each "tr", find the one that matches some text, and return the 3rd "td" tag within that "tr". Does that make sense?
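Since Python is also an option here, the row-by-row logic described above can be sketched with the standard library alone. This is only an illustration under assumptions: `TableRowCollector`, `find_value`, and `SAMPLE_HTML` are hypothetical names, and the sample markup and its values are made up rather than copied from the real page.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (such as <a>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def find_value(html, label):
    """Return the third cell of the first row whose first cell equals label."""
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    for cells in parser.rows:
        if len(cells) >= 3 and cells[0] == label:
            return cells[2]
    return None


# Made-up sample markup and values; the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr><td><a href="#">Some other variable</a></td><td>01/01/2019</td><td>1,0</td></tr>
  <tr><td><a href="#">Base Monetaria - Promedio acumulado del mes (MM de $)</a></td>
      <td>02/01/2019</td><td>2,5</td></tr>
</table>
"""

print(find_value(SAMPLE_HTML, "Base Monetaria - Promedio acumulado del mes (MM de $)"))
```

The same idea carries over to VBA: loop over the rows, compare the text of the first cell, and read the third cell of the matching row.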

Hope I've been specific enough and that someone can guide me.

teoeme139

1 Answer

It's not necessary to load a whole browser just to get the HTML; you can request the page directly.

Sub Test()

    '// References required:
    '// 1) Microsoft HTML Object Library
    '// 2) Microsoft XML, v6.0

    Dim req As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim tbl As MSHTML.HTMLTable
    Dim tblRow As MSHTML.HTMLTableRow
    Dim tblCell As MSHTML.HTMLTableCell
    Dim anch As MSHTML.HTMLAnchorElement
    Dim html$, url$, sText$, fecha$, valor$, j%

    '// Request the page over HTTP instead of driving a browser.
    url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"
    Set req = New MSXML2.XMLHTTP60
    req.Open "GET", url, False
    req.send
    html = req.responseText

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

    '// Take the first element with class "table-BCRA".
    Set tbl = doc.getElementsByClassName("table-BCRA")(0)
    For j = 1 To tbl.Rows.Length - 1
        With tbl.Rows(j)
            '// Skip cells without data.
            '// Assume correct data has three cells.
            If .Cells.Length = 3 Then
                '// Read the label from the first cell's text; a table cell
                '// cannot be assigned to an HTMLAnchorElement variable.
                sText = .Cells(0).innerText
                If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
                    fecha = .Cells(1).innerText
                    valor = .Cells(2).innerText
                End If
            End If
        End With
    Next

End Sub
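For comparison, the same "no browser needed" idea can be sketched in Python with only the standard library. `fetch_html` is a hypothetical helper name, and the UTF-8 fallback encoding is an assumption, since the encoding the site actually declares is not stated here.

```python
import urllib.request


def fetch_html(url, encoding="utf-8", timeout=30):
    """Download a page with a plain HTTP GET and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Prefer the charset declared in the response headers, if there is one;
        # otherwise fall back to the assumed default.
        charset = response.headers.get_content_charset() or encoding
        return response.read().decode(charset, errors="replace")


# Example call (not executed here because it needs network access):
# html = fetch_html("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp")
```

The returned string plays the same role as `req.responseText` in the VBA code: it is the raw HTML, which still has to be parsed to reach the table rows.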
JohnyL
  • Thank you, JohnyL. For some reason it didn't work although it didn't throw any errors. I suspect there may be some issues with the class name or the sText variable value. I will work on tweaking your code. Thanks again – teoeme139 Jan 18 '19 at 14:51
  • @teoeme139 I tried it and it worked. The problem is that there's text with `&nbsp;` (non-breaking space) characters - and VBA is stubborn and doesn't wanna delete it! – JohnyL Jan 18 '19 at 15:31
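If the comparison fails because of non-breaking spaces (U+00A0), one way to handle it is to normalize whitespace before comparing. A minimal Python sketch follows; `normalize_spaces` is a hypothetical helper, and where the non-breaking spaces sit in the `scraped` string is an assumption made for illustration.

```python
import re


def normalize_spaces(text):
    """Replace non-breaking spaces with ordinary ones and collapse whitespace runs."""
    text = text.replace("\u00a0", " ")
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical scraped label containing non-breaking spaces.
scraped = "Base\u00a0Monetaria - Promedio acumulado del mes (MM de $)\u00a0"
wanted = "Base Monetaria - Promedio acumulado del mes (MM de $)"

print(scraped == wanted)                    # False
print(normalize_spaces(scraped) == wanted)  # True
```

In VBA the equivalent step would be `Replace(sText, ChrW(160), " ")` followed by `Trim`, applied before the `If` comparison.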