0

I have some questions reagrding an Excel VBA program that I want to build.

Basically it's pretty easy. I want to access the following website https://coronavirus.jhu.edu/map.html and extract the Confirmed Cases by Country/Region/Sovereignty (it's the table on the very left of the dashborad) and paste the values in excel.

I know all the basic stuff on how to setup an internetexplorer instance and scraping the page by tags, classes, ids etc. But I think in this sceanrio I cannot use the basic things. I guess it's pretty tricky actually. The information I am looking for is within some tags. But I cannot get their textcontent when I use the getelementsbytagname("strong") approach.

Could someone help me in this case?

I am grateful for any hints, advices and solutions.

Below you'll find the start of my code.

Best Simon

 Sub test()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim i As Integer
    Dim obj_coll As IHTMLElementCollection
    Dim obj As HTMLObjectElement


    Set ie = New InternetExplorer
    ie.Visible = False


    ie.navigate "https://coronavirus.jhu.edu/map.html"

    Do Until ie.readyState = READYSTATE_COMPLETE
    DoEvents
    Loop

    Debug.Print "Successfully connected with host"
    Set html = ie.document

    Set obj_coll = html.getElementsByTagName("strong")

    For Each obj In obj_coll

    Debug.Print obj.innerText


    Next obj

    ie.Quit
    End Sub
  • 2
    You can not directly access this table in this web page. This table is created under ` – Deepak Mar 31 '20 at 11:03
  • Add a `.busy` check to your `ie` readystate checking too – Nathan_Sav Mar 31 '20 at 11:52
  • @Deepak, thanks for the hint! Do you mean like this: `html.getElementsByTagName("iframe").getElementsByTagName("strong")`. But this throws an error. What would I do next? For what do I search within the iframe tag? – Simon_A_Drax Mar 31 '20 at 14:11
  • @Nathan_Sav, thanks! Could you give the full line? How does it benefit me? – Simon_A_Drax Mar 31 '20 at 14:12
  • @Simon_A_Drax no it won't work like this. as `iframe` element store document rather then elements. It's like having page inside another page. Therefore you will need to convert `iframe` data into document before accessing it's content. – Deepak Mar 31 '20 at 14:19
  • @Deepak. Oh, okay. And how would I do that? – Simon_A_Drax Mar 31 '20 at 14:24
  • create a second variable `Dim html2 As HTMLDocument` store its content to it..I did it long time ago, so did not remember correct code. However you will able to fix it. – Deepak Mar 31 '20 at 14:28
  • @Simon_A_Drax please ref https://stackoverflow.com/questions/44902558/accessing-object-in-iframe-using-vba link – Deepak Mar 31 '20 at 14:59
  • How often do you wish to collect this data? How up to date does it need to be? – QHarr Mar 31 '20 at 15:56

1 Answers1

2

You can use the iframe url direct to navigate to. You then need a timed wait to ensure the data has loaded within that iframe. I would then collect nodeLists via faster css selectors. As the nodeLists (one for figures and the other for locations) are the same length you will only need a single loop to index into both lists to get rows of data.

Option Explicit

Public Sub GetCovidFigures()

    Dim ie As SHDocVw.InternetExplorer

    Set ie = New SHDocVw.InternetExplorer

    Dim t As Date
    Const MAX_WAIT_SEC As Long = 30

    With ie
        .Visible = True
        .Navigate2 "https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6"

        Do
            DoEvents
        Loop While .Busy Or .readyState <> READYSTATE_COMPLETE

        t = Timer
        Do
            If Timer - t > MAX_WAIT_SEC Then Exit Sub
        Loop While .document.querySelectorAll(".feature-list strong").Length = 0

        Dim figures As Object, location As Object, results(), i As Long

        Set figures = .document.querySelectorAll("h5 strong")
        Set location = .document.querySelectorAll("h5 span:last-child")

        ReDim results(1 To figures.Length, 1 To 2)

        For i = 0 To figures.Length - 1
            results(i + 1, 1) = figures.item(i).innerText
            results(i + 1, 2) = location.item(i).innerText
        Next

        .Quit
    End With

    ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

Consider how frequently you want this. There are large numbers of APIs popping up to supply this data which you could instead issue faster xhr requests to. Additionally, you could simply take the source data in csv form from github here. *Files after Feb 1 (UTC): once a day around 23:59 (UTC). There is a rest API visible in dev tools network tab that is frequently supplying new data in json format which is used to update the page. That can be accessed via Python + requests or R + httr modules for example. I suspect this endpoint is not intended to be hit so look for public APIs.

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Hi, thank you very much for this. It works perfectly fine. Regarding your question about the number of updates, I like to collect the data every evening around 20:00 GMT. I have some questions concerning your code though. You mentioned that you have difectly navigated to the iframe. How did you find out the iframe url? – Simon_A_Drax Apr 01 '20 at 08:49
  • If you inspect the html of the original url you will see the data of interest is within a parent iframe node and that iframe node has an src attribute which provides the URI for the embedded document. – QHarr Apr 01 '20 at 09:15
  • Ah okay. Now I understand. I still have a bunch of questions though, for example regarding this line of code ` Do If Timer - t > MAX_WAIT_SEC Then Exit Sub Loop While .document.querySelectorAll(".feature-list strong").Length = 0 ` Why have you specified Timer-t > MAX_WAIT_SEC. What is it doing excatly? Secondly, why did you put ".feature-list strong" as argument? Wouldn't "feature-list" be sufficient and why the "." before "feature-list". I really appreciate your help. I also marked your answer as helpful and accepted. – Simon_A_Drax Apr 01 '20 at 11:17