2

I have having a horrible time scraping this particular webpage's data... Basically I can see the information that I need in the 'DOM Explorer' when I load the URL in a browser and hit F12 manually, but when I programmatically attempt to do the same (see below) the HTMLDoc does not contain the same information that I can see in the 'DOM Explorer'...

Public Sub testCode()

    Dim IE As SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Set IE = New SHDocVw.InternetExplorer
    With IE
        .navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"
        While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE: Wend
        Set HTMLDoc = .Document
    End With

End Sub

Can someone please help me access the information in the 'DOM Explorer'? I know that HTML is not always what you see in a browser, but rather the instructions to create what you can see in the browser, but then there must be a way to programmatically create the DOM from the HTML...

Also, I believe that the data I am after is being generated by scripts or iFrames, but I have been unable to generate the data I am looking for from messing around with either....

UPDATE

See picture of DOM Explorer below:

DOM

omegastripes
  • 12,351
  • 4
  • 45
  • 96
TheSilkCode
  • 366
  • 2
  • 11
  • Please be more specific and post the expected output, maybe screenshot of the webpage with highlighted areas where that data is, and URL you are scraping. [This question](http://stackoverflow.com/q/40225095/2165759) may help. – omegastripes Mar 27 '17 at 18:31
  • Hi @omegastripes - I tried putting screenshots in but couldn't paste them into edit window.... if there's any information on this you need please let me know and I can send to you- I am sure I have it... Also, am looking through other example now, no luck so far.. – TheSilkCode Mar 27 '17 at 19:10
  • From [Editing Help](http://stackoverflow.com/editing-help) - [images](http://stackoverflow.com/editing-help#images) *You can also press CTRL+I to insert an image*. – Mathieu Guindon Mar 27 '17 at 19:40
  • @TheSilkCode post the expected output then. – omegastripes Mar 27 '17 at 19:46
  • Hi @omegastripes - I added a picture of the DOM Explorer that contains the information that I want- however, wasn't sure what the best way to show the HTMLDoc's contents... let me know – TheSilkCode Mar 27 '17 at 20:12
  • @TheSilkCode you may copy HTML snippet in DOM explorer, beatify it using any online service, and put into the post. Codeblock as image is not a good idea. I was talking about the screenshot of the webpage, not code. What rendered output should look like? – omegastripes Mar 27 '17 at 20:23
  • @omegastripes if you are looking at the webpage, I am looking for information the '10-Day Weather Forecast Table'. In the DOM Explorer, its under the 'fctHourly' class... – TheSilkCode Mar 27 '17 at 20:31

1 Answers1

1

The outline:

Actually web browser do almost the same stuff each time you open that webpage.

You may use the below VBA code to parse response and output result. Import JSON.bas module into the VBA project for JSON processing.

Sub TestScrapeWunderground()

    Dim sContent As String
    Dim sKey As String
    Dim sLocation As String
    Dim vJSON As Variant
    Dim sState As String
    Dim oDays As Object
    Dim oHours As Object
    Dim vDay As Variant
    Dim vHour As Variant
    Dim aRows() As Variant
    Dim aHeader() As Variant

    ' GET XHR to retrieve location and key
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False
        .Send
        sContent = .responseText
    End With
    ' Extract location and key from HTML content
    sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0)
    sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0)
    ' GET XHR to retrieve JSON data
    With CreateObject("MSXML2.ServerXMLHTTP")
        .Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False
        .Send
        sContent = .responseText
    End With
    ' Parse JSON response to data structure
    JSON.Parse sContent, vJSON, sState
    ' Populate dictionaries with daily and hourly forecast data
    Set oDays = CreateObject("Scripting.Dictionary")
    Set oHours = CreateObject("Scripting.Dictionary")
    For Each vDay In vJSON("forecast")("days")
        oDays(vDay("summary")) = ""
        For Each vHour In vDay("hours")
            oHours(vHour) = ""
        Next
    Next
    ' Convert daily forecast data to arrays
    JSON.ToArray oDays.Keys(), aRows, aHeader
    ' Output daily forecast data to table
    With Sheets(1)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    ' Convert hourly forecast data to arrays
    JSON.ToArray oHours.Keys(), aRows, aHeader
    ' Output hourly forecast data to table
    With Sheets(2)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    ' Convert response data to arrays
    JSON.ToArray Array(vJSON("response")), aRows, aHeader
    ' Output response transposed data to table
    With Sheets(3)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Convert current data to arrays
    JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader
    ' Output current transposed data to table
    With Sheets(4)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Populate dictionary with daily astronomy data
    Set oDays = CreateObject("Scripting.Dictionary")
    For Each vDay In vJSON("astronomy")("days")
        oDays(vDay) = ""
    Next
    ' Convert daily astronomy data to arrays
    JSON.ToArray oDays.Keys(), aRows, aHeader
    ' Output daily astronomy transposed data to table
    With Sheets(5)
        .Cells.Delete
        Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
        Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        .Columns.AutoFit
    End With
    ' Convert hourly history data to arrays
    JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader
    ' Output hourly history data to table
    With Sheets(6)
        .Cells.Delete
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aRows
        .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                1, _
                UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

The second XHR returns JSON data, to make it clear how the necessary data is being extracted from it, you may save the JSON to file, copy the contents and paste it to any JSON viewer for further study. I use online tool http://jsonviewer.stack.hu, root element structure is shown below:

JSON structure

There are 6 main sections, the relevant part of the data is extracted and output to 6 worksheets (which have to be created manually before run):

Sheet1 - Daily forecast
Sheet2 - Horly forecast
Sheet3 - Response data (transposed)
Sheet4 - Current data (transposed)
Sheet5 - Astronomy (transposed)
Sheet6 - Hourly history data

Having that example you can extract the data you need from that JSON response.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • this is fantastic- there's a lot there so I haven't gotten through testing all of it but I am sure this will get me on the right track, thank you so much! – TheSilkCode Apr 04 '17 at 21:35
  • Finally got through testing all this- works great so thank you for your help but I did change a few things- 1) trimmed the API URL to only request the data I was interested in as well as change the units to English and 2) instead of using your JSON parser solution I actually used a series of splits and loops to parse the data- one reason for doing so was because I had a hard time following the JSON parsing code and I don't like to use copy and pasted code that I don't understand, but I also found that it was ~ 60% faster than parsing with Rage and Scripting Dictionaries. Anyways thanks again! – TheSilkCode Apr 05 '17 at 11:56