0

I am currently attempting to use a command button on a worksheet and VBA to import player statistic data to my worksheet from an official sports website on button click. The data that I am trying to copy is over 21 different webpages.

The format of the URL for the web pages is as follows: http://www.afl.com.au/stats/player-ratings/overall-standings#page/1 where the number at the very end of the URL varies between 1 and 21 to indicate which data sheet you're looking at.

See below for my code:

Sub Button1_Click()

    Const WebAddress As String = "http://www.afl.com.au/stats/player-ratings/_
    overall-standings#page/"

    Dim qt As QueryTable
    Dim PlayerRatings As Worksheet
    Dim PageNumber As Integer
    Dim RowPasteNumber As Integer

    RowPasteNumber = 6
    Set PlayerRatings = ActiveSheet

    For PageNumber = 1 To 21

        Set qt = PlayerRatings.QueryTables.Add(Connection:="URL;" & WebAddress & PageNumber,_ 
        Destination:=Range("A" & RowPasteNumber))
        qt.Refresh BackgroundQuery:=False
        RowPasteNumber = RowPasteNumber + 41

    Next PageNumber

End Sub

So my thinking was that I should be able to use a FOR loop which increments the PageNumber integer each time to cycle through the different webpages where the data is located then copy the data off that webpage and paste it 41 rows below the previous data in my worksheet.

The issue that I'm having is that the FOR loop completes 21 iterations (as it should) and each time it pastes the data 41 rows below the previous data (as it should) but it continues to copy the data on webpage 1 over and over again.

Can anybody see why my code may be doing this?

Your assistance is much appreciated.

Stephen

2 Answers2

0

I recently ran into a similar problem while web scraping.

The problem is that the address contains a location hash #. Anything after the # is never processed by the server.

To repeat part of another stackoverflow answer (Why the hash part of the URL is not in the server side?)

Here's what Wikipedia says about it:

The fragment identifier functions differently than the rest of the URI: namely, its processing is exclusively client-side with no participation from the server. When an agent (such as a Web browser) requests a resource from a Web server, the agent sends the URI to the server, but does not send the fragment. Instead, the agent waits for the server to send the resource, and then the agent processes the resource according to the fragment value. In the most common case, the agent scrolls a Web page down to the anchor element which has an attribute string equal to the fragment value. Other client behaviors are possible


The easiest way to work around this would be to automate the IE object directly and get a copy of the document.body.innerHTML after each navigate and/or click action. For a starting point, have a look here: http://www.excely.com/excel-vba/ie-automation.shtml

Community
  • 1
  • 1
Ru Hasha
  • 876
  • 1
  • 7
  • 16
  • Cheers Ru Hasha. This is my first go at VBA so I'll take a look into that excely link and see what I come up with. Thanks for the advice! – Stephen Mar 21 '16 at 21:48
0

You could write them out using IE to browse to the pages:


Sample from web:

SAMPLE


Sample code output:

Sample


VBA:

Option Explicit

Public Sub GetTables()
    Dim id, hTable As HTMLTable, ie As Object, ws As Worksheet
    Application.ScreenUpdating = False
    Set ws = ActiveSheet
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        For id = 1 To 2  '21
            .navigate "http://www.afl.com.au/stats/player-ratings/overall-standings#page/" & id
           While .Busy Or .readyState < 4: DoEvents: Wend
           Dim file As Object
           Set file = CreateObject("htmlFile")
           With file
                DoEvents
                .Write ie.document.body.innerHTML
                Set hTable = .getElementById("playerRatings-table")
                WriteTable hTable, GetLastRow(ws, 1) + 1
            End With
            Set hTable = Nothing: Set file = Nothing
        Next id
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)

    If ws Is Nothing Then Set ws = ActiveSheet

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, R As Long, C As Long, tBody As Object
    R = startRow
    With ActiveSheet
        Dim headers As Object, header As Object, columnCounter As Long
        Set headers = hTable.getElementsByTagName("th")
        For Each header In headers
            columnCounter = columnCounter + 1
            .Cells(startRow, columnCounter) = header.innerText
        Next header
        startRow = startRow + 1
        Set tBody = hTable.getElementsByTagName("tbody")
        For Each tSection In tBody               'HTMLTableSection
            Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
            For Each tr In tRow
                R = R + 1
                Set tCell = tr.getElementsByTagName("td")
                C = 1
                For Each td In tCell             'DispHTMLElementCollection
                    .Cells(R, C).Value = td.innerText 'HTMLTableCell
                    C = C + 1
                Next td
            Next tr
        Next tSection
    End With
End Sub

Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
    With ws
        GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
    End With
End Function
QHarr
  • 83,427
  • 12
  • 54
  • 101