I'm building my first VBA code to scrape data from a website. I can open the site and click a button to get the correct data on screen, but I'm having difficulty referencing the correct table to loop through. I want to access an embedded table, 'Activity'. To do this, I've taken the answer from here on how to cycle through a table and extract information, and embedded it in my code. Below are the three areas that throw errors.

Are these linked (particularly Queries B and C), and does anyone have any ideas?

Thanks so much!

------Solution Code (from QHarr answer below)-------------

Note: requires references (VBE > Tools > References) to Microsoft Internet Controls and Microsoft HTML Object Library.

Public Sub GetTable()
    Dim IE As InternetExplorer, ele As Object, clipboard As Object, hTable As htmlTable, t As Date, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Const MAX_WAIT_SEC As Long = 20
    Set IE = New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://na3.docusign.net/Member/EmailStart.aspx?a=59595fcb-34be-4375-b880-a0be581d0f37&r=f6d28b49-e66d-4fa4-a7e9-69c2c741fde5"
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer
        Do
            DoEvents
            On Error Resume Next
            Set ele = .Document.querySelector("[data-qa='show-history']")
            'On Error GoTo 0 'I removed this line as it was throwing an error as soon as the 'Show-history' element loaded.
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While ele Is Nothing

        If ele Is Nothing Then Exit Sub

        ele.Click

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set hTable = .Document.querySelector("[data-qa='history-dialog-audit-logs']")

        ''**********************************************************************
        '' Loop table and write out method. This method uses the sub WriteTable
        Application.ScreenUpdating = False  
        WriteTable hTable, 1, ws
        Application.ScreenUpdating = True
        ''**********************************************************************
        .Quit
    End With
End Sub

Public Sub WriteTable(ByVal hTable As htmlTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
    If ws Is Nothing Then Set ws = ActiveSheet
    Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
    r = startRow
    With ws
        Set tRow = hTable.getElementsByTagName("tr")
        For Each tr In tRow
            Set tCell = tr.getElementsByTagName("td")
            c = 1
            For Each td In tCell
                .Cells(r, c).Value = td.innerText
                c = c + 1
            Next td
            r = r + 1
        Next tr
    End With
End Sub

------Original Query Code-------

Query A: I get an 'object required' error while the page is loading. It goes away if I continue with the script, so I believe it is an issue with handling the loading time. It occurs after the 'Loop' code finishes:

    With objIE
        .Visible = True
        .navigate WebSite
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop

        .document.querySelector("[data-qa='show-history']").Click

Query B: I get another object required error on this line, which I can also continue past:

For Each ele In objIE.document.getElementById("activity").getElementsByTagName("tr")

Query C: I get a 'subscript out of range' error on the following line, and can progress no further:

Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent

Full Code:

Sub googlesearch3()
    Set objIE = CreateObject("InternetExplorer.Application")
    WebSite = "websiteurl"

    With objIE
        .Visible = True
        .navigate WebSite
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop

        .document.querySelector("[data-qa='show-history']").Click
End With
'within the 'history-dialog-audit-logs' table, loop and extract data


    'we will output data to excel, starting on row 1
    y = 1

    'look at all the 'tr' elements in the 'table' with id 'activity',
    'and evaluate each, one at a time, using 'ele' variable
    For Each ele In objIE.document.getElementById("activity").getElementsByTagName("tr")
        'show the text content of 'tr' element being looked at
        Debug.Print ele.textContent
        'each 'tr' (table row) element contains 4 children ('td') elements
        'put text of 1st 'td' in col A
        Sheets("Sheet1").Range("A" & y).Value = ele.Children(0).textContent
        'put text of 2nd 'td' in col B
        Sheets("Sheet1").Range("B" & y).Value = ele.Children(1).textContent
        'put text of 3rd 'td' in col C
        Sheets("Sheet1").Range("C" & y).Value = ele.Children(2).textContent
        'put text of 4th 'td' in col D
        Sheets("Sheet1").Range("D" & y).Value = ele.Children(3).textContent
        'increment row counter by 1
        y = y + 1
    'repeat until last ele has been evaluated
    Next

'check if word 'completed' is mentioned anywhere, if so update 'Status' to 'Completed' and search for text.

'Find "signed the envelope" and show all text before this until you find <td>. Stop after one occurrence
'store text in 'LastSigned' string

'find "sent an invitation to" and show all text before this until you find <td>. Stop after one occurrence
'store text in 'CurrentlyWith' string


 Set IE = Nothing

End Sub

Additional: I've tried the answer here, but the Dim statements didn't work...

1 Answer

Here are two methods for writing out the table: one uses the clipboard, and the other loops over the rows and the table cells within each row (that version is the three commented-out lines). To address your Query A, I use a loop that times out after MAX_WAIT_SEC seconds, which gives the clickable element time to become available. There is not enough HTML for me to give good explanations for Queries B and C. They could both be related to timing issues at the start.

Note: Usually after a .Click you want another While .Busy Or .readyState < 4: DoEvents: Wend, and potentially another Do Loop, to allow for page content to update.
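As a rough illustration of that extra wait, the polling loop could be pulled into a small helper and reused after the click. This is only a sketch: WaitForElement and its parameter names are hypothetical (they are not part of any library), and it assumes the element will eventually match the CSS selector you pass in.

```vba
' Hypothetical helper (an assumption, not part of any library):
' polls the page until an element matching the CSS selector appears,
' or until maxWaitSec seconds have elapsed.
Public Function WaitForElement(ByVal IE As Object, ByVal selector As String, _
                               Optional ByVal maxWaitSec As Long = 5) As Object
    Dim ele As Object, t As Single
    t = Timer                          ' seconds since midnight
    Do
        DoEvents
        On Error Resume Next           ' the document may not be available yet
        Set ele = IE.Document.querySelector(selector)
        On Error GoTo 0
        If Timer - t > maxWaitSec Then Exit Do
    Loop While ele Is Nothing
    Set WaitForElement = ele           ' Nothing if the element never appeared
End Function
```

After ele.Click, a call such as Set hTable = WaitForElement(IE, "#activity .dstable", MAX_WAIT_SEC), followed by If hTable Is Nothing Then Exit Sub, would then avoid reading the table before it exists. Because Timer resets at midnight, the time-out is only approximate for runs that span midnight.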

Option Explicit
Public Sub GetTable()
    Dim IE As InternetExplorer, ele As Object, clipboard As Object, hTable As HTMLTable, t As Date, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Const MAX_WAIT_SEC As Long = 5
    Set IE = New InternetExplorer
    With IE
        .Visible = True
        .navigate "yourURL"
        While .Busy Or .readyState < 4: DoEvents: Wend
        t = Timer
        Do
            DoEvents
            On Error Resume Next
            Set ele = .Document.querySelector("[data-qa='show-history']")
            On Error GoTo 0
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop While ele Is Nothing

        If ele Is Nothing Then Exit Sub

        ele.Click

        While .Busy Or .readyState < 4: DoEvents: Wend

        Set hTable = .Document.querySelector("#activity .dstable")

        ''*********************************************************************
        ''Copy table to clipboard and paste  method
        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText hTable.outerHTML
        clipboard.PutInClipboard
        ws.Cells(1, 1).PasteSpecial
        ''**********************************************************************

        ''**********************************************************************
        '' Loop table and write out method. This method uses the sub WriteTable
        ' Application.ScreenUpdating = False  '<==Uncomment these 3 lines and comment out lines above if using this method.
        ' WriteTable hTable, 1, ws
        ' Application.ScreenUpdating = True
        ''**********************************************************************
        .Quit
    End With
End Sub

Public Sub WriteTable(ByVal hTable As HTMLTable, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
    If ws Is Nothing Then Set ws = ActiveSheet
    Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
    r = startRow
    With ws
        Set tRow = hTable.getElementsByTagName("tr")
        For Each tr In tRow
            Set tCell = tr.getElementsByTagName("td")
            c = 1
            For Each td In tCell
                .Cells(r, c).Value = td.innerText
                c = c + 1
            Next td
            r = r + 1
        Next tr
    End With
End Sub

References (VBE > Tools > References and add references to):

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library
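If the Dim statements fail to compile because these references are missing, a late-bound variant is one possible workaround. The sketch below is an assumption-laden outline rather than a drop-in replacement: GetTableLateBound is a hypothetical name, "yourURL" is a placeholder as above, and the wait-and-click step from GetTable is left out for brevity, so the table will only be found if it is already present on the page.

```vba
' Late-bound sketch: every object is declared As Object, so no references
' are required (at the cost of IntelliSense and compile-time type checks).
Public Sub GetTableLateBound()
    Dim IE As Object, hTable As Object, tr As Object, td As Object
    Dim r As Long, c As Long
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .navigate "yourURL"            ' placeholder URL
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set hTable = .Document.querySelector("#activity .dstable")
        If Not hTable Is Nothing Then
            ' Write each table row to Sheet1, one cell per <td>
            For Each tr In hTable.getElementsByTagName("tr")
                r = r + 1: c = 0
                For Each td In tr.getElementsByTagName("td")
                    c = c + 1
                    ThisWorkbook.Worksheets("Sheet1").Cells(r, c).Value = td.innerText
                Next td
            Next tr
        End If
        .Quit
    End With
End Sub
```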

Edit: There now seem to be problems with the late-bound clipboard reference in some cases. Here is a generic early-bound method, where hTable is the target HTMLTable object.

For an early-bound clipboard, go to VBE > Tools > References > Microsoft Forms 2.0 Object Library.

If you add a UserForm to your project, the library will get automatically added.

Dim clipboard As DataObject
Set clipboard = New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
QHarr