1

I'm trying to pull information from an html table. I want to add each element with in the table to a collection. This is what I have so far.

Dim htmlTable As Object
Dim coll2 As Collection
Set coll2 = New Collection
Set IE = New InternetExplorerMedium

With IE
'.AddressBar = False
'.MenuBar = False
.Navigate ("PASSWORDED SITE")
.Visible = True
End With

Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set tableItem = IE.Document.getElementsByTagName("td")
With coll2
For Each tableItem In htmlTable.innerHTML
   .Add tableItem
Next
End With

I have a problem with this line For Each tableItem In htmlTable.innerText I tried diffent variations of htmlTable.innerText each throwing differant errors.

This is the HTML Extract for the table.

<table class="Grid" id="ctl00_ContentPlaceHolder1_gvExtract" style="border-collapse: collapse;" border="1" rules="all" cellspacing="0">
        <tbody><tr class="GridHeader" style="font-weight: bold;">
            <th scope="col">Delete</th><th scope="col">Download</th><th scope="col">Extract Date</th><th scope="col">User Id Owner</th>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942854)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942854")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:50:37</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942836)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942836")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:39:29</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941835)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941835")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:23:54</td><td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941827)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941827")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:16:16</td><td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941822)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941822")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:14:06</td><td>MBMAYO</td>
        </tr>
    </tbody></table>

The goal is to store each <td> as an item for a collection and then retrieve the date for example <td>06/20/2017 07:50:37</td> from it. This table Grows so I think an array is out of the question?

Edit from comment:

I have been trying call this function, I'm getting a object does not support this method error:

Public Function htmlCell(id As String) As String 
    htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
                    .get‌​ElementsByTagName("t‌​d")(id).innerHTML 
End Function
Erik A
  • 31,639
  • 12
  • 42
  • 67
Quint
  • 530
  • 2
  • 8
  • 23
  • You must wait for the page to load after .Navigate E.g. https://stackoverflow.com/questions/23299134/failproof-wait-for-ie-to-load – Alex K. Jun 20 '17 at 12:43
  • What exactly is your question? What are those "different errors"? Where do they occur? Update your question accordingly. –  Jun 20 '17 at 12:44
  • @AlexK. I'm sorry. I did not enclude the navigating process to this page. It is very lengthy and did not want to spam the page with useless information. It does search for the element to become avalible. – Quint Jun 20 '17 at 12:45
  • 1
    @DavidG Thanks for your question. The errors are runtime 13 Type Mismatch with `For Each tableItem In htmlTable.innerText` and runtime 438 object Doesnt support the property or method with `For Each tableItem In htmlTable.innerText`. – Quint Jun 20 '17 at 13:05
  • 1
    I think you need to do some reading about the DOM (Document Object model), you are using these methods wrong. `getElementsByTagName` already returns a collection. Here are some examples on SO: https://stackoverflow.com/a/15192814/3820271 , https://stackoverflow.com/questions/20205442/excel-vba-get-inner-text-of-html-table-td – Andre Jun 20 '17 at 13:14
  • Possible duplicate of [Excel VBA: Get inner text of HTML table td](https://stackoverflow.com/questions/20205442/excel-vba-get-inner-text-of-html-table-td) – Andre Jun 20 '17 at 13:14
  • @Andre I don't think your link will work as you cant go directly to the url for the table and its a passworded site. Also I can see the whole HTML table in watch. It's pulling the information from it that is the trouble for me. – Quint Jun 20 '17 at 13:22
  • That is not relevant, you need to change your code after the `Set htmlTable = ...` line. Once you have a DOM object in a variable, the methods are the same. – Andre Jun 20 '17 at 13:26
  • @Andre I have been trying call this function `Public Function htmlCell(id As String) As String htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract").getElementsByTagName("td")(id).innerHTML End Function`. I'm getting a object does not support this method error. – Quint Jun 20 '17 at 14:49

3 Answers3

3

What you probably need is something like this. HTH

Dim htmlTable As MSHTML.htmlTable
Dim htmlTableCells As MSHTML.IHTMLElementCollection
Dim htmlTableCell As MSHTML.htmlTableCell
Dim htmlAnchor As MSHTML.HTMLAnchorElement

Set htmlTable = ie.document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set htmlTableCells = htmlTable.getElementsByTagName("td")
With coll2
    For Each htmlTableCell In htmlTableCells
        If VBA.TypeName(htmlTableCell.FirstChild) = "HTMLAnchorElement" Then
            Set htmlAnchor = htmlTableCell.FirstChild
            .Add htmlAnchor.innerHTML
        Else
            .Add htmlTableCell.innerHTML
        End If
    Next
End With

Result

Dim el
For Each el In coll2
    Debug.Print el
Next el

Output:

Delete
Work Order Inquiry - Work Order
06/20/2017 07:50:37
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:39:29
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:23:54
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:16:16
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:14:06
MBMAYO
Daniel Dušek
  • 13,683
  • 5
  • 36
  • 51
1

I would try something like this:

Dim htmlTable As Object
Dim collTD As Collection
Dim oNode as Object

' Set IE ...

Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
' You only want the td's inside htmlTable !
Set collTD = htmlTable.getElementsByTagName("td")

For Each oNode In collTD 
    Debug.Print oNode.InnerHTML
    ' Stop -> use Watch window to drill down into oNode subitems
Next oNode 

and go from there.

Andre
  • 26,751
  • 7
  • 36
  • 80
0

I think it should be something like this.

Sub Scrape_HTML()

Set ie = CreateObject("InternetExplorer.application")

With ie
    .Visible = True
    .navigate "your_URL_here"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

                        Set Links = ie.document.getElementsByTagName("tr")
                        RowCount = 1

                            ' Scrape out the innertext of each 'tr' element.
                            With Sheets("DataSheet")
                                For Each lnk In Links
                                    .Range("A" & RowCount) = lnk.innerText
                                    RowCount = RowCount + 1
                                Next
                            End With

End Sub
ASH
  • 20,759
  • 19
  • 87
  • 200