0

I want to eventually create a function where I can specify a web page element and URL and populate all instances of that element down a column. But am currently only experiencing limited success with this function:

Sub GrabAnchorTags() '(URL As String) As Variant'

    Dim objIE As InternetExplorer
    Dim elem As Object

    Set objIE = New InternetExplorer
    objIE.Visible = False

    objIE.navigate "http://example.com/"
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

    Dim aRange As Range
    Debug.Print objIE.document.getElementsByTagName("a").Length

    For Each elem In objIE.document.getElementsByTagName("a")
        Debug.Print elem
        ActiveCell.Offset(x, y).Value = elem
        ActiveCell.Offset(x, y + 1).Value = elem.textContent
        x = x + 1

    Next

    objIE.Quit
    Set objIE = Nothing
End Sub

I would like to be able to turn this successfully from a macro to a function.

  1. Currently, it uses a for loop to populate the cells and I wonder if it's possible to accomplish the same thing using evaluate or something similar because the for loop is inefficient.

  2. This function would need to live in a cell, reference a URL in another cell, and populate the cells bellow it with all elements of a type found on the page. I am currently working on the anchor tag.

Many other solutions I referenced used macros:

Scraping data from website using excel vba

Getting links url from a webpage excel vba

VBA – Web scraping with getElementsByTagName()

Community
  • 1
  • 1
Ryan Lutz
  • 247
  • 1
  • 12

1 Answers1

0

Generally speaking, whenever you have many cells to write to, you should enter the data into an internal array, and then write the entire array to the worksheet in one hit. However you seem to not want a macro/sub in your case.

If you wish it to take the worksheet formula approach for usability reasons, then the best way is to use a very powerful, but underused technique in Excel development.

A NAMED RANGE

Named ranges are Excels closest thing to getting an in-memory block of data, and then other simpler formulas can use the named range to get info from the Named Range.

A Named Range doesn't have to actually be a simple block of cells on a sheet. You can write your VBA formula as a Public formula, and then reference it in the Named Range.

Function getElems(url As String, tagName As String) As String()

  Dim browser As New MSXML2.XMLHTTP60
  Dim doc As MSHTML.HTMLDocument
  With browser
        .Open "GET", url, False
        .send

        If .readyState = 4 And .Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        Else
            MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
            vbNewLine & "HTTP request status: " & .Status
        End If
  End With

  Dim tag As MSHTML.IHTMLElement
  Dim tags As MSHTML.IHTMLElementCollection
  Set tags = doc.getElementsByTagName(tagName)

  Dim arr() As String
  Dim arrCounter As Long: arrCounter = 1
  ReDim arr(1 To tags.Length, 1 To 2)

  For Each tag In tags
        arr(arrCounter, 1) = tag.innerText
        'Change the below if block to suit
        If tagName = "a" Then
              arr(arrCounter, 2) = tag.href
        Else
              arr(arrCounter, 2) = tag.innerText
        End If
        arrCounter = arrCounter + 1
  Next tag

  Set doc = Nothing
  Set browser = Nothing

  getElems = arr
End Function

Now set a Named Range in Excel such as:

elementData

=getElems(Sheet1!$A$1, Sheet1!$B$1) In A1, put the URL, and in B1 put the tag Name such as "a"

Then in your cells you can say

=INDEX(elementData, ROW(1:1), 1) and in adjacent cell put =INDEX(elementData, ROW(1:1), 2) (or use ROWS formula technique)

and drag down.

MacroMarc
  • 3,214
  • 2
  • 11
  • 20