2

I have a macro that tries to extract all the href values from a page but it only seems to get the first one. If someone could help me out that would be greatly appreciated.

The URL I used is https://www.facebook.com/marketplace/vancouver/entertainment

Screenshot of HTML

<div class="_3-98" data-testid="marketplace_home_feed">
  <div>
    <div>
      <div class="_65db">
          <a class="_1oem" href="/marketplace/item/920841554781924" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/580124349088759" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/1060730340772072" data-testid="marketplace_feed_item">
    Sub Macro1()
``marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
    On Error Resume Next    ' sometimes more web pages are counted than are open
    my_url = objShell.Windows(x).document.Location
    my_title = objShell.Windows(x).document.Title

    If my_title Like "Facebook" & "*" Then 'compare to find if the desired web page is already open
        Set ie = objShell.Windows(x)
        marker = 1
        Exit For
    Else
    End If
Next

Set my_data = ie.document.getElementsByClassName("_3-98")
Dim link
i = 1
For Each elem In my_data
    Set link = elem.getElementsByTagName("a")(0)
    i = i + 1

     'copy the data to the excel sheet
    ActiveSheet.Cells(i, 4).Value = link.href

Next

End Sub
Vekin0
  • 83
  • 2
  • 11

2 Answers2

3

You can use a CSS selector combination to get the elements. If you provide the actual HTML, not as an image it will be easier to test and determine best combination. The selector is applied via the querySelectorAll method to return a nodeList of all matching elements. You traverse the .Length of the nodeList to access items by index from 0 to .Length-1.

VBA:

Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length-1
   Activesheet.Cells(i + 2,4) = aNodeList.item(i)
Next 

The css selector combination is ._1oem[href], which selects the href attributes of elements with a class of _1oem. The "." is a class selector and the [] an attribute selector. It is a fast and robust method.

The above assumes there are no parent form/frame/iframe tags to negotiate.

An alternative selector that matches on the two attributes, rather than the class would be:

html.querySelectorAll("[data-testid='marketplace_feed_item'][href]")

Full example:

Option Explicit
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.facebook.com/marketplace/vancouver/entertainment"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim aNodeList As Object, i As Long
        Set aNodeList = IE.document.querySelectorAll("._1oem[href]")
        For i = 0 To aNodeList.Length - 1
            ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i)
        Next
        'Quit '<== Remember to quit application
    End With
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Hey I appreciate the help but it doesn't look like either ie.document.querySelectorAll("._1oem[href]") or html.querySelectorAll("[data-testid='marketplace_feed_item'][href]") is returning anything – Vekin0 Sep 25 '18 at 00:12
  • Please try with the full example given above. It seems it wants the .item(i) syntax. – QHarr Sep 25 '18 at 05:54
  • 1
    Thanks! Works great – Vekin0 Sep 25 '18 at 23:19
1

You only ask for the first anchor element within each element with a _3-98 class. Iterate through the collection of anchor elements within the parent element.

...

dim j as long
Set my_data = ie.document.getElementsByClassName("_65db")

For Each elem In my_data

    for i = 0 to elem.getelementsbytagname("a").count -1

        j = j+1
        ActiveSheet.Cells(j, 4).Value = elem.getElementsByTagName("a")(i).href

    next i

Next elem 

...
  • Thanks, I understand a little better where I went wrong. Unfortunately its still only giving me the first element. – Vekin0 Sep 24 '18 at 23:52