2

I was following a tutorial of this video https://www.youtube.com/watch?v=sGw6r5GVA5g&t=2803s made by the WiseOwlTutorials channel and got stuck at a listing procedure he explains at the 36:00 position of the video.

At that point, he starts to explain how to return the video url and name of a video list from a specific category through a iteration method called Sub ListVideosOnPage(VidCatName As String, VidCatURL As String) used in another module which loops through all video categories of their website main video page https://www.wiseowl.co.uk/videos (left corner menu list).

When this procedure starts, it goes inside each video category and get the name and url of each video from that category in order to list it on a page which, in that part of the Youtube video cited above, is a debug page. However, the actual WiseOwl Video page is diferente from that when the tutorial video was made.

So, I changed his method a little in order to put the correct elements on the debbugin page, as shown below:

Sub ListVideosOnPage(VidCatName As String, VidCatURL As String)

Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument

Dim VidTables As MSHTML.IHTMLElementCollection
Dim VidTable As MSHTML.IHTMLElement
Dim VidRows As MSHTML.IHTMLElementCollection
Dim VidRow As MSHTML.IHTMLElement
Dim VidLink As MSHTML.IHTMLElement    

    XMLReq.Open "GET", VidCatURL, False
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    HTMLDoc.body.innerHTML = XMLReq.responseText
        'get the table element in each video category found by other module
        'VidTables tag added by me to get the new element on the WiseOwl website
        Set VidTables = HTMLDoc.getElementsByTagName("table")
        'loop starts to search for row and link tags on the current table
        For Each VidTable In VidTables
            Set VidRows = VidTable.getElementsByTagName("tr")
            For Each VidRow In VidRows
                Set VidLink = VidRow.getElementsByTagName("a")(0) 'just pick the first link
                Debug.Print VidRow.innerText, VidRow.getattribute("href") 'objetc variable not set error happpens here
            Next VidRow
        Next VidTable           
End Sub

I found a way to circumvent this Object Variable or With Variable not set error by changing the code inside vidrow loop, adding a manual index to the code to get only the first link in each row:

       For Each VidTable In VidTables
            Set VidRows = VidTable.getElementsByTagName("tr")
            For Each VidRow In VidRows
                Index = 0
                For Each VidLink In VidLinks
                   If Index = 0 Then
                            Debug.Print VidLink.innerText, VidLink.getAttribute("href")
                            Index = Index + 1
                   End If
                Next VidLink
            Next VidRow
        Next VidTable

But, in the turorial video referenced above, the instructor doesnt get this error when he codes indexes in the way shown below:

        VidLink = VidRow.getElementsByTagName("a")(0) 
        Debug.Print VidRow.innerText, VidRow.getattribute("href")

So my question is how do I get these object variable not set errors and in the tutorial video the instructor doesnt? Looks like the same code to me, with each element defined in the right way and a much more efficient way to code then using if's. Could anyone more used to VBA please help with an answer this? Maybe I missing something.

artkoshelev
  • 872
  • 7
  • 22
thiggy01
  • 129
  • 1
  • 11

1 Answers1

2

tl:dr:

  1. I first give you the debug and fix info;
  2. I go on to show you a different way using CSS selectors to target the page styling. This is generally faster, more robust and more flexible;
  3. VidCatName doesn't appear to be used but I have left in for now. I personally would remove unless you will later develop the code to use this variable. The second sub parameters are passed by value so I have added ByVal to the signature.

Debugging:

Your error is because you are looping all table rows and trying to access a tags and then href attributes. The first row of each table is the header row and this doesn't have a tag elements, nor associated href attributes. See image below:

Table element on page:

enter image description here

See that the first tr tagged element in the table contains a child th tag element, indicating it is the table header, and that there is no associated a tag element.

Kind of like you were shown elsewhere in that video, you want to change your loop to a For Next, and then, in this case, start from index 1 to skip the header row.

So, the part containing this line: For Each VidRow In VidRows , becomes the following:

Dim VidRowID As Long
For Each VidTable In VidTables
    Set VidRows = VidTable.getElementsByTagName("tr")
    For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
        Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0) 
        Debug.Print VidLink.innerText, VidLink.getAttribute("href") 
    Next VidRowID
Next VidTable

There is also only one table per page so a loop of all tables is unnecessary code in this case.


Example full call (using your code with just the change in loop type):

Option Explicit        
Public Sub test()    
    ListVideosOnPage "Business Intelligence (70)", "https://www.wiseowl.co.uk/business-intelligence/videos/"    
End Sub

Public Sub ListVideosOnPage(ByVal VidCatName As String,ByVal VidCatURL As String)  
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument    
    Dim VidTables As MSHTML.IHTMLElementCollection
    Dim VidTable As MSHTML.IHTMLElement
    Dim VidRows As MSHTML.IHTMLElementCollection
    Dim VidRow As MSHTML.IHTMLElement
    Dim VidLink As MSHTML.IHTMLElement

    XMLReq.Open "GET", VidCatURL, False
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If

    HTMLDoc.body.innerHTML = XMLReq.responseText
    Set VidTables = HTMLDoc.getElementsByTagName("table") 'Should limit to just one table
    Dim VidRowID As Long
    For Each VidTable In VidTables
        Set VidRows = VidTable.getElementsByTagName("tr")
        For VidRowID = 1 To VidRows.Length - 1 'first row is actually header which doesn't have an a tag or href
            Set VidLink = VidRows(VidRowID).getElementsByTagName("a")(0) 
            Debug.Print VidLink.innerText, VidLink.getAttribute("href") 
        Next VidRowID
    Next VidTable
End Sub

CSS selectors:

I would instead use a CSS selector combination to target the a tag elements within the target parent table element. This is written as .bpTable a. A more official term for this combination is descendant selector.

The descendant combinator — typically represented by a single space ( ) character — combines two selectors such that elements matched by the second selector are selected if they have an ancestor element matching the first selector. Selectors that utilize a descendant combinator are called descendant selectors.

The .bpTable is in fact itself a class selector (like .getElementsByClassName). The class part indicated by the leading ".". So, elements with class name bpTable; which is the class name of the target table on each page.

Target table element on page:

enter image description here

This selector is applied via the .querySelectorAll method of .document and returns a static nodeList. You can then loop the .Length of this nodeList, from 0 to .Length -1, accessing elements by index.

Public Sub ListVideosOnPage(ByVal VidCatName As String, ByVal VidCatURL As String)
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    XMLReq.Open "GET", VidCatURL, False
    XMLReq.send

    If XMLReq.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
        Exit Sub
    End If
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Dim aNodeList As Object, link As Long
    Set aNodeList = HTMLDoc.querySelectorAll(".bpTable a")
    For link = 0 To aNodeList.Length - 1
        Debug.Print aNodeList(link).innerText, aNodeList(link).href
    Next
End Sub

References (VBE > Tools > References):

  1. Microsoft HTML Object Library
  2. Microsoft XML, V6.0 'For my Excel 2016 version
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • 1
    Wow! What a enlightening class! So much details. Didn't know this community was so wonderful. Everything makes sense to me now. The error was due to the `th` (row header) tag in each table not having a `a` tag. Hence, there was no object variable set in that row. – thiggy01 Sep 22 '18 at 18:45
  • I think, especially if you are trying to follow a tutorial, it is helpful to have answers that relate to that tutorial which is why I gave the first part. The second part was to give you an idea about alternatives. Of course, that needed a little contextualising with some links I hope you find useful for future scraping. I am a fan of WiseOwl having gone on a course of theirs in the past. – QHarr Sep 22 '18 at 18:53
  • Since they published this tutorial, their video main page expanded to other subcategories which contains other subcategories, giving us more trouble to find their increasing number of tutorial videos added to the site. The CSS Selector method is a much simpler and efficient approach to this looping problem than the commonly used HTML ones. – thiggy01 Sep 22 '18 at 19:40
  • CSS selectors are generally my first choice go to. You have much more specificity and flexibility available to you. – QHarr Sep 22 '18 at 19:41