Consider this extract of an HTML page:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<div class="BoxBody">
<span class="txt">20 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar','last');">Last</a>]</span></p>
<br>
<span class="txt">25 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar2','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar2','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar2','last');">Last</a>]</span></p>
</div>
</body>
</html>

I am trying to get the anchor tag that has the "next" page href (if it has one).

I tried this in the console using Firefox and it works:

document.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

I put together some sample VBA code that also uses querySelector, but it fails with an "Invalid argument" error.

Sub test()

Dim oFSO As Object, paginator As Object
Dim oFS As Object, sText As String

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\example.html")

Do Until oFS.AtEndOfStream
    sText = oFS.ReadAll()
Loop


Dim html As HTMLDocument, html2 As Object
Set html = New HTMLDocument
Set html2 = html
html2.Write sText

Set paginator = html.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

End Sub

What is causing this? Is it the p:nth-child(2) selector? How should I go about extracting that element using VBA?

BoltClock
drec4s

1 Answer

:nth-child(2) is not supported in VBA and is indeed what causes the error. You can't use :nth-child() or :nth-of-type(); the libraries available to you implement very few pseudo-classes. Interestingly, :first-child does work. You will also find that you are limited in which objects you can chain querySelector on.

Dim ele As Object, iText As String
Set ele = html.querySelector(".BoxBody > p > span:first-child > a[title='Next page']")
   
On Error Resume Next
iText = ele.href
On Error GoTo 0

' This assumes that the href has a value; otherwise use an
' On Error GoTo handler that handles the error and prints "No href".
If iText = vbNullString Then
    Debug.Print "No href"
Else
    Debug.Print "href"
End If
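
Because querySelector returns only the first match in document order, a selector like the one above picks the "Next page" link from the first paginator even when a second paginator uses the same title. The following is a minimal sketch of that check without On Error Resume Next, not a definitive implementation. It assumes a reference to "Microsoft HTML Object Library" is set; the procedure name CheckFirstNextLink and the inline HTML string are hypothetical stand-ins for the real page.

```vb
' Sketch only: requires a reference to "Microsoft HTML Object Library".
' CheckFirstNextLink and the inline HTML below are illustrative assumptions.
Sub CheckFirstNextLink()
    Dim html As HTMLDocument, html2 As Object
    Dim nextLink As Object, hrefValue As String

    Set html = New HTMLDocument
    Set html2 = html   ' late-bound alias, as in the question, used for .Write

    ' Cut-down stand-in for the page in the question (one paginator only).
    html2.Write "<div class=""BoxBody""><p><span class=""txt"">" & _
                "<a class=""page"" title=""Next page"" " & _
                "href=""javascript:paginacao('paginar','next');"">Next</a>" & _
                "</span></p></div>"

    ' querySelector yields the first matching element, or Nothing if none matches.
    Set nextLink = html.querySelector(".BoxBody > p > span:first-child > a[title='Next page']")

    If nextLink Is Nothing Then
        Debug.Print "No 'Next page' link found"
    Else
        hrefValue = nextLink.href
        If hrefValue = vbNullString Then
            Debug.Print "Link found, but it has no href"
        Else
            Debug.Print hrefValue
        End If
    End If
End Sub
```

Testing the result against Nothing separates "no such link" from "link without an href", which the On Error Resume Next version reports identically.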

EDIT (29/5/21): As of some point in the last month (?), it has become possible to use element.querySelector widely, as well as most of the standard pseudo-class selectors (at least on Windows 10 with MSHTML.DLL 11.00.19041.985, date modified 12/5/21).

QHarr
  • That was my first solution, but since there are two similar paginated tables in the page (with that same title attribute), I really need to check whether that element exists inside that `.BoxBody > p:nth-child(2) > span:nth-child(1) span:nth-child(1)` element. – drec4s Nov 12 '18 at 17:00
  • Ok. If there is enough to demonstrate the choice that must be made. – QHarr Nov 12 '18 at 17:05
  • No, I want one match only (whether the 'next' button has an href, or not) – drec4s Nov 12 '18 at 17:14
  • Please check the edited `html`. I only want to check whether the first `a` with title `Next page` has an `href` or not... And I cannot use `querySelectorAll`, as it constantly crashes Excel... – drec4s Nov 12 '18 at 17:16
  • The first `a` tag with title `Next page` – drec4s Nov 12 '18 at 17:17
  • I was thinking about something like `html.querySelector(".BoxBody").getElementsByTagName("p")(2).getElementsByTagName("span")(1).querySelector("a[title='Next page']")`, but that doesn't seem to work either... – drec4s Nov 12 '18 at 17:19
  • Please bear with me. Is the above closer to what you are asking? Just updated – QHarr Nov 12 '18 at 17:21