1

Using following code I found on web it does not return results when searching for phone numbers, with text its fine, brings back weblink and title

I have noticed that when search for number there is no className "r" in link.className, how would I fix to use with phone numbers

Sub XMLHTTP()

Dim url As String, lastRow As Long, i As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object

lastRow = Range("A" & Rows.Count).End(xlUp).Row

For i = 2 To lastRow

  url = "https://www.google.co.uk/search?q=03701116565" & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

    Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
    XMLHTTP.Open "GET", url, False
    XMLHTTP.setRequestHeader "Content-Type", "text/xml"
    XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
    XMLHTTP.send

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = XMLHTTP.ResponseText
    Set objResultDiv = html.getelementbyid("rso")

    Set objH3 = objResultDiv.getelementsbytagname("h3")

    For Each link In objH3
        If link.className = "r" Then
            Cells(i, 2) = link.innerText
            Cells(i, 3) = link.getelementsbytagname("a")(0).href
            DoEvents
        End If
    Next
Next

End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
JPS
  • 119
  • 1
  • 8
  • @qharr this works great much appreciated, not tried the late binding yet but expect that will work on my system, much appreciated – JPS Jan 16 '19 at 16:01

1 Answers1

1

There is a class name r. Observe the following:

Option Explicit
Public Sub GetLinks()
    Dim html As HTMLDocument, links As Object, i As Long, counter As Long
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.co.uk/search?q=03701116565", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    With html
        Set links = .querySelectorAll(".r > [href] , .r h3")
    End With
    For i = 0 To links.Length - 1 Step 2
        counter = counter + 1
        ActiveSheet.Cells(counter, 1) = links.item(i)
        ActiveSheet.Cells(counter, 2) = links.item(i + 1).innerText
    Next
End Sub

The actual href is associated with a child a tag which precedes the h3 header tag element which you are targeting by class. The r is the class of the parent of the a tag.

enter image description here


If you want to use late bound, and a similar approach to yours, you can use the less efficient following method. Note that the parent div elements are selected so access to the a tag and h3 are possible for qualifying classes.

Option Explicit
Public Sub GetLinks()
    Dim html As Object, i As Long
    Dim objResultDiv As Object, objH3 As Object, link As Object

    Set html = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.google.co.uk/search?q=03701116565", False
        .send
        html.body.innerHTML = .responseText
    End With

    Set objResultDiv = html.getElementById("rso")
    Set objH3 = objResultDiv.getElementsByTagName("div")
    For Each link In objH3
        If link.className = "r" Then
            i = i + 1
            On Error Resume Next
            ActiveSheet.Cells(i, 2) = link.getElementsByTagName("a")(0).href
            ActiveSheet.Cells(i, 3) = link.getElementsByTagName("h3")(0).innerText
            On Error GoTo 0
        End If
    Next
End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • It works with Early Binding but fails with LateBinding? Also for `"https://www.google.co.uk/search?q=sid"` it fails in Early Binding as well... – Siddharth Rout Jan 16 '19 at 12:24
  • @SiddharthRout Everyone will have access to html object library so shouldn't be a huge problem. I guess some difficult IT departments might forbid adding but Microsoft paid a lot of money to ensure all installs had this library. For the q=sid I would imagine for telephone numbers you don't use this and instead use the telephone number concantenated in as shown in my url example. I will have a quick look at late binding but I know it doesn't expose querySelector as a method via the interface. – QHarr Jan 16 '19 at 12:27
  • @SiddharthRout Added a late bound version. Not sure where q=sid comes from as OP is using q=telNumber. – QHarr Jan 16 '19 at 12:30
  • OP mentiond `with text its fine,....` So I am assuming that he might want to use it for both... Also ran you LB version for numbers.. Getting a permission denied (Error 70) error... Let me delve into the code later tonight. Thanks for looking into it :) – Siddharth Rout Jan 16 '19 at 14:31
  • @SiddharthRout Good call: Code works for me both number and text. With respect to permission denied - which line are you getting this on please? I ran the above without problem. – QHarr Jan 16 '19 at 14:33
  • 1
    `.send`. Let me change my ISP for a moment and check with that as well. – Siddharth Rout Jan 16 '19 at 14:36
  • ++ AHA!!! My ISP was the culprit. Your code works for both numbers and text – Siddharth Rout Jan 16 '19 at 14:38
  • Let's see if the OP decides to feedback. In the meantime @SiddharthRout have you read anything on the insides of querySelector method of htmldocument anywhere? I am trying to understand whether a cssdom is created to facilitate querySelector method. – QHarr Jan 16 '19 at 14:40
  • Actually no. I have seen you using `querySelectorAll` to reply to many posts recently. I was intrigued as to how it works and hence my interest in this post :) – Siddharth Rout Jan 16 '19 at 14:42
  • @SiddharthRout This is my concern though.... I understand it from a css point of view but finding anything "internals", from vba implementation point of view, has proven difficult. I am starting to question the certainty of some statements I have made when it comes to applying css selectors via htmldocument as opposed to a browser. I guess I can race the horses but am really interested in understanding the vba implementation details in depth. – QHarr Jan 16 '19 at 14:47
  • Let me know if you find anything. I would definitely be interested in learning that :) BTW, I was earlier trying to go through [This](https://www.w3schools.com/jsref/met_document_queryselector.asp) Let's see where does it take me – Siddharth Rout Jan 16 '19 at 14:52
  • And [This](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector#A_more_complex_selector) – Siddharth Rout Jan 16 '19 at 14:53
  • 1
    It is my current research topic (only vaguely started) so yeah will defo let you know if I can arrive at least at some theory I can posit for critique. – QHarr Jan 16 '19 at 14:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186796/discussion-between-siddharth-rout-and-qharr). – Siddharth Rout Jan 16 '19 at 14:54