9

I have been attempting to use IE automation to google search a string of text in Excel. I want to return the hyperlink for the website of the first result in another cell in excel. Is this possible? I have a list of 60,000 records that I need to google search and return the hyperlink for the website in the first result. Is there another approach to this that you would reccomend? I appreciate the help in advance.

SierraOscar
  • 17,507
  • 6
  • 40
  • 68
Collin Hendo
  • 93
  • 1
  • 1
  • 3

2 Answers2

19

As its 60,000 records i recommend use xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster

Download the sample file

Sub XMLHTTP()

    Dim url As String, lastRow As Long, i As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date

    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    Dim cookie As String
    Dim result_cookie As String

    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow

        url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")

        Set objH3 = objResultDiv.getelementsbytagname("h3")


        For Each link In objH3

            If link.className = "r" Then

                Cells(i, 2) = link.innerText
                Cells(i, 3) = link.getelementsbytagname("a")(0).href
                DoEvents
            End If
        Next
    Next

    end_time = Time
    Debug.Print "end_time:" & end_time

    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time) & " :minutes"
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub

Using CSS3 Selector

 Sub XMLHTTP1()

        Dim url As String, i As Long, lastRow As Long
        Dim XMLHTTP As Object, html As New HTMLDocument, objResultDiv As HTMLAnchorElement


        lastRow = Range("A" & Rows.Count).End(xlUp).Row
        For i = 2 To lastRow

            url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", url, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set html = New HTMLDocument
            html.body.innerHTML = XMLHTTP.ResponseText
            Set objResultDiv = html.querySelector("div#rso h3.r a")

            Cells(i, 2) = objResultDiv.innerText
            Cells(i, 3) = objResultDiv.href

            DoEvents
        Next

    End Sub

Output

enter image description here

HTH
Santosh

Santosh
  • 12,175
  • 4
  • 41
  • 72
  • 1
    Thanks! That works well except after about 100 records I recieve a Run Error 80070005 Access is denied. Any clues on why? – Collin Hendo Jul 08 '13 at 21:38
  • 1
    @CollinHendo Nope, will have to see your data for that. Alternatively you can add 'On error resume next' on top of code. If the solution was helpful pls vote. – Santosh Jul 09 '13 at 00:47
  • I could provide you a sample of my data. How could I do that? – Collin Hendo Jul 09 '13 at 12:52
  • Actually the error occurs even if I attempt to use the code on the sample that you provided.Any hints? – Collin Hendo Jul 09 '13 at 13:19
  • It is a strange error. After running about 100-125 record, I get the access denied and cannot use it in any other workbook or anything. – Collin Hendo Jul 09 '13 at 13:49
  • Is it possible that my server cuts off access after pulling 100 or so records and then I am on a delay or cut off? – Collin Hendo Jul 09 '13 at 15:11
  • @CollinHendo Apologies for late response. You can share the workbook using dropbox or google drive and give me the link. I shall get back to you as feasible to me. – Santosh Jul 09 '13 at 18:10
  • https://www.dropbox.com/s/o1rj4fqht2I1ws6/Sample.xlsm Thanks! That's just a sample. As you can see, it ended after about 110 records. – Collin Hendo Jul 09 '13 at 19:37
  • @CollinHendo The link is not working for me. Please make sure you share the link publicly. – Santosh Jul 10 '13 at 03:35
  • My apologies. I thought it was in my public folder. Here is the correct link https://www.dropbox.com/s/t4qh8vo4g88ulu3/Sample.xlsm – Collin Hendo Jul 10 '13 at 10:16
  • I too get this error, just about after 60 entries, any resolutions to this please. – Vasim Nov 06 '13 at 07:58
  • @Vasim I will get back to you on this by tommorow. Can you take a screenshot of issue and give the link. – Santosh Nov 06 '13 at 09:41
  • [link](https://drive.google.com/file/d/0B9u_K1HbgiEKUk5JMlBZdWRzZFE/edit?usp=sharing), it stops at "xmlHttp.send" after sometime (5min or so) it again works - stops after some next 60 entries and so on..... – Vasim Nov 06 '13 at 10:16
  • @Vasim I was able to simulate the issue. With lack of time i will try to fix it in weekend. In the meanwhile you may try to add some delay or make async request. – Santosh Nov 08 '13 at 05:16
  • 1
    @Vasim I have updated the code. Give it a try and let me know the response. I tested for about 140 records which ran uninterrupted. Alternatively you may download the sample file from [here](http://goo.gl/VudD5z) – Santosh Nov 14 '13 at 18:26
  • @Santosh I was trying the get the data of a span tag with class name "_Ex", the code doesn't seem to work. I used the following: `code` Set html = CreateObject("htmlfile") html.body.innerHTML = XMLHTTP.ResponseText Set objResultDiv = html.getelementbyId("rso") Set objH3 = objResultDiv.getelementsbyTagName("span")(0) Set link = objH3.findElementsbyClassName("_Ex") `code` – Krithi07 Apr 21 '16 at 08:10
  • Great script. +1. My question is, I have thousands of records. I want to check with google. Can i try without limitation ? If i try to thousands of records, Will google think me that DDOS attack like that ? Is it legit ? – Venkat Dec 11 '16 at 14:02
0

The links seem to be consistently within within H3 tags. Normally you might use something like the following to check until the page has loaded:

Private Declare Sub Sleep Lib "kernel32" (ByVal nMilliseconds As Long)

Sub UseIE()
    Dim ie As Object
    Dim thePage As Object
    Dim strTextOfPage As String

    Set ie = CreateObject("InternetExplorer.Application")
    'ie.FullScreen = True
    With ie
        '.Visible = True
        .Navigate "http://www.bbc.co.uk"
        While Not .ReadyState = READYSTATE_COMPLETE '4
            Sleep 500      'wait 1/2 sec before trying again
        Wend
    End With

    Set thePage = ie.Document
    'more code here
End Sub

However, I would, instead, repeatedly try to reference the A element within the first H3 using getElementsByTagName("H3"), get the first of these elements, then look within this for the A-link and its href-attribute.

In JavaScript the attempts to reference non-existent elements would return undefined but from VBA it will probably need error-handling code.

Once I had obtained the href I would stop the navigation (not sure of the command for this, probably ie.Stop) or navigate to the next page immediately.

The first link(s) will, however, often be sponsored links and the href returned is a little garbled. The text of these sponsored links appear to include em tags. I might use this information to discard these links and look further down the page.

I don't know if there is a better way to do this.

Andy G
  • 19,232
  • 5
  • 47
  • 69