3

I have to open the google search page using excel Macro. I am able to successfully open the google search page, after I give my search parameters in excel. However, my task is to open the first returned search result page and do some data extraction in that page. I used the below code.

Suppose if I searched for "Sachin Tendulkar wiki", I should be able to open the first page in the search result. My code so far is as below.

Dim ie As InternetExplorer
Dim RegEx As RegExp, RegMatch As MatchCollection
Dim MyStr As String
Dim pDisp As Object
Set ie = New InternetExplorer
Set RegEx = New RegExp
Dim iedoc As Object

'Search google for "something"
ie.Navigate "http://www.google.com.au/search?hl=en&q=sachin+tendulkar+wiki&meta="

'Loop unitl ie page is fully loaded
Do Until ie.ReadyState = READYSTATE_COMPLETE
Loop



MyStr = ie.Document.body.innertext
Set RegMatch = RegEx.Execute(MyStr)

'If a match to our RegExp searchstring is found then launch this page
If RegMatch.Count > 0 Then
    ie.Navigate RegMatch(0)
    Do Until ie.ReadyState = READYSTATE_COMPLETE
    Loop
         MsgBox "Loaded"
         'show internet explorer
    ie.Visible = True
    'Private Sub ie_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    Set iedoc = ie.Application.Document
    'iedoc.getElementById("divid").Value = "poS0"
    'MsgBox iedoc

    'ie.Navigate iedoc.getelementsbytagname("ol")(0).Children(0).getelementsbytagname("a")(0).href
    ie.Navigate iedoc.getelementsbyclassname("divid")("poS0").href
    Else
    MsgBox "No linkedin profile found"
End If

Set RegEx = Nothing
Set ie = Nothing

I viewed the page source in the google search page. I have a particular div id = "pos0" which is the id for the first search result. I have to make the IE navigate to the page whose div id = "pos0". I am not able to accomplish this thing in VBA. Can some one please help me out?

Thanks & Regards, Ramesh

SierraOscar
  • 17,507
  • 6
  • 40
  • 68
Ramesh
  • 765
  • 7
  • 24
  • 52
  • If you run the search in Google Chrome and inspect element, you can navigate the HTML on the page all the way down to where the URL for the first search result is. Then you can look at what all the surrounding elements are and decide on what would be the best way to navigate to it. Then, feed the URL to ie.Navigate. Good luck :) – Patashu Feb 06 '13 at 22:04
  • Hi Patashu, Thanks for replying. The thing is I won't be knowing the first search result URL that my search parameter is going to return. I know for sure the URL of the first search result has div id = "pos0" in it. I want to navigate to the URL corresponding to this section using VBA. Please let me know if you need more information. – Ramesh Feb 06 '13 at 22:40
  • You don't need to know the URL, you can inspect the page and see that the surrounding elements of the first URL are always in the same order. So in your VBA code you can navigate the DOM, go down to that element, discover what the url is and feed that to ie.Navigate. Hope this helps – Patashu Feb 06 '13 at 22:41
  • Hi Patashu, Once again I appreciate your help. Can you please tell me how to navigate to the DOM? I mean in my code I tried something like this. iedoc.getElementById("divid").Value = "poS0" But it was not working. – Ramesh Feb 06 '13 at 22:48
  • Hi, you can read about DOM here: http://www.w3schools.com/htmldom/default.asp – Patashu Feb 06 '13 at 22:51

2 Answers2

4

You have a couple of issues. First to access the document object its ie.Document not ie.Application.Document. I have updated your code to show how the first url can quickly be found using a substring.

Dim ie As InternetExplorer
Dim RegEx As RegExp, RegMatch As MatchCollection
Dim MyStr As String
Dim pDisp As Object
Set ie = New InternetExplorer
Set RegEx = New RegExp
Dim iedoc As Object

'Search google for "something"
ie.Navigate "http://www.google.com.au/search?hl=en&q=sachin+tendulkar+wiki&meta="

'Loop unitl ie page is fully loaded
Do Until ie.ReadyState = READYSTATE_COMPLETE
Loop



MyStr = ie.Document.body.innertext
Set RegMatch = RegEx.Execute(MyStr)

'If a match to our RegExp searchstring is found then launch this page
If RegMatch.Count > 0 Then
    ie.Navigate RegMatch(0)
    Do Until ie.ReadyState = READYSTATE_COMPLETE
    Loop
         MsgBox "Loaded"
         'show internet explorer
    ie.Visible = True
    'Private Sub ie_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    '****************************************
    'EDITS
    '****************************************
    Set iedoc = ie.Document

    'create a variable to hold the text
    Dim extractedHTML As String
    'start and end points for the substring
    Dim iStart, iEnd As Integer
    'get the element with ID of search - this is where the results start
    extractedHTML = iedoc.getElementById("search").innerHTML
    'find the first href as this will be the first link, add 1 to encompass the quote
    iStart = InStr(1, extractedHTML, "href=", vbTextCompare) + Len("href=") + 1
    'locate the next quote as this will be the end of the href
    iEnd = InStr(iStart, extractedHTML, Chr(34), vbTextCompare)
    'extract the text
    extractedHTML = Mid(extractedHTML, iStart, iEnd - iStart)
    'go to the URL
    ie.Navigate extractedHTML

    '****************************************
    'End EDITS
    '****************************************
    Else
    MsgBox "No linkedin profile found"
End If

Set RegEx = Nothing
Set ie = Nothing
Sorceri
  • 7,870
  • 1
  • 29
  • 38
2

You may consider using xmlHTTP object instead of using IE.
HTTP requests a easier, and a lot faster

Below is sample code

Sub xmlHttp()

    Dim URl As String, lastRow As Long
    Dim xmlHttp As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object


    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    For i = 2 To lastRow

        URl = "https://www.google.co.in/search?q=" & Cells(i, 1)

        Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
        xmlHttp.Open "GET", URl, False
        xmlHttp.setRequestHeader "Content-Type", "text/xml"
        xmlHttp.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = xmlHttp.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
        Set link = objH3.getelementsbytagname("a")(0)


        str_text = Replace(link.innerHTML, "<EM>", "")
        str_text = Replace(str_text, "</EM>", "")

        Cells(i, 2) = str_text
        Cells(i, 3) = link.href
    Next
End Sub

enter image description here

HTH
Santosh

Santosh
  • 12,175
  • 4
  • 41
  • 72