1

In my day to day tasks I currently have to search a large number of products and gather information on these products. So my idea is to search the product on google and get the info from the first search result by extracting the data from the product title section and pretty much loop this for a number of products.

Here is my code below so far:

Sub SkuAutomation()

Dim ie As Object


'Navigates to google
 Set ie = CreateObject("InternetExplorer.application")
 ie.Visible = True
 ie.Navigate "https://google.co.uk/search?q=" & Worksheets("sheet1").Cell(9, 4).Value & " " & Worksheets("sheet1").Cells(9, 2)


'Waits for page to load before next action

Do While ie.ReadyState <> READYSTATE_COMPLETE

Loop

End Sub

I just want to add a piece of code which either clicks on the first link that google returns or returns the link for me. My idea would then to be scrape the data from the product title section from that page! still very early stages though.

I am just a beginner so any type of help would be much appreciated! Many thanks in advance.

  • Example search term please? And google search results aren't going to be just products which you can nicely scrape the invidividual pages of in a consistent fashion (though you can scrap the initial results list) . It will be a mismash of results most likely. – QHarr Nov 01 '18 at 11:24
  • thanks for your reply.. so the way I will be searching will ensure the results are products so for example I will be searching by product code and retailer , so an example search term would be " Currys 241825 " – AspiringCoder93 Nov 01 '18 at 11:43
  • And you want just the first product link? – QHarr Nov 01 '18 at 11:54
  • Thanks a lot for the help. so I'm at very early stages of what I want to do with this but for some context I get a file with product codes and I found that if I search the product code with the corresponding retailer, about 90% of the time the first link is the exact product im looking for from the exact retailer site. So my idea is to automatically search the products click the first link and scrape the product title section ( I get the products from a set number of retailers so will be from the same sites). Idk if this is possible but really want to automate this process. – AspiringCoder93 Nov 01 '18 at 12:09

2 Answers2

6

Your mileage will likely vary on this but for what you have provided you can use a CSS selector combination to target the first link by the page styling.

I use #search div.r [href*=http] but you could simplify to #search .r a. I am interested in knowing there is an http in the href though.

The # is an id selector, a space " " is a descendant selector (selects a child of the preceeding element and the [] is an attribute selector. A "." is a class selector i.e. selects an element by class name.

I am looking for the first element with an href attribute containing http in its value that has a parent element div element with class name r, whose parent has an id of search.

Option Explicit
Public Sub GetLink()
    Dim ie As New InternetExplorer
    With ie
        .Visible = True
        .navigate "https://google.co.uk/search?q=Currys+241825"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Debug.Print .document.querySelector("#search div.r [href*=http]").href

        .Quit
    End With

End Sub
QHarr
  • 83,427
  • 12
  • 54
  • 101
1

This is how I would do it. Put some search criteria in Cell A2, going down in ColumnA as far as you want to go. Then run the code below. The results will go into the adjacent cells in ColumnB

enter image description here

girafe
rhino
starbucks


Sub Gethits()
    Dim url As String, lastRow As Long
    Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date
    Dim end_time As Date
    Dim var As String
    Dim var1 As Object

    lastRow = Range("A" & Rows.Count).End(xlUp).Row

    Dim cookie As String
    Dim result_cookie As String

    start_time = Time
    Debug.Print "start_time:" & start_time

    For i = 2 To lastRow

        url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.ResponseText
        Set objResultDiv = html.getelementbyid("rso")
        Set var1 = html.getelementbyid("resultStats")
        Cells(i, 2).Value = var1.innerText

        DoEvents
    Next

    end_time = Time
    Debug.Print "end_time:" & end_time

    Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
    MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
ASH
  • 20,759
  • 19
  • 87
  • 200