1

I'm trying to obtain the result number (in the HTML code) of each keyword I search by means of Excel VBA. Narrowing down the criteria by className, id, and data-asin, but that last one is proving to be tricky since VBA doesn't support that reference library yet.

What this code is structured to do:

  1. Go onto amazon.com and go to the search bar.

  2. The loop starts with column C. Pull a search term from the SearchTerm1 column and search.

  3. Once the results page is loaded, try and find the specified product by className, ID (both found in the HTML code), and ASIN number (this number is pulled from column B in order to match the data-asin value on the search results page). Without all 3 criteria, excel won't be able to find the product if it's listed on the first results page.

  4. If the product is found on the first page, grab its result position (eg. "result_0" for the first result and result_1 for the second) and puts it in column D (SRank1).
  5. If the product is not found, the loop keeps going anyways until the search term column is empty.
  6. If the product is not found on the first page, continue advancing pages and searching for the specified product in order to grab the "search rank" of the item.
  7. The same steps are repeated for columns SearchTerm2, 3, and 4.

The screenshot below is the code only pulling based on the className and ID criteria, and it pulls the last product result from the page, which is not what my goal for analysation of how products are doing.

The code only pulls the last "search rank" from the page of results, not the result position of the actual product.

The code included only pulls the product rank if the product is the first result on the search page, which means something is finally working, but is missing a step or two to grab all of the product positions from the page. enter image description here

enter image description here

Any help or push in the right direction would be highly appreciated. I wish VBA were more versatile for these kinds of sales research things. It's done wonders so far, but I may be reaching its limit. Code is below.

Sub AmazonSearchRank()

    Dim MyHTML_Element As IHTMLElement
    Dim MyURL As String

    Dim AASearchRank As Workbook
    Dim AAws As Worksheet
    Dim InputSearchOrder As HTMLInputElement
    Dim elems As IHTMLElementCollection
    Dim TDelement As HTMLTableCell

    Dim InputSearchButton As HTMLInputButtonElement
    Dim IE As InternetExplorer
    Dim AASearchTerms As Workbook
    Dim SearchTermsSheet As Worksheet

    Dim x As Integer
    Dim i As Long

    MyURL = "https://www.amazon.com"
    Set IE = New InternetExplorer
    With IE
        .Silent = True
        .Navigate MyURL
        .Visible = True
        Do
            DoEvents
        Loop Until .ReadyState = READYSTATE_COMPLETE
    End With
    Set HTMLDoc = IE.Document

    Set AASearchRank = Application.Workbooks.Open("C:\Users\CompanyName\Desktop\Automation Anywhere\Sample_Items_For_SearchRank.xls")
    Set AAws = AASearchRank.Worksheets("Sheet1")

    Set InputSearchButton = HTMLDoc.getElementById("nav-search-submit-text")
    Set InputSearchOrder = HTMLDoc.getElementById("twotabsearchbox")
    If Not InputSearchOrder Is Nothing Then
        InputSearchButton.Click
        Do
            DoEvents
        Loop Until IE.ReadyState = READYSTATE_COMPLETE
    End If

    x = 2
    If AAws.Range("D" & x).Value = "" Then
        Do Until AAws.Range("B" & x) = ""
            Set InputSearchOrder = HTMLDoc.getElementById("twotabsearchtextbox")
            InputSearchOrder.Value = AAws.Range("C" & x)

            Set InputSearchButton = HTMLDoc.getElementsByClassName("nav-input")(0)
            InputSearchButton.Click
              Do
                DoEvents
            Loop Until IE.ReadyState = READYSTATE_COMPLETE
            Application.Wait (Now + TimeValue("0:00:05"))

            Set elems = HTMLDoc.getElementsByClassName("s-result-item celwidget")
            i = 2
            For Each TDelement In elems
                If TDelement.className = "s-result-item celwidget" And InStr(TDelement.ID, "result") InStr(TDelement.innerHTML, AAws.Range("B" & x).Value) Then
                    AAws.Range("D" & x).Value = TDelement.ID
                    i = i + 1
                End If
            Next
        x = x + 1
        Loop
    End If

End Sub
omegastripes
  • 12,351
  • 4
  • 45
  • 96
Tak
  • 71
  • 3
  • 12
  • Could you please slightly rework your question to make it more clear. Add what your code does, and what it does wrong, and also add description for the worksheet screenshot, and purpose of each column on it (what is given, and what is to be found). – omegastripes Dec 30 '15 at 19:29
  • Hey @omegastripes, I edited quite a few things in the questions. Please let me know if anything else needs clarification. I was pretty close on solving the puzzle, but I'm not sure what I'm missing. – Tak Dec 31 '15 at 13:23

2 Answers2

3

Here is the example which downloads products from Amazon for each search query presented on the sheet Terms, and populates the sheet Products with ASINs and descriptions. It uses XHR, so IE isn't needed. The code is as follows:

Sub Test()
    lngRow = 1
    ' search each term
    For Each strTerm In Sheets("Terms").UsedRange
        lngPage = 1
        Do
            ' HTTP GET request of the search result page
            strUrl = "https://www.amazon.com/s/ref=nb_sb_noss_2?page=" & lngPage & "&keywords=" & EncodeUriComponent(strTerm)
            Set objXHR = CreateObject("MSXML2.XMLHttp")
            objXHR.Open "GET", strUrl, False
            objXHR.Send
            strResp = objXHR.ResponseText
            ' split response to array by items
            arrResp = Split(strResp, "<li id=""result_")
            ' process each item on the page
            For i = 1 To UBound(arrResp)
                strItem = arrResp(i)
                ' extract ASIN
                strTmp = Split(strItem, "data-asin=""")(1)
                strTmp = Split(strTmp, """")(0)
                Sheets("Products").Cells(lngRow, 1).NumberFormat = "@"
                Sheets("Products").Cells(lngRow, 1).Value = strTmp
                ' extract the product description
                strTmp = Split("<li id=""result_" & strItem, "</li>")(0) & "</li>"
                Sheets("Products").Cells(lngRow, 2).Value = GetInnerText(strTmp)
                ' show current item
                Sheets("Products").Cells(lngRow, 1).Select
                ' next row
                lngRow = lngRow + 1
            Next
            ' adjust sheet
            Sheets("Products").Columns.AutoFit
            Sheets("Products").Rows.AutoFit
            ' next page
            lngPage = lngPage + 1
        Loop Until UBound(arrResp) = 0 ' empty search result
    Next
End Sub

Function EncodeUriComponent(strText)
    Static objHtmlfile As Object
    If objHtmlfile Is Nothing Then
        Set objHtmlfile = CreateObject("htmlfile")
        objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
    End If
    EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
End Function

Function GetInnerText(strHtmlContent)
    Dim objHtmlFile, objBody
    Set objHtmlFile = CreateObject("htmlfile")
    objHtmlFile.write strHtmlContent
    Set objBody = objHtmlFile.getElementsByTagName("body")(0)
    GetInnerText = Trim(objBody.innerText)
End Function

I placed on the Terms sheet:

terms

Results on the Product sheet contain 571 items:

products

It's not a complete answer, but I hope it helps you.

omegastripes
  • 12,351
  • 4
  • 45
  • 96
  • It definitely helps, and it's a great solution, but VBA is what I'm looking for. I tried it out and it works perfectly. I'll just have to tweak it some so that the code will exclude all that excess data in the second column. Definitely gave me some direction in the code - thank you, kind user. – Tak Dec 29 '15 at 12:43
  • Would you be able to assist me in trying to just using XML to reference the ASIN code? I'm doing a small project on trying to obtain the position number (result number) of each product after the search page has loaded. I've attached a screenshot of the spreadsheet for a list of product examples I've pulled from one seller. Also, a few notes would be very helpful - I have almost no experience with XML. – Tak Dec 30 '15 at 14:15
  • Thank you! I'm also working on trying to tweak the code. I'm just having to re-learn the syntax for all of this. Loops are going to be a bit of a challenge, haha. – Tak Dec 30 '15 at 14:58
  • I got the bloody thing to work. It was just one snippet of code that I had to remove, and guess which one it was? The TDelement.ID. All I need to do now is to insert a function which allows the page to advance until Excel is able to find the ASIN number and afterwards move onto the next search term. – Tak Dec 31 '15 at 18:58
0

Through trial and error, I finally solved this bloody thing. I just had to take out part of the code which included the "And InStr(TDelement.ID, "result")" and then everything ran smooth as butter.

Tak
  • 71
  • 3
  • 12