0

Hi I am trying to scrape the Product name (Cohiba Robusto), Product Size (Single Cigar, Pack of 3, Box of 25) and prices (£33.65, £90, £730) from this website: https://www.jjfox.co.uk/cohiba-robusto-621.html

I am trying to get something like this:

enter image description here

I am using the code below, which gives an error ("Object variable or with variable not set").

Will appreciate any help with this.

Sub getproducts()

Sheets("JJFox").Select

Dim oHtml       As HTMLDocument
Dim oElement    As Object

Dim Elements As IHTMLElementCollection
Dim Document As HTMLDocument

Set oHtml = New HTMLDocument


'Cells(1, 6) = Time()
lastrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
cnt = lastrow + 1

counter1 = cnt

Dim gg As String

gg = "https://www.jjfox.co.uk/cohiba-robusto-621.html"


Dim objHTTP As New WinHttp.WinHttpRequest
url = gg
    objHTTP.Open "POST", url, False
    objHTTP.setRequestHeader "Content-Type", "application/json"
    objHTTP.send ("{""key"":null,""from"":""me@me.com"",""to"":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")
   
 oHtml.body.innerHTML = objHTTP.responseText
 'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText

   '    If Not .Document.querySelector("button[aria-label='Close']") Is Nothing Then
     '       .Document.querySelector("button[aria-label='Close']").Click
      '  End If
      
      
    txttitle = oHtml.getElementsByClassName("productcart")(0).innerText
txttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTML


txttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))
'Debug.Print txttitlehtml
'txttitle2 = oHtml.getElementsByClassName("price")(0).innerText

Dim Text As String
Text = GetHTML(gg)


starts = InStr(1, Text, "spConfig =")
endS = InStr(starts + 1, Text, "spConfig")

If starts = 0 Then


    Cells(counter1, 1) = txttitle
    Cells(counter1, 2) = "Single"
    starts = InStr(starts + 1, Text, "productPrice")
    endl = InStr(starts + 1, Text, ",")
    Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))
    Cells(counter1, 4) = "JJFox"
    Cells(counter1, 5) = Now()
     
    Cells(counter1, 7) = gg ' link to the page
    counter1 = counter1 + 1
   
Else


Text = Mid(Text, starts, endS - starts)
'Debug.Print Text
'find how many pack options are avaialble

myTxt = Text
countTxt = "label"

bb = (Len(myTxt) - Len(replace(myTxt, countTxt, ""))) / Len(countTxt) - 1
'End find////////////////////////////////////

varlabel = "class=" & Chr(34) & "label" & Chr(34)


starts = InStr(1, Text, "label") + 1
Text = Mid(Text, starts, Len(Text))
        
        For i = 1 To bb
        
        
        starts = InStr(1, Text, "label")
        
        If InStr(starts, Text, "label") Then
        
        'Show the element's properties
           
        
                Cells(counter1, 1) = txttitle
                Cells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, " \") - (starts + 8))
                
                                       
                        starts = InStr(starts + 1, Text, "oldPrice")
                        endl = InStr(starts + 1, Text, ",")
                        

                Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))
                'Debug.Print Val(Mid(Text, startS + chrs, 6))
                Cells(counter1, 4) = "JJFox"
                Cells(counter1, 5) = Now()
                starts = starts + 1
                Text = Mid(Text, starts, Len(Text))
                Cells(counter1, 7) = gg ' link to the page
                counter1 = counter1 + 1
           End If
        
        Next i
            
End If
'Cells(2, 6) = Time()
End Sub



Function GetHTML(url As String) As String
     With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", url, False
        .send
        GetHTML = .responseText
    End With
End Function

1 Answers1

2

The prices and labels are pulled dynamically from a script tag who content you can parse as json with a json parser. You need to grab the name from the html however.

With a little knowledge of html and css, it is easy enough to define a css pattern to target the script node of interest with:

.fieldset [type='text/x-magento-init']

That looks for a child script with type attribute having attribute value text/x-magento-init, and a parent with class fieldset.

I have used a tiny bit less efficient (you won't notice):

For i = 1 To optionsCollection.Count

Simply because I know the collection is small and to allow me to index into two variables with a single loop.


Json library:

I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . Remove the top Attribute line from the copied code.

You then need to go:

VBE > Tools > References > Add references to:

Microsoft Scripting Runtime
Microsoft HTML Object Library
Microsoft XML Library. 

In VBA for json the [] denotes a collection and the {} represents a dictionary.


Option Explicit

Public Sub GetCigarData()
    '<  VBE > Tools > References:
    'Microsoft Scripting Runtime
    'Microsoft HTML Object Library
    'Microsoft XML Library
    
    Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    
    With xhr
        .Open "GET", "https://www.jjfox.co.uk/cohiba-robusto-621.html", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
         html.body.innerHTML = .responseText
    End With

    Set json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")("spConfig")
     
    Dim prices As Scripting.Dictionary, options As Scripting.Dictionary, optionsCollection As Collection
    
    Set prices = json("optionPrices")
    Set options = json("attributes")
    Set optionsCollection = options(options.Keys(0))("options")
    
    Dim results() As Variant, headers() As Variant, i As Long, name As String
    ReDim results(1 To optionsCollection.Count, 1 To 3)
    
    name = html.querySelector(".base").innerText

    For i = 1 To optionsCollection.Count
         results(i, 1) = name
         results(i, 2) = optionsCollection.item(i)("label")
         results(i, 3) = prices(prices.Keys(i - 1))("finalPrice")("amount")
    Next
    
    headers = Array("Name", "Size", "Price")
    
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results 
    End With
   
End Sub

Read about css selectors:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Thank you very much @QHarr. Unfortunately I can't run the code as i am getting a "Variable not defined" error on jsonConverter in Set json = jsoncoverter.parseJson(...). i have added all references as advised above but it still its not working. – Suren Grigoryan Feb 11 '21 at 10:21
  • 1
    @QHarr I also tried to find a JSON with the data last night, but was unsuccessful. Thanks for pointing out this possibility. It's always great to learn new approaches. – Zwenn Feb 11 '21 at 10:39
  • 1
    @SurenGrigoryan Please read carefully what QHarr has written. He linked were you can get the jsonConverter and discribed how to use it. It's a library to parse JSON. JSON is a data exchange format like XML. For Excel, this is just a string. However, Tim Hall has developed the jsonConverter in VBA. You have to copy the code of the linked .bas file into a new module named JsonConverter. – Zwenn Feb 11 '21 at 10:45
  • 1
    thank you @Zwenn, I have missed that for some reason. now added and its working fine. – Suren Grigoryan Feb 11 '21 at 11:23
  • 1
    @SurenGrigoryan That's very good :-) You can mark the answer of QHarr as solution. Other users can have a benefit of a right answer. – Zwenn Feb 11 '21 at 13:47
  • Set optionsCollection = options(options.Keys(0))("options") has started returning Null each time, which prevents correctly re-defining the results() array. I can't see why this has change. Any hints please? – Suren Grigoryan May 02 '22 at 08:56
  • 1
    You will need to inspect the JSON. Is the first key the expected one? `json("attributes")` – QHarr May 02 '22 at 10:53
  • Thank you. @QHarr Yes, it is. it gets info from here as it used to previously, but i am not sure why is it returning blank now: " – Suren Grigoryan May 02 '22 at 13:08
  • 1
    The webpage has changed. Probably best to open a new question and indicate what you need from the page. I now see this `{ "#product_addtocart_form": { "priceOptions": { "optionConfig": {"486":{"486":{"prices":{"oldPrice":{"amount":3.99,"adjustments":[]},"basePrice":{"amount":3.324999},"finalPrice":{"amount":3.99}},"type":"fixed","name":"Gift Wrapping"}}}, "controlContainer": ".field", "priceHolderSelector": "[data-product-id='621'][data-role=priceBox]" } } }` – QHarr May 02 '22 at 14:45
  • thank you @QHarr, i have done that, here: https://stackoverflow.com/questions/72089620/json-scraping-options-vba – Suren Grigoryan May 02 '22 at 16:30
  • @QHarr, many thanks but still can't figure out what is wrong. had to revert back to my old code, which is far less elegant and efficient than what you have suggested, but I have no clue how to use it. Any further pointers on how to get this fixed will be very much appreciated. – Suren Grigoryan Jul 26 '22 at 20:37