1

I am trying to extract prices from this HTML page using the VBA code below:

Here's the HTML snippet:

<div class="box-text box-text-products">
    <div class="title-wrapper">
        <p class="category uppercase is-smaller no-text-overflow product-cat op-7">
    Xikar Lighters      
        </p>
        <p class="name product-title">
            <a href="https://www.havanahouse.co.uk/product/xikar-allume-single-jet-flame-racing-cigar-lighter-bluewhite-stripe/">Xikar Allume Single Jet Flame Racing Cigar Lighter &#8211; Blue/White Stripe</a>
        </p>
    </div>
    <div class="price-wrapper">
        <span class="price">
            <del>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">&pound;</span>48.00
                </span>
            </del>
            <ins>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">&pound;</span>45.00
                </span>
            </ins>
        </span>
    </div>
</div>
<!-- box-text -->undefined</div>undefined<!-- box -->undefined</div>undefined<!-- .col-inner -->undefined</div>undefined<!-- col -->

I am using the below code but I get an error:

For Each oElement In oHtml.getElementsByClassName("woocommerce-Price-amoun t amount")
    If oElement.getElementsByTagName("del") Then Exit For

    If oElement.innerText <> 0  Then
        Cells(counter, 3) = CDbl(oElement.innerText)
        counter = counter + 1
    End If
Next oElement
  • 1
    Please include the error message in your post. – mypetlion Mar 01 '18 at 23:30
  • @Suren Grigoryan, So you are such a guy who only wishes to receive answers against your questions but not to upvote them or accept as a solution. Just take a look at your previous threads: you did nothing even when you have got qualified answers there. However, I've taken out my solution. Thanks. – SIM Mar 02 '18 at 10:44
  • @SIM - Hi, I only had a chance to look at this properly now, but what you have offered have disappeared. Not sure what do I need to do to 'upvote'? the answers – Suren Grigoryan Mar 02 '18 at 21:18
  • @SIM - thanks for you answer anyway and happy to upvote or accept a solutions if you point me towards right direction on how to do that. – Suren Grigoryan Mar 02 '18 at 21:47

1 Answers1

1

Take a look at the below example:

Option Explicit

Sub Test()

    Dim sUrl As String
    Dim oWS As Worksheet
    Dim i As Long
    Dim sResp As String
    Dim sCont As String
    Dim oMatch

    sUrl = "https://www.havanahouse.co.uk/?post_type=product"
    Set oWS = ThisWorkbook.Sheets(1)
    oWS.Cells.Delete
    i = 1
    Do
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", sUrl, False
            .send
            sResp = .ResponseText
        End With
        With CreateObject("VBScript.RegExp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .Pattern = "<div class=""shop-container"">([\s\S]*?)<div class=""container"">"
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sCont = .Item(0).SubMatches(0)
            End With
            .Pattern = "<div class=""title-wrapper"">([\s\S]*?)</div><div class=""price-wrapper"">([\s\S]*?)</div>"
            For Each oMatch In .Execute(sCont)
                oWS.Cells(i, 1) = GetInnerText(oMatch.SubMatches(0))
                oWS.Cells(i, 2) = GetInnerText(oMatch.SubMatches(1))
                oWS.Columns.AutoFit
                i = i + 1
                DoEvents
            Next
            oWS.Cells(i, 1).Select
            .Pattern = "<a class=""next page-number""[\s\S]*?href=""([^""]*)"""
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sUrl = .Item(0).SubMatches(0)
            End With
        End With
    Loop

End Sub

Function GetInnerText(sText As String) As String

    Static oHtmlfile As Object
    Static oDiv As Object

    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        Set oDiv = oHtmlfile.createElement("div")
    End If
    oDiv.innerHTML = sText
    GetInnerText = oDiv.innerText

End Function

The output for me is as follows:

output

Generally RegEx's aren't recommended for HTML parsing, so there is disclaimer. Data being processed in this case is quite simple that is why it is parsed with RegEx. About RegEx: introduction (especially syntax), introduction JS, VB flavor.

BTW there are another answers using the similar approach: 1, 2, 3, 4, 5.

omegastripes
  • 12,351
  • 4
  • 45
  • 96