Dual class in html scraping vba

Question

I am trying to extract prices from this HTML page using the VBA code below:

Here's the HTML snippet:

<div class="box-text box-text-products">
    <div class="title-wrapper">
        <p class="category uppercase is-smaller no-text-overflow product-cat op-7">
    Xikar Lighters      
        </p>
        <p class="name product-title">
            <a href="https://www.havanahouse.co.uk/product/xikar-allume-single-jet-flame-racing-cigar-lighter-bluewhite-stripe/">Xikar Allume Single Jet Flame Racing Cigar Lighter &#8211; Blue/White Stripe</a>
        </p>
    </div>
    <div class="price-wrapper">
        <span class="price">
            <del>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">&pound;</span>48.00
                </span>
            </del>
            <ins>
                <span class="woocommerce-Price-amount amount">
                    <span class="woocommerce-Price-currencySymbol">&pound;</span>45.00
                </span>
            </ins>
        </span>
    </div>
</div>
<!-- box-text -->undefined</div>undefined<!-- box -->undefined</div>undefined<!-- .col-inner -->undefined</div>undefined<!-- col -->

I am using the below code but I get an error:

For Each oElement In oHtml.getElementsByClassName("woocommerce-Price-amoun t amount")
    If oElement.getElementsByTagName("del") Then Exit For

    If oElement.innerText <> 0  Then
        Cells(counter, 3) = CDbl(oElement.innerText)
        counter = counter + 1
    End If
Next oElement

@Suren Grigoryan, So you are such a guy who only wishes to receive answers against your questions but not to upvote them or accept as a solution. Just take a look at your previous threads: you did nothing even when you have got qualified answers there. However, I've taken out my solution. Thanks. — SIM, Mar 02 '18 at 10:44
@SIM - Hi, I only had a chance to look at this properly now, but what you have offered have disappeared. Not sure what do I need to do to 'upvote'? the answers — Suren Grigoryan, Mar 02 '18 at 21:18
@SIM - thanks for you answer anyway and happy to upvote or accept a solutions if you point me towards right direction on how to do that. — Suren Grigoryan, Mar 02 '18 at 21:47

omegastripes · Accepted Answer · 2018-03-12T17:33:39.650

Take a look at the below example:

Option Explicit

Sub Test()

    Dim sUrl As String
    Dim oWS As Worksheet
    Dim i As Long
    Dim sResp As String
    Dim sCont As String
    Dim oMatch

    sUrl = "https://www.havanahouse.co.uk/?post_type=product"
    Set oWS = ThisWorkbook.Sheets(1)
    oWS.Cells.Delete
    i = 1
    Do
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", sUrl, False
            .send
            sResp = .ResponseText
        End With
        With CreateObject("VBScript.RegExp")
            .Global = True
            .MultiLine = True
            .IgnoreCase = True
            .Pattern = "<div class=""shop-container"">([\s\S]*?)<div class=""container"">"
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sCont = .Item(0).SubMatches(0)
            End With
            .Pattern = "<div class=""title-wrapper"">([\s\S]*?)</div><div class=""price-wrapper"">([\s\S]*?)</div>"
            For Each oMatch In .Execute(sCont)
                oWS.Cells(i, 1) = GetInnerText(oMatch.SubMatches(0))
                oWS.Cells(i, 2) = GetInnerText(oMatch.SubMatches(1))
                oWS.Columns.AutoFit
                i = i + 1
                DoEvents
            Next
            oWS.Cells(i, 1).Select
            .Pattern = "<a class=""next page-number""[\s\S]*?href=""([^""]*)"""
            With .Execute(sResp)
                If .Count = 0 Then Exit Do
                sUrl = .Item(0).SubMatches(0)
            End With
        End With
    Loop

End Sub

Function GetInnerText(sText As String) As String

    Static oHtmlfile As Object
    Static oDiv As Object

    If oHtmlfile Is Nothing Then
        Set oHtmlfile = CreateObject("htmlfile")
        oHtmlfile.Open
        Set oDiv = oHtmlfile.createElement("div")
    End If
    oDiv.innerHTML = sText
    GetInnerText = oDiv.innerText

End Function

The output for me is as follows:

Generally RegEx's aren't recommended for HTML parsing, so there is disclaimer. Data being processed in this case is quite simple that is why it is parsed with RegEx. About RegEx: introduction (especially syntax), introduction JS, VB flavor.

BTW there are another answers using the similar approach: 1, 2, 3, 4, 5.

@SurenGrigoryan If you find the answer helpful, please click to accept it. — omegastripes, Mar 04 '18 at 22:09

Dual class in html scraping vba

1 Answers1

Linked