0

I am trying to fix a calculator for excel which extracts the most recent international surcharge rate for TNT from this website: https://www.tnt.com/express/en_nz/site/shipping-services/fuel-surcharges-apac.html As you can see the 15.75% is the latest surcharge rate.

The screenshot I have uploaded is the particular p tag where I want to extract the content inside "15.75%". Webpage screenshot

I have the following VBA code to test I am getting the correct elements:

Sub GetFuelSurchargeWeb()

    Dim xhr As Object
    Dim doc As MSHTML.HTMLDocument
    Dim table As Object
    Dim tableCell As HTMLHtmlElement
    Dim valCharge As String, url As String, inrText As String, searchTag1 As String, searchTag2 As String, valFrom As String
    Dim i As Integer, tag1Indx As Integer, tag2Indx As Integer, tag3Indx As Integer
    Dim searchTag3 As String
    Dim ObjP As Object

    url = "https://www.tnt.com/express/en_nz/site/shipping-services/fuel-surcharges-apac.html"
    searchTag1 = "FROM"
    searchTag2 = ":"
    searchTag3 = ":"

    On Error GoTo ErrHndlr
    Application.ScreenUpdating = False

    Set xhr = CreateObject("MSXML2.XMLHTTP")

    With xhr
        .Open "GET", url, False
        .send
        If .readyState = 4 And .status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        Else
            MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
            vbNewLine & "HTTP request status: " & .status
        End If
    End With

    Set ObjP = doc.querySelectorAll("p")
    Debug.Print (ObjP.Length)
    For Each table In ObjP
        Debug.Print (table.innerHTML)
    Next table

When I am printing the innerHTML of the p tag elements, it seems to grab the likes of the first paragraph, "Week" "Dollar per Gallon" "All services" but then skips the likes of "23 Sep 2019 - 29 Sep 2019" "1.833" "15.75%", even though they are all contained in p tags.

I have only just begun using VBA, and am confused as to how I can get this value. Would appreciate if anyone can help me out with a solution or an alternative to get the values I want. Ideally I am wanting the element containing the current week also "23 Sep 2019 - 29 Sep 2019" but am only concerned with the surcharge rate for now.

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
Yoshi
  • 15
  • 4
  • 1
    XMLHTTP doesn't load or execute dynamic content - i.e.anything added after page load. You will need to use a different method (such as automating IE) if you want to scrape this content. – Tim Williams Sep 27 '19 at 03:19
  • Ok, would you mind showing me an example of web scraping for this type? – Yoshi Sep 27 '19 at 03:29

1 Answers1

1

That content is dynamically retrieved from another endpoint which you are not capturing. You can find it in the network tab. It returns json so ideally you would use a json parser like jsonconverter.bas to handle the response and extract the values of interest. The endpoint in question is https://www.tnt.com/express/getDynamicData.apac.json

An example extracting latest data point from json

Option Explicit

Public Sub GetData()
    Dim json As Object

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.tnt.com/express/getDynamicData.apac.json", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Set json = JsonConverter.ParseJson(.responsetext)("list")(1)
        Debug.Print json("week"), json("weeklyPrice"), json("surcharge")
    End With
End Sub

json library:

I use jsonconverter.bas. Download raw code from here and add to standard module called jsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime. Remove the top Attribute line from the copied code.

Explore json here: https://jsoneditoronline.org/?id=7266ab97d0ac463cb934083fc549038b

QHarr
  • 83,427
  • 12
  • 54
  • 101
  • Hello, thank you for the help on this problem! The problem was solved, but I am just wondering how you were able to find a link for the JSON endpoint of the webpage https://www.tnt.com/express/getDynamicData.apac.json. I am currently doing something similar for https://www.fedex.com/en-nz/shipping/surcharges.html and trying to do the same. Regards – Yoshi Nov 03 '19 at 21:46
  • I searched the network tab for the xhr feeding content to the url you were looking at. See [1](https://stackoverflow.com/a/56279841/6241235) and [2](https://stackoverflow.com/a/56924071/6241235) – QHarr Nov 03 '19 at 21:48
  • I see, thank you. I cannot seem to find it for fedex.com/en-nz/shipping/surcharges.html is the content not dynamically retrieved? – Yoshi Nov 04 '19 at 21:32
  • Don't know. Post a new question with your current code and explain what you are after? Will take a look. – QHarr Nov 04 '19 at 21:33