3

I am trying to create a macro that scrapes a cargo tracking website. But I have to create 4 such macros as each airline has a different website.

I am new to VBA and web scraping.

I have put together a code that works for 1 website. But when I tried to replicate it for another one, I am stuck in the loop. I think it maybe how I am referring to the element, but like I said, I am new to VBA and have no clue about HTML.

I am trying to get the "notified" value in the highlighted line from the image.

IMAGE:"notified" text to be extracted Below is the code I have written so far that gets stuck in the loop. Any help with this would be appreciated.

Sub FlightStat_AF()

Dim url As String
Dim ie As Object
Dim nodeTable As Object

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We can do that in a loop to match the point the information is available
  Do
    On Error Resume Next
    Set nodeTable = ie.document.getElementByClassName("block-whisper")
    On Error GoTo 0
  Loop Until Not nodeTable Is Nothing
  
  'Get the status from the table
  MsgBox Trim(nodeTable.getElementsByClassName("fs-12 body-font-bold").innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
  Set nodeTable = Nothing
End Sub
Achal Desai
  • 93
  • 1
  • 8
  • You are likely stuck in a loop because "block-whisper" is never found. When that is the case, your code will loop forever. The loop isn't really needed. You either find the element or you don't and you can take action either way. – Brian M Stafford Sep 04 '20 at 14:06
  • @BrianMStafford, The loop is needed because it's a dynamic table that is loaded after the source page is loaded. On the element part, I am not sure if I am referring to the correct element as I don't know anything about HTML. Can you please look at the image attached and see what mistake I am making? – Achal Desai Sep 04 '20 at 14:37
  • 1
    You try my code from this answer for another page: https://stackoverflow.com/questions/63738093/excel-vba-web-scraping-inner-text-of-html-table-cell/63740761#63740761 That can't work. The code for scraping is different for every page because every page is different. I have now no time to look how it works for this page. Maybe later. – Zwenn Sep 04 '20 at 14:42
  • Try [this link](https://www.afklcargo.com/mycargo/api/shipment/detail/057-92366691). You can use xhr to fetch what you want. – SIM Sep 04 '20 at 15:05
  • @SIM, Thank you for this. But I am absolutely new to web scraping and only have basic VBA knowledge. Would you be able help me with the code a little as well? – Achal Desai Sep 04 '20 at 16:08
  • if you use a loop in this way then add in a time-out to prevent against infinite loop. Don't rely on code that works for one page working on another. You might get some mileage when it is the same site and occasionally with same frameworks. – QHarr Sep 04 '20 at 21:12
  • @Zwenn This is the code I am trying to make the code into a function. It works as a sub but not as a function. Can you help me please. – Achal Desai Sep 07 '20 at 11:03
  • I have added the function to my answer below as an edit. – Zwenn Sep 07 '20 at 12:29

1 Answers1

2

Some basics:
For simple accesses, like the present ones, you can use the get methods of the DOM (Document Object Model). But there is an important difference between getElementByID() and getElementsByClassName() / getElementsByTagName().

getElementByID() searches for the unique ID of a html tag. This is written as the ID attribute to html tags. If the html standard is kept by the page, there is only one element with this unique ID. That's the reason why the method begins with getElement.

If the ID is not found when using the method, VBA throws a runtime error. Therefore the call is encapsulated in the loop from the other answer from me, into switching off and on again the error handling. But in the page from this question there is no ID for the html area in question.

Instead, the required element can be accessed directly. You tried the access with getElementsByClassName(). That's right. But here comes the difference to getElementByID().

getElementsByClassName() and getElementsByTagName() begin with getElements. Thats plural because there can be as many elements with the same class or tag name as you want. This both methods create a html node collection. All html elements with the asked class or tag name will be listet in those collections.

All elements have an index, just like an array. The indexes start at 0. To access a particular element, the desired index must be specified. The two class names fs-12 body-font-bold (class names are seperated by spaces, you can also build a node collection by using only one class name) deliver 2 html elements to the node collection. You want the second one so you must use the index 1.

This is the VBA code for the asked page by using the IE:

Sub FlightStat_AF()

Dim url As String
Dim ie As Object

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/057-92366691"

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We do that with a fix manual break of a few seconds
  'because the whole page will be "reload"
  'The last three values are hours, minutes, seconds
  Application.Wait (Now + TimeSerial(0, 0, 3))
  
  'Get the status from the table
  MsgBox Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
End Sub

Edit: Sub as function

This sub to test the function:

Sub testFunction()
  Dim flightStatAfResult As String
  flightStatAfResult = FlightStat_AF("057-92366691")
  MsgBox flightStatAfResult
End Sub

This is the sub as function:

Function FlightStat_AF(cargoNo As String) As String

Dim url As String
Dim ie As Object
Dim result As String

  'You can handle the parameters id and pfx in a loop to scrape dynamic numbers
  url = "https://www.afklcargo.com/mycargo/shipment/detail/" & cargoNo

  'Initialize Internet Explorer, set visibility,
  'call URL and wait until page is fully loaded
  Set ie = CreateObject("InternetExplorer.Application")
  ie.Visible = False
  ie.navigate url
  Do Until ie.readyState = 4: DoEvents: Loop
  
  'Wait to load dynamic content after IE reports it's ready
  'We do that with a fix manual break of a few seconds
  'because the whole page will be "reload"
  'The last three values are hours, minutes, seconds
  Application.Wait (Now + TimeSerial(0, 0, 3))
  
  'Get the status from the table
  result = Trim(ie.document.getElementsByClassName("fs-12 body-font-bold")(1).innerText)
  
  'Clean up
  ie.Quit
  Set ie = Nothing
  
  'Return value of the function
  FlightStat_AF = result
End Function
Zwenn
  • 2,147
  • 2
  • 8
  • 14
  • Thank You for this code! I am writing a similar code for another cargo tracking website. I have gotten as far as navigating to the page. But the problem is that the data table doesn't load at all in IE. It loads perfectly on chrome. Would you know the reason behind it? This is the site I am trying to scrape from - https://www.trackingmore.com/aircargo/72403698656 – Achal Desai Sep 07 '20 at 08:53
  • 1
    @AchalAesai The technology for loading the dynamic content of the page is too modern for IE. IE has not been developed further for some time and modern JavaScript can no longer be executed. I suppose you can only scrape the desired data if you use an interface like Selenium for it. With Selenium, browsers other than IE can be controlled from VBA. https://www.selenium.dev/ Another possibility is to use another language like Python, with appropriate libraries, like Beautifull Soap. – Zwenn Sep 07 '20 at 09:25
  • Thanks for clarifying this. I have no knowledge of python. But I will look into selenium as per your advice. – Achal Desai Sep 07 '20 at 10:26
  • I am trying create the above code as a function. When I run it as a sub, it gives me the correct result. But when I run it as a function, it gives me an #Value error. Where am I going wrong? Below is the bode I have written to convert the sub to a function. – Achal Desai Sep 07 '20 at 10:57
  • Thank you for the code. I tried it with the test sub and the msgbox gives the correct answer. but When I run it as a function `=FlightStat_AF(H3)` it again gives me a #value error. – Achal Desai Sep 07 '20 at 14:10
  • @AchalDesai The parameter must be a cargo number like `057-92366691`. I think `H3` isn't one. I had cut that number from the url in the old sub. The test sub gives that number to the function as a parameter. But the function doesn't seem to work as an udf. I don't know why because I'm no expert for udf. – Zwenn Sep 07 '20 at 17:10