I would like to scrape all the job titles and company names from a job search website. However, I am unable to do so, as I believe I cannot find the correct element in the HTML. I have researched this for days; please advise on the correct HTML element. Once I can reach the correct element, I will add the loop and finish the program. Thanks.

Website: https://www.efinancialcareers.my/search/?countryCode=MY&radius=40&radiusUnit=km&page=1&pageSize=20&currencyCode=MYR&language=en0

Option Explicit

Sub xmlhttp_scraping()

Dim XMLrequest As New MSXML2.XMLHTTP60

XMLrequest.Open "GET", "https://www.efinancialcareers.my/search/?countryCode=MY&radius=40&radiusUnit=km&page=1&pageSize=20&currencyCode=MYR&language=en0", False
XMLrequest.send

Dim iDOC As New MSHTML.HTMLDocument
iDOC.body.innerHTML = XMLrequest.responseText

'Cells(2, 2).Value = iDOC.getElementsByClassName("d-flex justify-content-between")(0).getElementsByTagName("h5")(0).getElementsByTagName("a")(0).innerText
'Cells(2, 2).Value = iDOC.getElementById("8091724").innerText
'Cells(2, 2).Value = iDOC.getElementsByClassName("search-card")(0).getElementsByClassName("d-flex justify-content-between")(0).getElementsByTagName("h5")(0).getElementsByTagName("a")(0).innerText


Range("H1").Value = "Time Updated on"
Range("I1").Value = Now

Columns.AutoFit
MsgBox "Done"

End Sub

Sample of HTML code below:

(screenshot of the HTML as shown in the browser's developer tools)

  • The issue might be the '== $0' at the end of the HTML shown; I am not sure what it means or how to inspect the element with it. – Kwan Wan Sing Jul 20 '20 at 02:27

1 Answer

The page you are requesting builds its content with JavaScript. XMLHTTP only downloads the static HTML and does not run any scripts, so the innerHTML you assign to iDOC never contains the job listings.
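One way to confirm this is to search the raw response for a class name you expect in the rendered page. The following is only a minimal sketch: the helper names StaticHtmlContains and CheckStaticHtml are hypothetical, and "search-card" is taken from the attempts in the question, so whether the rendered page really uses that class is an assumption.

```vba
Option Explicit

' Requires a reference to "Microsoft XML, v6.0".
' Returns True when the raw (static) HTML of pageUrl contains marker.
' (Hypothetical helper, not part of the original code.)
Function StaticHtmlContains(ByVal pageUrl As String, ByVal marker As String) As Boolean
    Dim request As New MSXML2.XMLHTTP60

    ' Synchronous GET, the same call pattern as in the question
    request.Open "GET", pageUrl, False
    request.send

    ' vbTextCompare makes the search case-insensitive
    StaticHtmlContains = (InStr(1, request.responseText, marker, vbTextCompare) > 0)
End Function

Sub CheckStaticHtml()
    Const PAGE_URL As String = "https://www.efinancialcareers.my/search/?countryCode=MY&radius=40&radiusUnit=km&page=1&pageSize=20&currencyCode=MYR&language=en0"

    ' "search-card" is assumed from the question's attempts
    Debug.Print "search-card in static HTML: "; StaticHtmlContains(PAGE_URL, "search-card")
End Sub
```

If this prints False, the listings are most likely inserted by script after the page loads. A True result proves less, because the string could also appear inside a script or template in the static HTML.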

For the page to run its JavaScript properly, you can automate IE using InternetExplorer.Application. Try searching for keywords like "Automate Internet Explorer Using VBA."

EDIT

I read your comment. The page reaches the READYSTATE_COMPLETE state before the JavaScript has finished generating the content. So you need to wait for the content in some way, e.g. sleep for a fixed time or poll until a particular element appears.

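' 64-bit Office (VBA7) requires PtrSafe on API declarations
#If VBA7 Then
    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal ms As Long)
#Else
    Public Declare Sub Sleep Lib "kernel32" (ByVal ms As Long)
#End If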

Sub sc2()
    Dim objIE As New InternetExplorer
    objIE.Visible = True
    objIE.navigate "https://www.efinancialcareers.my/search/?countryCode=MY&radius=40&radiusUnit=km&page=1&pageSize=20&currencyCode=MYR&language=en0"
    
    Do While objIE.Busy = True Or objIE.readyState < READYSTATE_COMPLETE
        DoEvents
    Loop
    
    Dim htmlDoc As HTMLDocument
    
    ' Wait long enough
    Sleep 10000

    ' ... Or wait until some element appears (some element disappears)
'    Do
'        Set htmlDoc = objIE.document
'
'        If htmlDoc.getElementsXXX Then
'            Exit Do
'        End If
'
'        DoEvents
'        Sleep 1000
'    Loop
    
    Set htmlDoc = objIE.document
    
    ' Then you can access elements.
    ' Note: this line still fails, because .getElementsByTagName("h5") returns nothing
    ' for the generated HTML. Inspect the rendered HTML and adjust the selector chain.
    Debug.Print htmlDoc.getElementsByClassName("d-flex justify-content-between")(0).getElementsByTagName("h5")(0).getElementsByTagName("a")(0).innerText
    
End Sub

Moreover, the code that accesses the elements has its own problem. The selector chain does not match the generated HTML, so .getElementsByTagName("h5") returns nothing. Please inspect the rendered HTML in Chrome's developer tools or in the VBE Watch window and adjust the chain accordingly.
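The "wait until some element appears" idea can be sketched as a polling loop with a timeout, which avoids both a fixed 10-second sleep and an endless loop when the element never shows up. This is only an illustration: WaitForClass and PrintFirstCard are hypothetical names, and the class "search-card" is assumed from the question's attempts, so replace it with whatever class you actually see in the rendered HTML.

```vba
Option Explicit

' Requires references to "Microsoft Internet Controls" and
' "Microsoft HTML Object Library".

' Polls the document until at least one element with the given class exists,
' or until timeoutSeconds have elapsed. Returns True if the element appeared.
' Note: Timer resets at midnight, so this simple version is not midnight-safe.
Function WaitForClass(ByVal ie As InternetExplorer, ByVal className As String, _
                      ByVal timeoutSeconds As Double) As Boolean
    Dim startTime As Double
    Dim doc As HTMLDocument

    startTime = Timer
    Do
        Set doc = ie.document
        If doc.getElementsByClassName(className).Length > 0 Then
            WaitForClass = True
            Exit Function
        End If
        DoEvents    ' keep the host application responsive while polling
    Loop While Timer - startTime < timeoutSeconds
End Function

Sub PrintFirstCard()
    Dim ie As New InternetExplorer
    Dim doc As HTMLDocument

    ie.Visible = True
    ie.navigate "https://www.efinancialcareers.my/search/?countryCode=MY&radius=40&radiusUnit=km&page=1&pageSize=20&currencyCode=MYR&language=en0"

    ' First wait for the initial page load, as in the code above
    Do While ie.Busy Or ie.readyState < READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Then wait up to 30 seconds for the script-generated content
    If WaitForClass(ie, "search-card", 30) Then
        Set doc = ie.document
        Debug.Print doc.getElementsByClassName("search-card")(0).innerText
    Else
        Debug.Print "Timed out waiting for the job listings."
    End If
End Sub
```

Printing the innerText of the whole card first is a quick way to see what the generated HTML contains before narrowing the selector down to the job title and company name.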

== $0 is not related to your problem. It simply marks the element currently selected in the developer tools. (See "What does == $0 mean in the DOM view in developer tools?")

By the way, more and more sites are dropping support for IE. Using the InternetExplorer object is convenient, but automating Chrome or Firefox with Selenium is a better approach.

kamocyc
  • Hi Kamocyc, I tried both approaches, IE automation and the XMLHTTP request. Both give the same debug message and stop at the HTML code with == $0. – Kwan Wan Sing Jul 20 '20 at 03:33