I'm trying to web scrape the following website.

All I need is the headline content, which I thought I could grab from the div with class "content". The code returns blank and I'm a bit stumped. I'm used to just grabbing details from a table, so maybe I am missing something.

Sub SmartCentreREIT()

    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim List As MSHTML.IHTMLElementCollection
    Dim Section As MSHTML.IHTMLElement
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim ws As Worksheet
    Dim RowNum As Long
    Dim ColNum As Long

    ' Fetch the page's initial HTML with a synchronous GET request
    XMLPage.Open "GET", "https://newsquawk.com/zerohedge/", False
    XMLPage.send

    HTMLDoc.body.innerHTML = XMLPage.responseText

    Set List = HTMLDoc.getElementsByClassName("headlines_container")
    Set ws = Worksheets("Sheet1")

    RowNum = 2
    ColNum = 1

    ' One worksheet row per container, one column per "content" element
    For Each Section In List

        For Each HTMLRow In Section.getElementsByClassName("content")
            ws.Cells(RowNum, ColNum) = HTMLRow.innerText
            ColNum = ColNum + 1
        Next HTMLRow

        RowNum = RowNum + 1
        ColNum = 1

    Next Section

End Sub

1 Answer

tl;dr

The page loads its content dynamically with server-sent events, which Internet Explorer does not support. Automate Chrome with SeleniumBasic from VBA instead.


server-sent events:

Quoting from MDN:

The EventSource interface is web content's interface to server-sent events. An EventSource instance opens a persistent connection to an HTTP server, which sends events in text/event-stream format. The connection remains open until closed by calling EventSource.close().

Server-sent events

Traditionally, a web page has to send a request to the server to receive new data; that is, the page requests data from the server. With server-sent events, it's possible for a server to send new data to a web page at any time, by pushing messages to the web page. These incoming messages can be treated as Events + data inside the web page.
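
To make the text/event-stream format in the quotes above more concrete, here is a minimal sketch of how such a payload breaks down into events. It only works on a string already in memory and does not open a connection to any server. The function name ParseEventStream and the sample payload are hypothetical and were not captured from the site; the sketch assumes only that events end with a blank line and carry their payload in "data:" fields.

```vba
Option Explicit

' Sketch: collect the "data" payload of each event in a text/event-stream string.
' Events are terminated by a blank line; each field line is "name: value".
Public Function ParseEventStream(ByVal payload As String) As Collection
    Dim parsed As New Collection
    Dim lines() As String
    Dim currentLine As String
    Dim eventData As String
    Dim hasData As Boolean
    Dim i As Long

    ' Normalise line endings so one Split handles CRLF, CR and LF.
    payload = Replace(Replace(payload, vbCrLf, vbLf), vbCr, vbLf)
    lines = Split(payload, vbLf)

    For i = LBound(lines) To UBound(lines)
        currentLine = lines(i)
        If Len(currentLine) = 0 Then
            ' A blank line dispatches the event collected so far.
            If hasData Then parsed.Add eventData
            eventData = vbNullString
            hasData = False
        ElseIf Left$(currentLine, 5) = "data:" Then
            currentLine = Mid$(currentLine, 6)
            ' A single space after the colon is not part of the value.
            If Left$(currentLine, 1) = " " Then currentLine = Mid$(currentLine, 2)
            ' Several data lines in one event are joined with a line feed.
            If hasData Then eventData = eventData & vbLf
            eventData = eventData & currentLine
            hasData = True
        End If
        ' Other fields (event:, id:, retry:) and ":" comment lines are ignored here.
    Next i

    Set ParseEventStream = parsed
End Function

Public Sub DemoParseEventStream()
    Dim sample As String
    Dim item As Variant

    ' Hypothetical payload for illustration only.
    sample = "data: First headline" & vbLf & vbLf & _
             "event: update" & vbLf & "data: Second headline" & vbLf & vbLf

    For Each item In ParseEventStream(sample)
        Debug.Print item
    Next item
End Sub
```

Running DemoParseEventStream prints the two sample headlines to the Immediate window, one per event.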


Viewing the activity:

We can view this in the Network tab of the browser's developer tools.

You will see a blue timeline bar progressing as the EventStream updates.


Getting the data:

I don't know of a way to intercept this stream in VBA. That doesn't mean it isn't possible; I haven't actively researched it. It would be far easier to automate a browser and then parse out the required data.


Automating a browser to get the content:

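In order to automate a browser, you need one that supports server-sent events. For VBA, that means automating with SeleniumBasic, most likely driving Chrome, because Internet Explorer is not supported. The XMLHTTP request in your code only fetches the initial HTML and does not run the page's scripts, which is why you are seeing no data.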
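
As a rough starting point, the browser automation could look like the sketch below. It assumes SeleniumBasic is installed with a chromedriver.exe that matches your Chrome version, and that the VBA project references "Selenium Type Library". The CSS selector is built from the class names in the question and may not match the live page; the sub name and the 20-second limit are arbitrary choices for illustration.

```vba
Option Explicit

' Requires: SeleniumBasic, a matching chromedriver.exe, and a reference to
' "Selenium Type Library" (VBE > Tools > References).
Public Sub ScrapeHeadlinesWithChrome()
    Const PAGE_URL As String = "https://newsquawk.com/zerohedge/"
    Const MAX_WAIT_SECONDS As Long = 20   ' arbitrary limit, adjust as needed

    Dim driver As New Selenium.ChromeDriver
    Dim headlines As Selenium.WebElements
    Dim headline As Selenium.WebElement
    Dim ws As Worksheet
    Dim rowNum As Long
    Dim waited As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    driver.Start "chrome"
    driver.Get PAGE_URL

    ' Poll until the streamed headlines appear, or give up after the limit.
    ' Selector assumed from the class names used in the question.
    Do
        Set headlines = driver.FindElementsByCss(".headlines_container .content")
        If headlines.Count > 0 Or waited >= MAX_WAIT_SECONDS Then Exit Do
        driver.Wait 1000
        waited = waited + 1
    Loop

    ' Write one headline per row, starting at row 2 as in the question.
    rowNum = 2
    For Each headline In headlines
        ws.Cells(rowNum, 1).Value = headline.Text
        rowNum = rowNum + 1
    Next headline

    driver.Quit
End Sub
```

Because Chrome runs the page's scripts, the elements are only present once the stream has delivered something, hence the polling loop rather than reading the page immediately after loading it.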


Related: Difference between EventSource and XMLHttpRequest for SSE


Homework:

Here is a search for SeleniumBasic with Chrome examples:

https://stackoverflow.com/search?q=selenium+basic+vba+chrome


Additional considerations:

As the page updates frequently, will there be issues obtaining your data? It may be best, once the elements are initially present, to transfer the HTML to an HTMLDocument variable and continue working with that.
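
One way to take such a snapshot is sketched below. It assumes a reference to "Microsoft HTML Object Library" and that the page markup is already available as a string. The sub name WriteHeadlinesFromSnapshot is hypothetical, and the selector is again taken from the class names in the question, so treat it as an assumption about the live page.

```vba
Option Explicit

' Requires a reference to "Microsoft HTML Object Library".
Public Sub WriteHeadlinesFromSnapshot(ByVal pageHtml As String, ByVal ws As Worksheet)
    Dim snapshot As New MSHTML.HTMLDocument
    Dim nodes As Object
    Dim i As Long

    ' Copy the markup once; later updates in the browser no longer affect it.
    snapshot.body.innerHTML = pageHtml

    ' Selector assumed from the class names used in the question.
    Set nodes = snapshot.querySelectorAll(".headlines_container .content")

    ' The node list is zero-based; loop by index and write from row 2 down.
    For i = 0 To nodes.Length - 1
        ws.Cells(i + 2, 1).Value = nodes.Item(i).innerText
    Next i
End Sub
```

With a SeleniumBasic driver variable named driver, a call might look like WriteHeadlinesFromSnapshot driver.PageSource, Worksheets("Sheet1"). Parsing a fixed copy avoids elements changing underneath you while you loop over them.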

QHarr