I want to retrieve a table from the URL of https://s.cafef.vn/screener.aspx#data using VBA
. This task is difficult because the table contains JSON
data embedded in an html file.
Taking @Tomalak ‘s advice, I am trying to split up my task; solving four following individual problems one after another:
- Send an
HTTP
request to have theHTML
- Locate the
JSON
string - Parse
JSON
withVBA
and then - Loop over the raw data from the
JSON
and write into an Excel table.
Extract a JSON DATA table in html using VBA; converting Apps Script into VBA
However, I get stuck at Step 2
, the response text that I get is stored in htmlTEXT
. Its print-out looks like below attached, but the problem is as a string variable, htmlTEXT
can hold up only a small part of the html page content. The JSON
paragraph does not lie on the top part of the html page and is therefore not returned into htmlTEXT
.
My questions are:
How can we get the whole content of the html page (with the
JSON
paragraph included)?Once the
JSON
paragraph is captured, what Regular Expression can be used to extract theJSON
paragraph ?
Noticing that the JSON
paragraph starts with [{
and ends with }]
, I therefore use the pattern [{*}]
but it does not work at all, (though it works with pattern like (D.C)
; resulting in DOC
for my testing purpose)
What is wrong with my code?
Sub ExtractJSON_in_html()
' =====send an HTTP request with VBA ====
Dim JSONtext As String
Dim htmlTEXT As String
Dim SDI As Object
Set objHTTP = CreateObject("MSXML2.XMLHTTP")
Url = "https://s.cafef.vn/screener.aspx#data"
objHTTP.Open "GET", Url, False
objHTTP.send
htmlTEXT = objHTTP.responsetext
MsgBox htmlTEXT
' ===== Locate the JSON string =======
Set SDI = CreateObject("VBScript.RegExp")
SDI.IgnoreCase = True
SDI.Pattern = "[{*}]"
SDI.Global = True
Set theMatches = SDI.Execute(htmlTEXT)
For Each Match In theMatches
'MsgBox Match.Value
JSONtext = Match.Value
Next
End Sub
htmlTEXT:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
-- JASON Paragraph var jsonData = [{"Url":"http://s.cafef.vn/upcom/A32-cong-ty-co-phan-32.chn","CenterName":"UpCom","Symbol":"A32","TradeCenterID":9,"ChangePrice":0,"VonHoa":212.84,"ChangeVolume":400,"EPS":6.19220987764706,"PE":5.0547382305287,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562652463)\/","FullName":"Công ty cổ phần 32","ParentCategoryId":0
{"Url":"http://s.cafef.vn/upcom/YTC-cong-ty-co-phan-xuat-nhap-khau-y-te-thanh-pho-ho-chi-minh.chn","CenterName":"UpCom","Symbol":"YTC","TradeCenterID":9,"ChangePrice":0,"VonHoa":170.8,"ChangeVolume":200,"EPS":-4.29038514857143,"PE":-14.217837766922,"Beta":0,"Price":0,"UpdatedDate":"\/Date(1625562969277)\/","FullName":"Công ty Cổ phần Xuất nhập khẩu Y tế Thành phố Hồ Chí Minh","ParentCategoryId":0}];