tl;dr;
I was attempting to answer this SO question. It has no given valid answers and is 7 months old now.
My question is about why my solution to the same problem did not work. If this is considered a duplicate I will delete this, and potentially offer a bounty on the other question - but can I then specify I want the answer to show me how to correctly use the tool I mention (fiddler
); and using a POST
request? *Assuming that my method can indeed be successfully deployed.
Scenario:
The scenario is the same i.e. to scrape this page:
HKEXnews and obtain the historic data for the Shareholding Date 22/08/2017
; Write this data, contained in the table Detail of Shareholding, identified by its HTML Id of pnlResult
, or by table
tag index with ("table")(2)
, to Excel.
As with the originating OP my method returns data from the latest date not the requested date.
How do I formulate the POST request correctly to get data for the specified date? i.e. Did I miss something in the fiddler provided information or did I misunderstand the provided information?
My attempted solution:
I decided to go to that website and select the date in question and inspect the web traffic using fiddler.
What I noticed is that after making the date selection I could see a POST
request was made.
① So I examined the headers:
② And I examined the web forms details:
③ And I used these to formulate a POST request to the host.
④ The main parameters for the day of interest were defined as SHARE_DAY
,SHARE_MONTH
and SHARE_YEAR
.
⑤ TODO - I did refactor the writing of the HTMLTable
to the sheet into its own sub; initially having GetShareholdingInfo
as a function returning Object
(late binding as opposed to HTMLTable
). I kept getting Error 70 Permission Denied
when passing the HTMLTable object around so in the end I have had to settle with the less than optimal solution below.
Note: I have looked at the sidebar suggestions given by the question wizard and a number of other questions. For example:
- How can I send an HTTP POST request to a server from Excel using VBA?
- Adding Parameters to VBA HTTP Post Request
- Fetching data from a website using “POST” request
VBA:
Option Explicit
Public Sub ShareHoldingInfo()
Dim headers(), ws As Worksheet
Const SHARE_DAY As Long = 22
Const SHARE_MONTH As Long = 8
Const SHARE_YEAR As Long = 2017
headers = Array("Stock Code", "Stock Name", "Shareholding in CCASS", "% of the total number of A shares listed and traded on the SSE")
Application.ScreenUpdating = False
Set ws = ThisWorkbook.Worksheets("Sheet1")
With ws
.Cells.ClearContents
.Cells(1, 1) = "Shareholding Date: " & Format$(DateSerial(SHARE_YEAR, SHARE_MONTH, SHARE_DAY), "dd/mm/yyyy")
.Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
End With
GetShareholdingInfo SHARE_DAY, SHARE_MONTH, SHARE_YEAR, ws
Application.ScreenUpdating = True
End Sub
Public Sub GetShareholdingInfo(ByVal ShareholdingDay As Long, ByVal ShareholdingMonth As Long, ByVal ShareholdingYear As Long, ByRef ws As Worksheet)
Dim objHTTP As Object, URL As String, html As New HTMLDocument
Const POST_PREFIX As String = "_VIEWSTATE=||1||& __VIEWSTATEGENERATOR=||2||& __EVENTVALIDATION=||3||"
Dim vs As String, vsg As String, ev As String, sToday As String, sortBy As String, alertMsg As String, btnX As String, btnY As String
Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
URL = "http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk%2f%3f"
vs = "fQ3xjD+zA7WVka368FepJS4Hd3mTPU6XV2UPah+ERqFw2JopZen8Fh01/HNQ4fGDqq3Kvtf41T7rhMZswpllY27gJmmhH0FuboK69Mg7JbV2BrXq1R2YBHvSdgObosBV9jd1A5znZcEojA932bsVacdM0vXv87P8u6+G3yFMfYFXSFXESwmNvhyToVTFGvVXT2ZSfw=="
vsg = "3C67932C"
ev = "JHSF2ZFqsNlguCM2lnvh6Lm/Uw90fVvqToCx9tvqNX6rnXHd/f/kjekxc1L1IFsrCxIiOAgVSm0NkQOyVJ9xpg1KepfuKu+14YtRmjrRUtiHOcOptGnDqLGa1aqboGSjFoW/mUjh//B/yA9nPjzZYKHjNlMhHphc4Zb97hOuoVRuchFSnduT5eDI91mMmh7ad8oc+mB1HDyGdREDArhhSX6ZFGRB+IWwbJpT5YN5XQOKpSZNoS0sYjWTALysDOE/"
ev = ev & "8fYSW1ggiXIQ0ZHSRqvq9xVJgshfk79svlBsNsHsrsIXf3XV12o8JdUiabpfy64I8bDkJo0IYgWKOP+03BGaRm93TwaXg+XmHJ9zDVg58JYIKmwI/tibYGK2hN+CS3E7CEineQptTXy+eFXjJVekVlhy8utVytRZD5BJQOWwVd+j7HYgra9JI7wL/8+mMCEJEzMjEDeHgnrbSpE48gvO7r7dWka4yDlRFtFZqnrYCmubIjP4ZlwUCAjY3Mx7gD22k2bXEg+uyTLnIJ3/"
ev = ev & "5bkzEKgaKDMo7sbTeO1nku969aHNlTge9cIzaVuq7m5Tm8z4p0GhxOX6FRsy9ItCkg0k6zzNOFNKg7Rprd7xLs00AOxwxJd6Z17cLskUxOEWecivAh923t6thB9UPKtcGt2KxVqlAgqro3Ij7OGO+QGOM4ZGc5lnpI6TVyEipuM03OwpqmgKbKd2NpxfgMriujZYqLNwzjgAIxOPhzdfUDF6+c2kttHZ+zHFt+PDucS9cgJ7ijqszmYHWUeZ6qg3n8HbXEf+WszThZp70plUWOtWH8UvBVQ3y9CcI9aK4lmz/"
ev = ev & "6W7yVx5sQUPBktX7GppJMViZRgOZnYU/x6Bpx9qEv/Pp0dQ0tq2jesl7UESS4YG2QMzmIbJBkRucW1+gxMXppG+Q1oL2kE2cEzDvYtF7sr6wdLzGtLpUcuDK4Jw9dZBimmI+o8QUMWOml1ccuwe686+ea8QJbi1RPKgpN3creNXhDW5w74xrZvXP/avY+XpGPg"
ev = ev & "opjmUZ9qnZoVChvEDupCP5IfbKkeL2n+wYyupt45orDwbIxyg0GunTenMrI9CwoRN8w6X7SdFMG2IwojrOLnuuJt6ml/Tva5zHqflPfRPAAhn/jS5t+sf0jzBy2ZPMd5rlm7oDCRnD5X+4AnmjA0NhQrhW4s="
sToday = Format$(Now, "yyyymmdd")
sortBy = vbNullString
alertMsg = vbNullString
btnX = "29"
btnY = "15"
Dim sBody As String
sBody = POST_PREFIX
sBody = Replace$(Replace$(Replace$(POST_PREFIX, "||1||", vs), "||2||", vsg), "||3||", ev)
sBody = sBody & "&today=" & sToday
sBody = sBody & "&sortBy=" & sortBy
sBody = sBody & "&alertMsg=" & alertMsg
sBody = sBody & "&ddlShareholdingDay=" & ShareholdingDay
sBody = sBody & "&ddlShareholdingMonth=" & ShareholdingMonth
sBody = sBody & "&ddlShareholdingYear=" & ShareholdingYear
sBody = sBody & "&btnSearch.x=" & btnX
sBody = sBody & "&btnSearch.y=" & btnY
With objHTTP
.SetTimeouts 10000, 10000, 10000, 10000
.Open "POST", URL, False
.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
On Error Resume Next
.send (sBody)
If Err.Number = 0 Then
If .Status = "200" Then
html.body.innerHTML = .responseText
Else
Debug.Print "HTTP " & .Status & " " & .statusText
Exit Sub
End If
Else
Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
Exit Sub
End If
On Error GoTo 0
End With
Dim hTable As Object
'Set hTable = html.getElementsByTagName("table")(2)
Set hTable = html.getElementById("pnlResult")
Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
r = 3
With ws
Set tRow = hTable.getElementsByTagName("tr") 'HTMLTableRow
For Each tr In tRow
Set tCell = tr.getElementsByTagName("td")
c = 1
For Each td In tCell 'DispHTMLElementCollection
.Cells(r, c).Value = td.innerText 'HTMLTableCell
c = c + 1
Next td
r = r + 1
Next tr
End With
End Sub
Sample from website for that day:
I expected results for the specified date:
Sample write out from code:
I got results for the latest available period i.e. the normal landing page results.
EDIT:
Thanks to comment from @Jeeped. There is the following information on the landing page (not all of which I showed above).
<form name="form1" method="post" action="./mutualmarket.aspx?t=hk%2f%3f" id="form1" style="text-align: left;">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="bKL6Py6JoD+UqJiQ0rYby89kuE5F9cbnJNFYPp2spoXRpiUHWXMPRG8zwk8PILbZSbn81reaeg6k3H5YU/r2NPcdO0WwyXcyC7YqTiDi3xpXEPWm654UtThRL5HgsHmMZGRMMxBPiHUTA+GtplZ6t/s8chsO/dfnCHXzOQWQ6CazRo80IBYnDTrKY/6q9hx/YDRu+g==">
</div>
<div>
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="3C67932C">
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="HI5FxZFHKh1Q5D5ExYCAkTbgozWeAb/DOPLTwntzLbpFZaOqvYTa/jniEDrvg53syrn/3EpGt7IYjns+KXq6FaUppQRtjT2EosDG5wmuGEaf+WuKFRqUtRpcS2MGWJMEqbwTfiRyy7CLqcD1TqRBn/oey+UZyHCE/P2GpkQiYpVIZEABJdeW3ehj57EP+yDZcuWgAUtnjHjnkNk54+jTlD5r7S+lT8FGYm1uoytV7detJ5KL9XUI/By3iKETPSunsOzb8+zBMOYheIk1/7QiDDoAHOUcmelTLz6YmCM4j+xZD00nvVjMnLCERoUV/A32fidSnzaZEBk+Mgi8nVTZ+71u95p/D/ELwN/xeUen3cEy/cgYlyHi3MjLof3SSxOOWQUflufPA+kAnTT0XgoOguN/UO3QBfcvWAqgpWTPSujx6FLdSu7r9bk7DyBDD+uwqB5jKQkk+zqEoR1ODZ+Zp8VNqezsmPNczSBihHri207Hd++oHVKGlpbRG2KUBK0ngNjSQ2Gqqeuq4lOOCEl41JzBlHYZBIUSHp5tI2L61aR2gGFjVkPxyE8doM2EZZgdIUl9lDiWmTvtNU7nS12QC9v4e4whz1NSqC1LUOA/JVWiJ0KYdXG4AGyArcosOGZQF6eaFzkWw9ooak2LXPYbUb1pt+S+NfXk3vDpalfC+bthIukOas/QC8+RXQF5oiSZyNo3Znj+/jj1IvDwl7kJHR76k2XFeIb5Z+mTzBJMphze5S/VzkHN3E1vaFKQErQj5Y62wYssyfUKamCj21FhH6Era6IEiBxnUIinCIY/6387sBNlrdUabBfxK0Vm3gEjrGU/i5hSTmcn+t5/XBC6T8Xx5cVGIZUKDMtmFqkFF4RZtRpsuKfT1VyW0gDK26YrYW14bx19Tgm+8u2HmvvM7ZzbC1pabCJnlN6RhYswtm8aT7aq9TqcmV9wPfewHr5LOZaAsCjE+qTF+vrU1v2tmI4p7IevlhnomZO4BRD8bAogJF6LCWNUmxdTu5ewP4MJrOiP7cMDmuLHoiExdWL1PQOFh/E16JXw6liJuhk38+dG5eKzwt3bdeX3iT44XzHReE6pUK4G4nQIipa5LQFvXpsrCQ4lsn+BiXUdHtjhSG6S3kyYipOQrLEcSjYsHxZdUZDz4KmrhcEEcTVKXT4xxCFiQ74=">
</div>
<input type="hidden" name="today" id="today" value="20180623">
<input type="hidden" name="sortBy" id="sortBy">
<input type="hidden" name="alertMsg" id="alertMsg">
This part mirrors what I observed in fiddler.