2

tl;dr;

I was attempting to answer this SO question. It has no given valid answers and is 7 months old now.

My question is about why my solution to the same problem did not work. If this is considered a duplicate I will delete this, and potentially offer a bounty on the other question - but can I then specify I want the answer to show me how to correctly use the tool I mention (fiddler); and using a POST request? *Assuming that my method can indeed be successfully deployed.


Scenario:

The scenario is the same i.e. to scrape this page: HKEXnews and obtain the historic data for the Shareholding Date 22/08/2017; Write this data, contained in the table Detail of Shareholding, identified by its HTML Id of pnlResult, or by table tag index with ("table")(2), to Excel.

As with the originating OP my method returns data from the latest date not the requested date.

How do I formulate the POST request correctly to get data for the specified date? i.e. Did I miss something in the fiddler provided information or did I misunderstand the provided information?


My attempted solution:

I decided to go to that website and select the date in question and inspect the web traffic using fiddler.

What I noticed is that after making the date selection I could see a POST request was made.

① So I examined the headers:

Fiddler headers

② And I examined the web forms details:

web form details

③ And I used these to formulate a POST request to the host.

④ The main parameters for the day of interest were defined as SHARE_DAY ,SHARE_MONTH and SHARE_YEAR.

⑤ TODO - I did refactor the writing of the HTMLTable to the sheet into its own sub; initially having GetShareholdingInfo as a function returning Object (late binding as opposed to HTMLTable). I kept getting Error 70 Permission Denied when passing the HTMLTable object around so in the end I have had to settle with the less than optimal solution below.

Note: I have looked at the sidebar suggestions given by the question wizard and a number of other questions. For example:

  1. How can I send an HTTP POST request to a server from Excel using VBA?
  2. Adding Parameters to VBA HTTP Post Request
  3. Fetching data from a website using “POST” request

VBA:

Option Explicit

Public Sub ShareHoldingInfo()
    Dim headers(), ws As Worksheet
    Const SHARE_DAY As Long = 22
    Const SHARE_MONTH As Long = 8
    Const SHARE_YEAR As Long = 2017

    headers = Array("Stock Code", "Stock Name", "Shareholding in CCASS", "% of the total number of A shares listed and traded on the SSE")
    Application.ScreenUpdating = False

    Set ws = ThisWorkbook.Worksheets("Sheet1")

    With ws
        .Cells.ClearContents
        .Cells(1, 1) = "Shareholding Date: " & Format$(DateSerial(SHARE_YEAR, SHARE_MONTH, SHARE_DAY), "dd/mm/yyyy")
        .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
    End With

    GetShareholdingInfo SHARE_DAY, SHARE_MONTH, SHARE_YEAR, ws

    Application.ScreenUpdating = True
End Sub

Public Sub GetShareholdingInfo(ByVal ShareholdingDay As Long, ByVal ShareholdingMonth As Long, ByVal ShareholdingYear As Long, ByRef ws As Worksheet)
    Dim objHTTP As Object, URL As String, html As New HTMLDocument
    Const POST_PREFIX As String = "_VIEWSTATE=||1||& __VIEWSTATEGENERATOR=||2||& __EVENTVALIDATION=||3||"
    Dim vs As String, vsg As String, ev As String, sToday As String, sortBy As String, alertMsg As String, btnX As String, btnY As String

    Set objHTTP = CreateObject("WinHttp.WinHttpRequest.5.1")
    URL = "http://www.hkexnews.hk/sdw/search/mutualmarket.aspx?t=hk%2f%3f"
    vs = "fQ3xjD+zA7WVka368FepJS4Hd3mTPU6XV2UPah+ERqFw2JopZen8Fh01/HNQ4fGDqq3Kvtf41T7rhMZswpllY27gJmmhH0FuboK69Mg7JbV2BrXq1R2YBHvSdgObosBV9jd1A5znZcEojA932bsVacdM0vXv87P8u6+G3yFMfYFXSFXESwmNvhyToVTFGvVXT2ZSfw=="
    vsg = "3C67932C"
    ev = "JHSF2ZFqsNlguCM2lnvh6Lm/Uw90fVvqToCx9tvqNX6rnXHd/f/kjekxc1L1IFsrCxIiOAgVSm0NkQOyVJ9xpg1KepfuKu+14YtRmjrRUtiHOcOptGnDqLGa1aqboGSjFoW/mUjh//B/yA9nPjzZYKHjNlMhHphc4Zb97hOuoVRuchFSnduT5eDI91mMmh7ad8oc+mB1HDyGdREDArhhSX6ZFGRB+IWwbJpT5YN5XQOKpSZNoS0sYjWTALysDOE/"
    ev = ev & "8fYSW1ggiXIQ0ZHSRqvq9xVJgshfk79svlBsNsHsrsIXf3XV12o8JdUiabpfy64I8bDkJo0IYgWKOP+03BGaRm93TwaXg+XmHJ9zDVg58JYIKmwI/tibYGK2hN+CS3E7CEineQptTXy+eFXjJVekVlhy8utVytRZD5BJQOWwVd+j7HYgra9JI7wL/8+mMCEJEzMjEDeHgnrbSpE48gvO7r7dWka4yDlRFtFZqnrYCmubIjP4ZlwUCAjY3Mx7gD22k2bXEg+uyTLnIJ3/"
    ev = ev & "5bkzEKgaKDMo7sbTeO1nku969aHNlTge9cIzaVuq7m5Tm8z4p0GhxOX6FRsy9ItCkg0k6zzNOFNKg7Rprd7xLs00AOxwxJd6Z17cLskUxOEWecivAh923t6thB9UPKtcGt2KxVqlAgqro3Ij7OGO+QGOM4ZGc5lnpI6TVyEipuM03OwpqmgKbKd2NpxfgMriujZYqLNwzjgAIxOPhzdfUDF6+c2kttHZ+zHFt+PDucS9cgJ7ijqszmYHWUeZ6qg3n8HbXEf+WszThZp70plUWOtWH8UvBVQ3y9CcI9aK4lmz/"
    ev = ev & "6W7yVx5sQUPBktX7GppJMViZRgOZnYU/x6Bpx9qEv/Pp0dQ0tq2jesl7UESS4YG2QMzmIbJBkRucW1+gxMXppG+Q1oL2kE2cEzDvYtF7sr6wdLzGtLpUcuDK4Jw9dZBimmI+o8QUMWOml1ccuwe686+ea8QJbi1RPKgpN3creNXhDW5w74xrZvXP/avY+XpGPg"
    ev = ev & "opjmUZ9qnZoVChvEDupCP5IfbKkeL2n+wYyupt45orDwbIxyg0GunTenMrI9CwoRN8w6X7SdFMG2IwojrOLnuuJt6ml/Tva5zHqflPfRPAAhn/jS5t+sf0jzBy2ZPMd5rlm7oDCRnD5X+4AnmjA0NhQrhW4s="
    sToday = Format$(Now, "yyyymmdd")
    sortBy = vbNullString
    alertMsg = vbNullString
    btnX = "29"
    btnY = "15"

    Dim sBody As String
    sBody = POST_PREFIX
    sBody = Replace$(Replace$(Replace$(POST_PREFIX, "||1||", vs), "||2||", vsg), "||3||", ev)
    sBody = sBody & "&today=" & sToday
    sBody = sBody & "&sortBy=" & sortBy
    sBody = sBody & "&alertMsg=" & alertMsg
    sBody = sBody & "&ddlShareholdingDay=" & ShareholdingDay
    sBody = sBody & "&ddlShareholdingMonth=" & ShareholdingMonth
    sBody = sBody & "&ddlShareholdingYear=" & ShareholdingYear
    sBody = sBody & "&btnSearch.x=" & btnX
    sBody = sBody & "&btnSearch.y=" & btnY

    With objHTTP
        .SetTimeouts 10000, 10000, 10000, 10000
        .Open "POST", URL, False
        .setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
        .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
        On Error Resume Next
        .send (sBody)
        If Err.Number = 0 Then
            If .Status = "200" Then
                html.body.innerHTML = .responseText
            Else
                Debug.Print "HTTP " & .Status & " " & .statusText
                Exit Sub
            End If
        Else
            Debug.Print "Error " & Err.Number & " " & Err.Source & " " & Err.Description
            Exit Sub
        End If
        On Error GoTo 0
    End With

    Dim hTable As Object
    'Set hTable = html.getElementsByTagName("table")(2)
    Set hTable = html.getElementById("pnlResult")
    Dim tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long
    r = 3
    With ws
        Set tRow = hTable.getElementsByTagName("tr") 'HTMLTableRow
        For Each tr In tRow
            Set tCell = tr.getElementsByTagName("td")
            c = 1
            For Each td In tCell                 'DispHTMLElementCollection
                .Cells(r, c).Value = td.innerText 'HTMLTableCell
                c = c + 1
            Next td
            r = r + 1
        Next tr
    End With
End Sub

Sample from website for that day:

I expected results for the specified date:

Web results


Sample write out from code:

I got results for the latest available period i.e. the normal landing page results.

Sample code output


EDIT:

Thanks to comment from @Jeeped. There is the following information on the landing page (not all of which I showed above).

<form name="form1" method="post" action="./mutualmarket.aspx?t=hk%2f%3f" id="form1" style="text-align: left;">
 <div>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="bKL6Py6JoD+UqJiQ0rYby89kuE5F9cbnJNFYPp2spoXRpiUHWXMPRG8zwk8PILbZSbn81reaeg6k3H5YU/r2NPcdO0WwyXcyC7YqTiDi3xpXEPWm654UtThRL5HgsHmMZGRMMxBPiHUTA+GtplZ6t/s8chsO/dfnCHXzOQWQ6CazRo80IBYnDTrKY/6q9hx/YDRu+g==">
  </div>
  <div>
   <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="3C67932C">
    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="HI5FxZFHKh1Q5D5ExYCAkTbgozWeAb/DOPLTwntzLbpFZaOqvYTa/jniEDrvg53syrn/3EpGt7IYjns+KXq6FaUppQRtjT2EosDG5wmuGEaf+WuKFRqUtRpcS2MGWJMEqbwTfiRyy7CLqcD1TqRBn/oey+UZyHCE/P2GpkQiYpVIZEABJdeW3ehj57EP+yDZcuWgAUtnjHjnkNk54+jTlD5r7S+lT8FGYm1uoytV7detJ5KL9XUI/By3iKETPSunsOzb8+zBMOYheIk1/7QiDDoAHOUcmelTLz6YmCM4j+xZD00nvVjMnLCERoUV/A32fidSnzaZEBk+Mgi8nVTZ+71u95p/D/ELwN/xeUen3cEy/cgYlyHi3MjLof3SSxOOWQUflufPA+kAnTT0XgoOguN/UO3QBfcvWAqgpWTPSujx6FLdSu7r9bk7DyBDD+uwqB5jKQkk+zqEoR1ODZ+Zp8VNqezsmPNczSBihHri207Hd++oHVKGlpbRG2KUBK0ngNjSQ2Gqqeuq4lOOCEl41JzBlHYZBIUSHp5tI2L61aR2gGFjVkPxyE8doM2EZZgdIUl9lDiWmTvtNU7nS12QC9v4e4whz1NSqC1LUOA/JVWiJ0KYdXG4AGyArcosOGZQF6eaFzkWw9ooak2LXPYbUb1pt+S+NfXk3vDpalfC+bthIukOas/QC8+RXQF5oiSZyNo3Znj+/jj1IvDwl7kJHR76k2XFeIb5Z+mTzBJMphze5S/VzkHN3E1vaFKQErQj5Y62wYssyfUKamCj21FhH6Era6IEiBxnUIinCIY/6387sBNlrdUabBfxK0Vm3gEjrGU/i5hSTmcn+t5/XBC6T8Xx5cVGIZUKDMtmFqkFF4RZtRpsuKfT1VyW0gDK26YrYW14bx19Tgm+8u2HmvvM7ZzbC1pabCJnlN6RhYswtm8aT7aq9TqcmV9wPfewHr5LOZaAsCjE+qTF+vrU1v2tmI4p7IevlhnomZO4BRD8bAogJF6LCWNUmxdTu5ewP4MJrOiP7cMDmuLHoiExdWL1PQOFh/E16JXw6liJuhk38+dG5eKzwt3bdeX3iT44XzHReE6pUK4G4nQIipa5LQFvXpsrCQ4lsn+BiXUdHtjhSG6S3kyYipOQrLEcSjYsHxZdUZDz4KmrhcEEcTVKXT4xxCFiQ74=">
    </div>
    <input type="hidden" name="today" id="today" value="20180623">
     <input type="hidden" name="sortBy" id="sortBy">
      <input type="hidden" name="alertMsg" id="alertMsg">

This part mirrors what I observed in fiddler.

Community
  • 1
  • 1
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • The fillde data looks truncated. Did you miss some form of guid id that was thrown at you and expectef to be returned in header info for future post requests? –  Jun 23 '18 at 07:17
  • When you hit the 'landing page' of a site, the page often throws a jsonid or some other guid cookie at you during form submission. This is to stop you from doing exactly what you are trying to do; post without the landing page. Look at the search form for elements. The truncated part was the fiddle image you posted; I could see there was more that wasn't visible. –  Jun 23 '18 at 07:22
  • It looks like it isn't an actual html
    but a jsscript push to preprocessMutualMarketMainForm. I couldn't find preprocessMutualMarketMainForm in the html source so it must be in one of the includes. Posssibly one of
    –  Jun 23 '18 at 07:33
  • 1
    btw, folks who take the time to safeguard their data against scraping as well as this generally also include ip bans when too many requests have been made. You may have to look into proxing your ip. –  Jun 23 '18 at 07:39
  • no biggie. just copy the relative .js address into the browser url window and hot enter then search. My next comment will be http://www.hkexnews.hk/script/hkex_scripts.js source. –  Jun 23 '18 at 07:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173665/discussion-between-qharr-and-jeeped). – QHarr Jun 23 '18 at 07:43
  • /*! * jQuery JavaScript Library v1.5 * http://jquery.com/ * * Copyright 2011, John Resig * Dual licensed under the MIT or GPL Version 2 licenses. * http://jquery.org/license * * Includes Sizzle.js * http://sizzlejs.com/ * Copyright 2011, The Dojo Foundation * Released under the MIT, BSD, and GPL Licenses. * * Date: Mon Jan 31 08:31:29 2011 -0500 */ (function(a,b){function b$(a){return d.isWindow(a)?a:a.nodeType===9?a.defaultView||a.parentWindow:!1}function bX(a){if(!bR[a]){var b=d("<"+a+">").appendTo("body"),c=b.css("display");b.remove();if(c==="none"||c==="")c=... –  Jun 23 '18 at 07:43
  • look at the include scripts in th html source. one was /script/hkex_scripts.js –  Jun 23 '18 at 07:45
  • @Jeeped: Thank you. Some homework for me. – QHarr Jun 23 '18 at 07:47
  • Hope you caught the [chat comments](https://chat.stackoverflow.com/rooms/173665/discussion-between-qharr-and-jeeped) as well. –  Jun 23 '18 at 07:53
  • Getting there! I am running wildly to catch up ;-) – QHarr Jun 23 '18 at 07:56
  • @omegastripes Can you see anything obvious I have missed please? – QHarr Jun 27 '18 at 13:07

0 Answers0