2

Situation:

I am downloading files from the webpage NHS Delayed Transfers of Care.

In the HTML I can see the following:

onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');"

After looking here and seeing these SO questions (amongst others):

I am under the impression that ga() is a JavaScript function I should be able to call directly with .execScript.

Question:

Can I execute the JavaScript function using .execScript to download the file? If not, how can I download the file(s)?

What I have tried:

I have tried the following unsuccessfully:

1) Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

'-2147352319 Automation error


2) Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

Error 438 Object doesn't support this property or method


3) Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

Error 91 Object variable or With block variable not set


4) Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript")

-2147352319 Could not complete the operation due to error 80020101.

I will admit to very little knowledge of these sorts of operations. Can anyone see where I am going wrong please?

Code:

Option Explicit

Public Sub DownloadDTOC()

    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim CurrentWindow As HTMLWindowProxy

    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/delayed-transfers-of-care-data-2017-18/", False
        .send
        html.body.innerHTML = .responseText
    End With

    On Error GoTo Errhand

    'Call html.parentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '-2147352319   Automation error

    'Call html.frames(0).execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '438 Object doesn't support this property or method
'automation error

    'Call currentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") ' 91 Object variable or With block variable not set

    Set CurrentWindow = html.parentWindow
    Call CurrentWindow.execScript("ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');", "Javascript") '--2147352319  Could not complete the operation due to error 80020101.

    Exit Sub

Errhand:
    If Err.Number <> 0 Then Debug.Print Err.Number, Err.Description
End Sub

References added:

References in project

Here is a reduced version of the HTML. Apologies, I am not used to formatting HTML.

<p>
  <a href="https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls" class="xls-link" onclick="ga('send', 'event', 'Downloads', 'XLS', 'https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2018/01/LA-Type-B-November-2017-2ayZP.xls');">Total Delayed Days Local Authority 2017-18 November (XLS, 121KB)</a>
  <br>
</p>
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • I know that IE handles `.execScript` well. Have you tried opening it through a hidden IE window and then executing your script? – dadler Jan 17 '18 at 11:35
  • Have you tried to get the text within class `xls-link`? `onclick` is also available within that class. However, what I wanna say is `xmlhttp60` request won't be able to fetch you anything from that page because it can't even parse the text within that class. The content of that website is generated dynamically. You should go for IE. – SIM Jan 17 '18 at 11:53
  • I will try with IE. I was deliberately avoiding because it is slow. – QHarr Jan 17 '18 at 12:25
  • @Shahin As an aside, when I tried getting by className using "xls-link" nothing was returned. Is this to do with .OuterHTML versus .Inner ? – QHarr Jan 17 '18 at 12:47
  • That ga() is just a call to google analytics, it wont affect the download, do you really need to invoke it? – Alex K. Jan 17 '18 at 12:48
  • @Alex K I don't have to. In effect then, if i executed the function would I simply be sending (tracking?) data then? – QHarr Jan 17 '18 at 13:01
  • Yes that's what GA does. – Alex K. Jan 17 '18 at 13:02
  • I will edit this question so it will be of use to others. – QHarr Jan 17 '18 at 13:14
  • My only intention was to get your problem resolved and I'm happy you are there. Thanks. – SIM Jan 18 '18 at 06:56

1 Answers1

0

So I ended up using a CSS selector to get all the hrefs for the downloads and passing them to URLMon for download. As there is a two month lag for latest files I filtered files to download on those 2 months behind month.


CSS selector:

The selector I chose was #main-content a[href*=xls]

This looked for elements with elements with a tags, having attribute href containing the string "xls", inside of element with id of main=content.


Sample CSS query results:

query results


VBA:

Option Explicit
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
Alias "URLDownloadToFileA" ( _
ByVal pCaller As LongPtr, _
ByVal szURL As String, _
ByVal szFileName As String, _
ByVal dwReserved As LongPtr, _
ByVal lpfnCB As LongPtr _
) As Long
Private Declare PtrSafe Function DeleteUrlCacheEntry Lib "Wininet.dll" _
Alias "DeleteUrlCacheEntryA" ( _
ByVal lpszUrlName As String _
) As Long

Public Const BINDF_GETNEWESTVERSION As Long = &H10

Public Sub DownloadFiles()
    Dim http As New XMLHTTP60, html As New HTMLDocument, downloads As Collection
    With http
        .Open "GET", "https://www.england.nhs.uk/statistics/statistical-work-areas/delayed-transfers-of-care/statistical-work-areas-delayed-transfers-of-care-delayed-transfers-of-care-data-2018-19/", False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim aNodeList As Object, i As Long
    Set downloads = New Collection
    Set aNodeList = html.querySelectorAll("#main-content a[href*=xls]")
    For i = 0 To aNodeList.Length - 1
        downloads.Add aNodeList.item(i).getAttribute("href")
    Next i

    For i = 1 To downloads.Count
        If InStr(downloads(i), Format(DateAdd("m", -2, Date), "mmmm-yyyy")) > 0 Then
            Debug.Print downloads(i)
            downloadFile downloads(i)
        End If
    Next i
End Sub

Public Sub downloadFile(ByVal url As String)
    Dim ret As Long, arr() As String, outputPath As String
    arr = Split(url, Chr$(47))
    outputPath = "C:\Users\HarrisQ\Desktop\" & arr(UBound(arr))
    ret = URLDownloadToFile(0, url, outputPath, BINDF_GETNEWESTVERSION, 0)
End Sub

References:

Requires references for HTML Object Library and Microsoft XML.


API call:

Written for 64 bit

QHarr
  • 83,427
  • 12
  • 54
  • 101