
I am looking to automate the downloading of multiple PDFs from our corporate website. This site is only reachable over our internal corporate network/VPN, requires authentication, and is HTTPS-only.

I've looked into logging in via VBA/Python but have had trouble, which I imagine is due to some combination of our corporate network setup and restrictions on accessing the site.

I think the easiest thing would be to just use an existing browser session to download the files, rather than worry about all the authentication and network issues?
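One way to reuse an existing browser session outside the browser is to export its cookies to a Netscape-format `cookies.txt` file (browser extensions exist for this) and load them with Python's standard-library `http.cookiejar.MozillaCookieJar`. This is a minimal sketch, assuming the site accepts the exported session cookie on its own and does not tie the session to the browser in some other way. The helper names, file name, and URL are illustrative placeholders.

```python
import http.cookiejar
import urllib.request


def load_browser_cookies(cookies_txt: str) -> http.cookiejar.MozillaCookieJar:
    """Load cookies exported from an authenticated browser (Netscape cookies.txt format).

    Illustrative helper; the path is a placeholder for wherever the export was saved.
    """
    jar = http.cookiejar.MozillaCookieJar(cookies_txt)
    # ignore_discard=True keeps session cookies (no expiry), which logins often use
    jar.load(ignore_discard=True)
    return jar


def download_with_cookies(url: str, jar: http.cookiejar.CookieJar, dest: str) -> None:
    """Fetch url with the loaded cookies attached and write the raw body to dest."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url) as resp, open(dest, "wb") as f:
        f.write(resp.read())
```

Usage would look like `download_with_cookies("https://corpwebsite.com/abcdef.pdf", load_browser_cookies("cookies.txt"), "abcdef.pdf")`. The session cookie expires like any other, so the export has to be refreshed whenever the browser login does.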

I adapted VBA code I found online to identify and set a variable to an existing, authenticated IE window and navigate to a PDF on our corporate website (see below).

From there, how can I automatically save the PDF from the existing browser session? The couple of ways I saw online for saving files in IE don't seem to work. If this is easier through Python, I am also open to that. Thanks!

Sub NavigateExistingIE()
    Dim objShell As Object, w As Object, ie As Object
    Dim my_title As String
    Dim found As Boolean

    Set objShell = CreateObject("Shell.Application")
    For Each w In objShell.Windows      ' also lists File Explorer windows, not only IE
        my_title = ""
        On Error Resume Next            ' windows without an HTML document raise an error here
        my_title = w.Document.Title
        On Error GoTo 0
        If my_title Like "XYZ*" Then    ' compare to find if the desired web page is already open
            Set ie = w
            found = True
            Exit For
        End If
    Next

    If Not found Then
        MsgBox "A matching webpage was NOT found"
    Else
        MsgBox "A matching webpage was found"
        ie.Navigate "https://corpwebsite.com/abcdef.pdf"
    End If
End Sub
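Whichever route ends up fetching the file, a common failure mode with SSO-protected sites is that the request appears to succeed but the body is the HTML login page rather than the PDF. A small guard, sketched in Python with an illustrative function name, is to look for the `%PDF-` header before writing anything to disk. Well-formed PDFs start with it; the check scans the first 1024 bytes to tolerate a little leading junk.

```python
from pathlib import Path


def save_pdf(data: bytes, dest: str) -> bool:
    """Write data to dest only if it looks like a PDF; return True if written.

    Illustrative helper. A failed authentication typically returns an HTML
    page instead of the file, which this rejects rather than saving as .pdf.
    """
    if b"%PDF-" not in data[:1024]:
        return False
    Path(dest).write_bytes(data)
    return True
```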
Rob K
  • Does it require a password at any point, or is that just handled by Windows authentication? – jamheadart Jul 08 '18 at 18:28
  • @jamheadart Yes, the site requires a separate login, but it uses the same credentials as Windows (it uses Okta for single sign-on, I guess). – Rob K Jul 09 '18 at 00:41
  • I'd be tempted to try a POST request using a ServerHTTP connection to see if it gets past that login page; then it'd be dead easy to use another POST/GET to download the file. And if it didn't work in VBA, I'd use VB.Net so you can "visit" a page with the connection using Windows login credentials (there's a parameter to set). I did once use an IE instance to download some things, and I remember some nightmares with compatibility and popups and blegh. – jamheadart Jul 09 '18 at 05:20
  • @jamheadart Thanks. Not sure this will work with VBA now. I would need to RSA-encrypt a password, send that to the server, and receive back a cookie/token. It would have been nice to have something I could share with non-tech-savvy people, but I might just stick to Python here. – Rob K Jul 10 '18 at 17:11
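jamheadart's suggestion (POST to the login form, then GET the file with the same cookies) can be sketched in Python with the standard library's cookie-aware opener. This only works if the login is a plain HTML form POST. The login URL and form field names are placeholders that would have to be read from the real login page, and, per Rob K's follow-up, an Okta flow that expects an RSA-encrypted password would not accept this as-is.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def login_and_download(login_url: str, form: dict, file_url: str) -> bytes:
    """POST a login form, keep the returned cookies, then GET file_url.

    Illustrative helper: login_url, the keys in form, and file_url are
    placeholders for the real site's values.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Supplying data makes urllib send a POST with a form-encoded body
    body = urllib.parse.urlencode(form).encode("ascii")
    with opener.open(login_url, data=body) as resp:
        resp.read()  # any Set-Cookie headers are now stored in jar
    # The same opener sends those cookies back with the download request
    with opener.open(file_url) as resp:
        return resp.read()
```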

1 Answer


Try URLMon to download directly from the URL? This assumes you have already handled any authentication issues.

Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As LongPtr _
        ) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, _
        ByVal szURL As String, _
        ByVal szFileName As String, _
        ByVal dwReserved As Long, _
        ByVal lpfnCB As Long _
        ) As Long
#End If

Private Const S_OK As Long = 0
Public Const filePath As String = "C:\Users\User\Desktop\abcdef.pdf" '<=Change as required

Public Sub downloadPDF()
    'Authentication code first. Maybe in a different sub.
    Dim ret As Long
    'dwReserved is reserved and must be 0; a return value of S_OK (0) means success
    ret = URLDownloadToFile(0, "https://corpwebsite.com/abcdef.pdf", filePath, 0, 0)
    If ret <> S_OK Then MsgBox "Download failed, HRESULT &H" & Hex(ret)
End Sub
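Since Rob K mentioned being open to Python, the same URLMon call can be made from Python on Windows through `ctypes`, using the Unicode `URLDownloadToFileW` export. This is a sketch with an illustrative function name. Whether it can reuse an IE login depends on which cookies WinINet shares across processes (typically persistent cookies rather than session-only ones, which stay in the process that created them), so it is something to test rather than assume.

```python
import ctypes
import sys

S_OK = 0


def urlmon_download(url: str, dest: str) -> int:
    """Download url to dest via urlmon's URLDownloadToFileW (Windows only).

    Illustrative helper. Returns the HRESULT as an unsigned 32-bit value;
    0 (S_OK) means the download succeeded.
    """
    if sys.platform != "win32":
        raise OSError("URLDownloadToFileW is only available on Windows")
    # pCaller=None, dwReserved=0 (required), lpfnCB=None (no progress callback);
    # ctypes passes Python str arguments as wide-character pointers
    hr = ctypes.windll.urlmon.URLDownloadToFileW(None, url, dest, 0, None)
    return hr & 0xFFFFFFFF
```

A call such as `urlmon_download("https://corpwebsite.com/abcdef.pdf", r"C:\Users\User\Desktop\abcdef.pdf") == S_OK` would mirror the VBA `downloadPDF` sub.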
QHarr
  • Thanks for the answer. As mentioned in my OP, authentication within the code was what I was having issues with. It has been difficult to navigate both the corporate network and the specific site requirements. That is why I was seeing if there was a way to use an existing, authenticated browser session. – Rob K Jul 09 '18 at 01:06
  • Have you tried the above with an existing authenticated browser instance? I have no test case to run I'm afraid. – QHarr Jul 09 '18 at 04:48
  • How would I set the URLDownloadToFile function to use an existing instance? Where would I reference the IE variable I have set? – Rob K Jul 09 '18 at 14:46
  • Just try calling the download function. If it fails, we can rethink. – QHarr Jul 09 '18 at 15:26
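On the question of where the existing IE object would come in: `URLDownloadToFile` takes no browser reference. One alternative is to read the cookies the open page exposes through `ie.Document.cookie` on the VBA side and send them explicitly with the request. Below is a Python sketch of the sending side, assuming that string has been handed over somehow (how it gets from VBA to Python is left open, and the helper names are illustrative). Note that `document.cookie` does not include HttpOnly cookies, and session cookies are often marked HttpOnly, so this may not carry enough to authenticate.

```python
import urllib.request


def request_with_cookie_string(url: str, cookie_string: str) -> urllib.request.Request:
    """Build a GET request carrying a raw document.cookie string.

    Illustrative helper. document.cookie is already "name=value; name2=value2",
    the same shape as an HTTP Cookie header, so it is sent unchanged.
    """
    return urllib.request.Request(url, headers={"Cookie": cookie_string})


def fetch(req: urllib.request.Request) -> bytes:
    """Send the request and return the response body."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```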