
I'm trying to save a URL as PDF with VBA. I'm able to get a file to download/be created, but when opening I get an error:

Acrobat could not open 'file.pdf' because it is either not a supported file type or because the file has been damaged...

I've tried a few different ways to download the file, but I suspect the reason is that my URL is not one like http://www.url.com/file.pdf; instead it's like http://www.url.com/filegenerated/I, where the page itself is the PDF, which I can view and download. Unfortunately it's an internal URL, so I can't post a link to it, but here's an example of what I mean:

[Screenshot: a browser tab displaying the generated PDF; hovering the mouse shows the scroll and download buttons.]

Any ideas on what I could do instead? Manually I just have to either Download the PDF (when hovering mouse over it, the scroll buttons and download button show, like in the above screen shot), or Right Click and Save As, and it will save as a PDF.


Here are two macros I've tried so far:

Function SaveWebFile(ByVal vWebfile As String, ByVal vLocalFile As String) As Boolean
    Dim oXMLHTTP As Object, vFF As Long, oResp() As Byte
    Set oXMLHTTP = CreateObject("MSXML2.XMLHTTP")
    oXMLHTTP.Open "GET", vWebfile, False 'False = synchronous, so Send blocks until done
    oXMLHTTP.Send 'send request
    '(No readyState loop needed - a synchronous request has already completed here)
    'Get the response as a byte array
    oResp = oXMLHTTP.ResponseBody
    'Create a local file and save the result to it
    vFF = FreeFile
    If Dir(vLocalFile) <> "" Then Kill vLocalFile
    Open vLocalFile For Binary As #vFF
    Put #vFF, , oResp
    Close #vFF
    Set oXMLHTTP = Nothing
    SaveWebFile = True 'report success to the caller
End Function

and

Private Sub DownloadFile(ByVal myURL As String, ByVal destFolder As String)

    Dim WinHttpReq As Object, oStream As Object
    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", myURL, False
    WinHttpReq.Send

    '(Removed a stray "myURL = WinHttpReq.ResponseBody" line - assigning the
    'binary response back to the URL string served no purpose)
    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1 'adTypeBinary
        oStream.Write WinHttpReq.ResponseBody
        oStream.SaveToFile destFolder & "file.pdf", 2 ' 1 = no overwrite, 2 = overwrite
        oStream.Close
    End If

End Sub

And I've also tried Chip Pearson's example to no avail.

A 9kb file is created, but it won't open, per the error above. I don't think it's the file, because when I do save the PDF, it's about 50kb, not 9kb, so I'm thinking the file created is a "placeholder" PDF?
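One quick way to confirm what the 9kb file actually contains is to check its first bytes: every real PDF starts with the signature `%PDF`, while an HTML login page will start with something like `<!DOCTYPE` or `<html`. A minimal sketch (the file path is just an example):

```vba
Function IsPdfFile(ByVal sPath As String) As Boolean
    'Read the first 4 bytes of the file and compare them to the
    'PDF signature "%PDF". An HTML error/login page fails this test.
    Dim vFF As Long, aHeader(0 To 3) As Byte, i As Long, sHeader As String
    vFF = FreeFile
    Open sPath For Binary Access Read As #vFF
    Get #vFF, 1, aHeader
    Close #vFF
    For i = 0 To 3
        sHeader = sHeader & Chr(aHeader(i))
    Next i
    IsPdfFile = (sHeader = "%PDF")
End Function
```

Calling `Debug.Print IsPdfFile("C:\temp\file.pdf")` right after the download would make the HTML-instead-of-PDF problem obvious without opening Acrobat.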

BruceWayne
  • If you open the downloaded "broken" file in a text editor, what do you see? Does it contain readable text? – Tim Williams Apr 25 '17 at 16:36
  • @TimWilliams - I get a bunch of HTML. Ah! I notice the HTML looks to be our login page where it asks if you forgot your password. I do have to log in to see the PDFs, but I *am* logged in. Instead of using the macro, if I just manually open the URL, it opens fine. Am I required to somehow pass a UN/Password through in the macro to get the file to open/download properly? – BruceWayne Apr 25 '17 at 16:39
  • 1
    Logged via your browser doesn't equal logged in for your specific use case here. Getting to that point in VBA might be more or less complicated depending on exactly how the site manages security. – Tim Williams Apr 25 '17 at 16:41
  • @TimWilliams - Thanks for the comments! I will take a look and see how to add login stuff to the macro and report back. – BruceWayne Apr 25 '17 at 16:56
  • First of all, open Developer Tools (press F12) in your browser, go to the Network tab, open the page containing the PDF and examine the XHRs: what are the actual URL, parameters, headers and cookies sent? Generally you can get the PDF in three steps. 1) Make an XHR to download the login page HTML content, parse the login webform and retrieve its parameters. 2) Make an XHR to send the login webform parameters, keeping the cookies received in the response header. 3) Make an XHR with a cookies header added to download the binary content of the PDF file, and save it to a file. As you can see, you implemented just half of the last step. – omegastripes Apr 25 '17 at 17:01
  • @omegastripes - Hmm, the URL I have, that's no issue. I don't know how to find the XHR info, nothing seems to load there. I saw some SO threads that showed more data (like [this one](https://stackoverflow.com/questions/15603561/how-can-i-debug-a-http-post-in-chrome) even though I'm on PC), but that info isn't there. I see the cookie info, but not sure what to do with this. Maybe I could instead just load all the URLs into Chrome, then using `SendKeys` (*shudder*) to go and print/Save As each page? – BruceWayne Apr 25 '17 at 17:25
  • SendKeys automation is inelegant and unreliable. For this specific case I may take a look via TeamViewer for further help on request parameters. – omegastripes Apr 25 '17 at 19:05
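The three steps omegastripes describes could be sketched roughly as below. Everything site-specific here is an assumption: the login URL, the form field names (`username`/`password`), the PDF URL, and how the session cookie comes back all depend on how the internal site manages authentication, so the real values have to come from the Network tab. `MSXML2.ServerXMLHTTP` is used because, unlike `MSXML2.XMLHTTP`, it does not share the browser's cookie jar and lets you set the `Cookie` header yourself:

```vba
Sub DownloadPdfWithLogin()
    'Sketch only - URLs, field names and cookie handling are placeholders.
    Dim oHttp As Object, oStream As Object, sCookie As String
    Set oHttp = CreateObject("MSXML2.ServerXMLHTTP")

    'Step 2: POST the login form and keep the session cookie from the response.
    '(Step 1 - parsing the login page for hidden fields/tokens - is omitted here,
    'but many login forms require it.)
    oHttp.Open "POST", "http://www.url.com/login", False 'placeholder URL
    oHttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    oHttp.Send "username=me&password=secret" 'placeholder field names
    sCookie = oHttp.getResponseHeader("Set-Cookie") 'may need parsing/merging

    'Step 3: request the generated PDF, sending the session cookie back.
    oHttp.Open "GET", "http://www.url.com/filegenerated/1", False 'placeholder URL
    oHttp.setRequestHeader "Cookie", sCookie
    oHttp.Send

    'Save the binary response to disk.
    Set oStream = CreateObject("ADODB.Stream")
    oStream.Open
    oStream.Type = 1 'adTypeBinary
    oStream.Write oHttp.ResponseBody
    oStream.SaveToFile "C:\temp\file.pdf", 2 '2 = overwrite
    oStream.Close
End Sub
```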

1 Answer


You are downloading a webpage, which is in HTML format. Saving it with a .pdf extension does not magically turn it into a PDF document. PDF (Portable Document Format) is a format in its own right, distinct from HTML.

Saving the webpage as a pdf file is a function of the browser. Likewise, saving a Word document as PDF is a function of Word.

So you can get the web page into IE for example and then tell IE to save it in PDF format.
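That approach could be sketched as below: automate an `InternetExplorer.Application` object, navigate to the page, then print it via `ExecWB`. Note the assumptions: the default printer must be a print-to-PDF driver (e.g. Microsoft Print to PDF), the page must render in IE at all, and the URL is a placeholder:

```vba
Sub PrintPageToPdf()
    'Sketch: assumes the default printer is a print-to-PDF driver.
    Dim oIE As Object
    Set oIE = CreateObject("InternetExplorer.Application")
    oIE.Visible = True
    oIE.Navigate "http://www.url.com/filegenerated/1" 'placeholder URL

    'Wait for the page to finish loading.
    Do While oIE.Busy Or oIE.ReadyState <> 4 'READYSTATE_COMPLETE
        DoEvents
    Loop

    'OLECMDID_PRINT = 6, OLECMDEXECOPT_DONTPROMPTUSER = 2
    oIE.ExecWB 6, 2

    oIE.Quit
    Set oIE = Nothing
End Sub
```

If the served content is actually a PDF rendered in the browser's viewer (as Tim Williams's comment below suggests), this route may not apply, since `ExecWB` prints HTML documents rather than the embedded PDF.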

Paul Ogilvie
  • So would I need to look at how to get Chrome (or IE) to save as PDF? That's where I'm stuck. My Google searches just keep finding ways to save a .pdf as .pdf, not an HTML formatted page as PDF. (There must be some key word I'm missing) – BruceWayne Apr 25 '17 at 16:19
  • I Googled "IE Object VBA" and it returned many hits. You can find out how to manage an instance of IE through VBA. – Paul Ogilvie Apr 25 '17 at 16:22
  • 1
    That ("you are downloading a webpage") is not the case - the content being served (at least when logged in) is a pdf, not HTML. – Tim Williams Apr 25 '17 at 16:42